I have a text file that contains data. The data is as follows:

    join2_train = sc.textFile('join2_train.csv', 4)
    join2_train.take(3)
    [u'21.9059,ta-00002,s-0066,7/7/2013,0,0,yes,1,sp-0019,6.35,0.71,137,8,19.05,n,n,n,n,ef-008,ef-008,0,0,0',
     u'12.3412,ta-00002,s-0066,7/7/2013,0,0,yes,2,sp-0019,6.35,0.71,137,8,19.05,n,n,n,n,ef-008,ef-008,0,0,0',
     u'6.60183,ta-00002,s-0066,7/7/2013,0,0,yes,5,sp-0019,6.35,0.71,137,8,19.05,n,n,n,n,ef-008,ef-008,0,0,0']

Now I am trying to parse each string with a function that splits each line of text and converts it to a LabeledPoint. I have also included a line for converting the string elements to float.

The function is as follows:

    from pyspark.mllib.regression import LabeledPoint
    import numpy as np

    def parsepoint(line):
        """Converts a comma separated unicode string into a `LabeledPoint`.

        Args:
            line (unicode): Comma separated unicode string where the first element is the label and the
                remaining elements are features.

        Returns:
            LabeledPoint: The line converted into a `labe...
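For reference, here is a minimal sketch of the split-and-convert step in plain Python, run against the first sample row above. The function name `split_label_and_features` is hypothetical, and skipping fields that do not parse as floats (IDs such as `ta-00002`, the date, the `yes`/`n` flags) is only an assumption made for illustration; it is not necessarily what the original function did, and those columns could instead be encoded numerically.

```python
def split_label_and_features(line):
    """Split a comma-separated line into (label, numeric_features).

    Assumption: the first field is the label, and any later field that
    cannot be parsed as a float is skipped rather than encoded.
    """
    fields = line.split(',')
    label = float(fields[0])
    features = []
    for field in fields[1:]:
        try:
            features.append(float(field))
        except ValueError:
            # Non-numeric field such as 'ta-00002', '7/7/2013' or 'yes'.
            continue
    return label, features


# First row of the sample output shown in the question.
sample = (u'21.9059,ta-00002,s-0066,7/7/2013,0,0,yes,1,sp-0019,6.35,0.71,'
          u'137,8,19.05,n,n,n,n,ef-008,ef-008,0,0,0')
label, features = split_label_and_features(sample)
```

Calling `float()` directly on every field of such a row raises `ValueError` at the first non-numeric value, which is why the sketch guards the conversion. Inside Spark, the resulting pair could be wrapped as `LabeledPoint(label, features)`.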