python - sklearn issue: Found arrays with inconsistent numbers of samples when doing regression
This question seems to have been asked before, but I can't comment to ask for further clarification on the accepted answer, and I couldn't figure out the solution provided.
I am trying to learn how to use sklearn with my own data. I got the annual % change in GDP for 2 different countries over the past 100 years. I am just trying to learn using a single variable for now. What I am trying to do is use sklearn to predict the GDP % change for country A given the percentage change in country B's GDP.
The problem is that I receive an error saying:
ValueError: Found arrays with inconsistent numbers of samples: [  1 107]
Here is my code:
import sklearn.linear_model as lm
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def bytespdate2num(fmt, encoding='utf-8'):  # function to convert bytes to string for the dates
    strconverter = mdates.strpdate2num(fmt)

    def bytesconverter(b):
        s = b.decode(encoding)
        return strconverter(s)
    return bytesconverter


datacsv = open('combined_data.csv')
comb_data = []
for line in datacsv:
    comb_data.append(line)

date, chngdpchange, ausgdpchange = np.loadtxt(comb_data, delimiter=',', unpack=True, converters={0: bytespdate2num('%d/%m/%y')})

# hold out the last observation for testing
chntrain = chngdpchange[:-1]
chntest = chngdpchange[-1:]
austrain = ausgdpchange[:-1]
austest = ausgdpchange[-1:]

regr = lm.LinearRegression()
regr.fit(chntrain, austrain)  # this is the line that raises the ValueError

print('Coefficients: \n', regr.coef_)
print("Residual sum of squares: %.2f" % np.mean((regr.predict(chntest) - austest) ** 2))
print('Variance score: %.2f' % regr.score(chntest, austest))

plt.scatter(chntest, austest, color='black')
plt.plot(chntest, regr.predict(chntest), color='blue')
plt.xticks(())
plt.yticks(())
plt.show()
What am I doing wrong? I essentially tried to apply the sklearn tutorial (they used a diabetes data set) to my own simple data. My data just contains the date, country A's % change in GDP for that specific year, and country B's % change in GDP for that same year.
I also tried the solutions here and here (basically trying to find out more about the solution in the first link), but I just receive the exact same error.
Here is the full traceback in case you want to see it:
Traceback (most recent call last):
  File "d:\my stuff\dropbox\python\python projects\test regression\tester.py", line 34, in <module>
    regr.fit(chntrain, austrain)
  File "d:\programs\installed\python34\lib\site-packages\sklearn\linear_model\base.py", line 376, in fit
    y_numeric=True, multi_output=True)
  File "d:\programs\installed\python34\lib\site-packages\sklearn\utils\validation.py", line 454, in check_X_y
    check_consistent_length(X, y)
  File "d:\programs\installed\python34\lib\site-packages\sklearn\utils\validation.py", line 174, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [  1 107]
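In fit(X, y), the input parameter X is supposed to be a 2-D array of shape (n_samples, n_features). If X in your data is only one-dimensional, you can reshape it into a 2-D array with a single column like this:
regr.fit(chntrain_x.reshape(len(chntrain_x), 1), chntrain_y)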
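To make the reshape concrete, here is a minimal sketch of the same fix applied to variables named like the ones in the question. The data is synthetic (the `chn` and `aus` arrays below are made up for illustration and are not the asker's CSV), and `reshape(-1, 1)` is used as an equivalent shorthand for `reshape(len(x), 1)`. The `[  1 107]` in the error message is consistent with the 1-D array of 107 training values being read as a single sample, while `y` has 107 samples; that reading is an interpretation of the traceback, not something the original post states.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the two GDP series (assumption: not the real data).
rng = np.random.default_rng(0)
chn = rng.normal(size=108)
aus = 0.5 * chn + 1.0  # exactly linear, so the fitted slope is easy to check

# Same split as in the question: hold out the last observation.
chn_train, chn_test = chn[:-1], chn[-1:]
aus_train, aus_test = aus[:-1], aus[-1:]

# A 1-D array of shape (107,) becomes a 2-D column of shape (107, 1):
# 107 samples with one feature each.
X_train = chn_train.reshape(-1, 1)
X_test = chn_test.reshape(-1, 1)

regr = LinearRegression()
regr.fit(X_train, aus_train)
prediction = regr.predict(X_test)

print("X_train shape:", X_train.shape)
print("Coefficient:", regr.coef_[0])
```

The test input needs the same treatment as the training input, so any later call such as `regr.predict(chntest)` in the question's code would also need a 2-D argument.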