python - Appending predicted values and residuals to pandas dataframe -
it's useful , common practice append predicted values , residuals running regression onto dataframe distinct columns. i'm new pandas, , i'm having trouble performing simple operation. know i'm missing obvious. there a similar question asked year-and-a-half ago, wasn't answered.
the dataframe looks this:
y x1 x2 880.37 3.17 23 716.20 4.76 26 974.79 4.17 73 322.80 8.70 72 1054.25 11.45 16
and i'm wanting return dataframe has predicted value , residual y = x1 + x2 each observation:
y x1 x2 y_hat res 880.37 3.17 23 840.27 40.10 716.20 4.76 26 752.60 -36.40 974.79 4.17 73 877.49 97.30 322.80 8.70 72 348.50 -25.70 1054.25 11.45 16 815.15 239.10
i've tried resolving using statsmodels , pandas , haven't been able solve it. in advance!
here variation on alexander's answer using ols model statsmodels instead of pandas ols model. can use either formula or array/dataframe interface models.
fittedvalues
, resid
pandas series correct index. predict
not return pandas series.
import numpy np import pandas pd import statsmodels.api sm import statsmodels.formula.api smf df = pd.dataframe({'x1': [3.17, 4.76, 4.17, 8.70, 11.45], 'x2': [23, 26, 73, 72, 16], 'y': [880.37, 716.20, 974.79, 322.80, 1054.25]}, index=np.arange(10, 20, 2)) result = smf.ols('y ~ x1 + x2', df).fit() df['yhat'] = result.fittedvalues df['resid'] = result.resid result2 = sm.ols(df['y'], sm.add_constant(df[['x1', 'x2']])).fit() df['yhat2'] = result2.fittedvalues df['resid2'] = result2.resid # predict doesn't return pandas series , no index available df['predicted'] = result.predict(df) print(df) x1 x2 y yhat resid yhat2 resid2 \ 10 3.17 23 880.37 923.949309 -43.579309 923.949309 -43.579309 12 4.76 26 716.20 890.732201 -174.532201 890.732201 -174.532201 14 4.17 73 974.79 656.155079 318.634921 656.155079 318.634921 16 8.70 72 322.80 610.510952 -287.710952 610.510952 -287.710952 18 11.45 16 1054.25 867.062458 187.187542 867.062458 187.187542 predicted 10 923.949309 12 890.732201 14 656.155079 16 610.510952 18 867.062458
as preview, there extended prediction method in model results in statsmodels master (0.7), api not yet settled:
>>> print(result.get_prediction().summary_frame()) mean mean_se mean_ci_lower mean_ci_upper obs_ci_lower \ 10 923.949309 268.931939 -233.171432 2081.070051 -991.466820 12 890.732201 211.945165 -21.194241 1802.658643 -887.328646 14 656.155079 269.136102 -501.844105 1814.154263 -1259.791854 16 610.510952 282.182030 -603.620329 1824.642233 -1339.874985 18 867.062458 329.017262 -548.584564 2282.709481 -1214.750941 obs_ci_upper 10 2839.365439 12 2668.793048 14 2572.102012 16 2560.896890 18 2948.875858
Comments
Post a Comment