python - Appending predicted values and residuals to pandas dataframe


It's useful and common practice to append the predicted values and residuals from running a regression onto a dataframe as distinct columns. I'm new to pandas, and I'm having trouble performing this very simple operation. I know I'm missing something obvious. There was a similar question asked about a year and a half ago, but it wasn't really answered.

The dataframe looks like this:

    y          x1      x2
    880.37     3.17    23
    716.20     4.76    26
    974.79     4.17    73
    322.80     8.70    72
    1054.25    11.45   16

I want to return a dataframe that has the predicted value and residual from y = x1 + x2 for each observation:

    y          x1      x2    y_hat     res
    880.37     3.17    23    840.27    40.10
    716.20     4.76    26    752.60    -36.40
    974.79     4.17    73    877.49    97.30
    322.80     8.70    72    348.50    -25.70
    1054.25    11.45   16    815.15    239.10

I've tried resolving this using statsmodels and pandas and haven't been able to solve it. Thanks in advance!

Here is a variation on Alexander's answer using the OLS model from statsmodels instead of the pandas ols model. We can use either the formula interface or the array/DataFrame interface to the models.

fittedvalues and resid are pandas Series with the correct index. predict does not return a pandas Series.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.DataFrame({'x1': [3.17, 4.76, 4.17, 8.70, 11.45],
                       'x2': [23, 26, 73, 72, 16],
                       'y': [880.37, 716.20, 974.79, 322.80, 1054.25]},
                      index=np.arange(10, 20, 2))

    # formula interface
    result = smf.ols('y ~ x1 + x2', df).fit()
    df['yhat'] = result.fittedvalues
    df['resid'] = result.resid

    # array/DataFrame interface; the constant has to be added explicitly
    result2 = sm.OLS(df['y'], sm.add_constant(df[['x1', 'x2']])).fit()
    df['yhat2'] = result2.fittedvalues
    df['resid2'] = result2.resid

    # predict doesn't return a pandas Series and no index is available
    df['predicted'] = result.predict(df)

    print(df)

           x1  x2        y        yhat       resid       yhat2      resid2  \
    10   3.17  23   880.37  923.949309  -43.579309  923.949309  -43.579309
    12   4.76  26   716.20  890.732201 -174.532201  890.732201 -174.532201
    14   4.17  73   974.79  656.155079  318.634921  656.155079  318.634921
    16   8.70  72   322.80  610.510952 -287.710952  610.510952 -287.710952
    18  11.45  16  1054.25  867.062458  187.187542  867.062458  187.187542

         predicted
    10  923.949309
    12  890.732201
    14  656.155079
    16  610.510952
    18  867.062458
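Since predict may hand back a plain array without an index, one defensive option is to coerce its output to a flat array and reattach the dataframe's index yourself. The sketch below is only an illustration: predict_as_series is a hypothetical helper name, not part of statsmodels, and it assumes the frame has no missing values, so that the number of predictions equals the number of rows. Newer statsmodels releases may already return an indexed Series from predict for formula models, in which case the wrapper is harmless.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({'x1': [3.17, 4.76, 4.17, 8.70, 11.45],
                   'x2': [23, 26, 73, 72, 16],
                   'y': [880.37, 716.20, 974.79, 322.80, 1054.25]},
                  index=np.arange(10, 20, 2))

result = smf.ols('y ~ x1 + x2', data=df).fit()


def predict_as_series(res, frame, name='predicted'):
    """Hypothetical helper: return predictions as a Series indexed like `frame`.

    Assumes `frame` has no missing values, so one prediction comes back per row.
    """
    # np.asarray accepts either an ndarray or a Series; ravel flattens to 1-D
    values = np.asarray(res.predict(frame)).ravel()
    return pd.Series(values, index=frame.index, name=name)


df['predicted'] = predict_as_series(result, df)
# residual = observed minus predicted, aligned on the shared index
df['resid'] = df['y'] - df['predicted']

print(df)
```

On the training data these two columns should match result.fittedvalues and result.resid, so the helper mainly matters when predicting for new rows.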

As a preview, there is an extended prediction method in the model results in statsmodels master (0.7), but the API is not yet settled:

    >>> print(result.get_prediction().summary_frame())
              mean     mean_se  mean_ci_lower  mean_ci_upper  obs_ci_lower  \
    10  923.949309  268.931939    -233.171432    2081.070051   -991.466820
    12  890.732201  211.945165     -21.194241    1802.658643   -887.328646
    14  656.155079  269.136102    -501.844105    1814.154263  -1259.791854
    16  610.510952  282.182030    -603.620329    1824.642233  -1339.874985
    18  867.062458  329.017262    -548.584564    2282.709481  -1214.750941

        obs_ci_upper
    10   2839.365439
    12   2668.793048
    14   2572.102012
    16   2560.896890
    18   2948.875858
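If the prediction intervals are wanted as columns too, the summary frame can be joined back onto the original dataframe. This is a minimal sketch, assuming a statsmodels release in which get_prediction() and summary_frame() are available and produce the column names shown in the output above. The index is set explicitly on the assumption that the summary rows come back in the same order as df.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({'x1': [3.17, 4.76, 4.17, 8.70, 11.45],
                   'x2': [23, 26, 73, 72, 16],
                   'y': [880.37, 716.20, 974.79, 322.80, 1054.25]},
                  index=np.arange(10, 20, 2))

result = smf.ols('y ~ x1 + x2', data=df).fit()

# alpha=0.05 requests 95% intervals
intervals = result.get_prediction().summary_frame(alpha=0.05)

# Assumption: rows are in the same order as df, so reusing its index is safe
intervals.index = df.index

# Keep the point prediction and the interval for a new observation
out = df.join(intervals[['mean', 'obs_ci_lower', 'obs_ci_upper']])
out['resid'] = out['y'] - out['mean']

print(out)
```

Here mean is the predicted value, and the obs_ci_* columns bound an individual observation rather than the mean response.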
