Print 'std err' value from statsmodels OLS results
Question:
(Sorry to ask but http://statsmodels.sourceforge.net/ is currently down and I can’t access the docs)
I’m doing a linear regression using statsmodels, basically:
import statsmodels.api as sm
model = sm.OLS(y,x)
results = model.fit()
I know that I can print out the full set of results with:
print results.summary()
which outputs something like:
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.952
Model: OLS Adj. R-squared: 0.951
Method: Least Squares F-statistic: 972.9
Date: Mon, 20 Jul 2015 Prob (F-statistic): 5.55e-34
Time: 15:35:22 Log-Likelihood: -78.843
No. Observations: 50 AIC: 159.7
Df Residuals: 49 BIC: 161.6
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1 1.0250 0.033 31.191 0.000 0.959 1.091
==============================================================================
Omnibus: 16.396 Durbin-Watson: 2.166
Prob(Omnibus): 0.000 Jarque-Bera (JB): 3.480
Skew: -0.082 Prob(JB): 0.175
Kurtosis: 1.718 Cond. No. 1.00
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
I need a way to print out only the values of coef and std err.
I can access coef with:
print results.params
but I’ve found no way to print out std err.
How can I do this?
Answers:
The following function can be used to get an overview of the regression result. The parameter ols_model is the fitted model produced by statsmodels.formula.api. The output is a pandas data frame holding the regression coefficients, standard errors, p-values, number of observations, AIC, and adjusted R-squared. The standard errors are shown in brackets, and ***, **, and * mark significance at the 0.01, 0.05, and 0.1 levels:
import numpy as np
import pandas as pd

def output_regres_result(ols_model, variable_list: list):
    """
    Create a pandas dataframe saving the regression analysis result
    :param ols_model: a fitted model containing the regression result.
                      type: statsmodels.regression.linear_model.RegressionResultsWrapper
    :param variable_list: a list of variable names of interest
    :return: a pandas dataframe saving the regression coefficients, p-values, standard errors,
             AIC, number of observations, and adjusted R-squared
    """
    coef_dict = ols_model.params.to_dict()        # coefficient dictionary
    pval_dict = ols_model.pvalues.to_dict()       # p-value dictionary
    std_error_dict = ols_model.bse.to_dict()      # standard error dictionary
    num_observs = int(ols_model.nobs)             # number of observations (np.int was removed in NumPy 1.24)
    aic_val = round(ols_model.aic, 2)             # AIC value
    adj_rsquared = round(ols_model.rsquared_adj, 3)  # adjusted R-squared

    info_index = ['Num', 'AIC', 'Adjusted R2']
    index_list = variable_list + info_index
    for variable in variable_list:
        assert variable in coef_dict, 'Something wrong with variable name!'

    coef_vals = []
    for variable in variable_list:
        std_val = std_error_dict[variable]
        coef_val = coef_dict[variable]
        p_val = pval_dict[variable]
        if p_val <= 0.01:
            coef_vals.append('{}***({})'.format(round(coef_val, 4), round(std_val, 3)))
        elif 0.01 < p_val <= 0.05:
            coef_vals.append('{}**({})'.format(round(coef_val, 4), round(std_val, 3)))
        elif 0.05 < p_val <= 0.1:
            coef_vals.append('{}*({})'.format(round(coef_val, 4), round(std_val, 3)))
        else:
            coef_vals.append('{}({})'.format(round(coef_val, 4), round(std_val, 3)))
    coef_vals.extend([num_observs, aic_val, adj_rsquared])

    result_data = pd.DataFrame()
    result_data['coef'] = coef_vals
    return result_data.set_index(pd.Index(index_list))
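The function above hinges on three per-variable accessors on the fitted model. A minimal sketch of just those accessors, on a formula-API fit over made-up data (the column names x1 and y are illustrative, not from the question):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative synthetic data
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100)})
df["y"] = 1.5 * df["x1"] + rng.normal(scale=0.5, size=100)

ols_model = smf.ols("y ~ x1", data=df).fit()

# The accessors the function relies on, all keyed by variable name:
print(ols_model.params.to_dict())   # coefficients
print(ols_model.bse.to_dict())      # standard errors
print(ols_model.pvalues.to_dict())  # p-values
```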
Statistically, the standard error of the estimate equals the square root of the mean squared error of the residuals. It can be obtained from the results with np.sqrt(results.mse_resid).
results.bse provides the standard errors for the coefficients, identical to those listed in results.summary().
The standard error of the regression is obtained using results.scale**.5. It is also identical to np.sqrt(np.sum(results.resid**2)/results.df_resid), where results is your fitted model.
I like Topchi’s method, but an identical result can be pulled with slightly less code. This gives the residual standard error, rather than the standard errors of the parameter estimates that others have already shared in the thread 🙂
np.sqrt(results.scale)