Custom Standard Errors with Statsmodels / Stargazer

Question:

I am running a series of OLS regressions in Python, where I am using standard errors that I calculate using a custom function.

I am now exporting my regression results into tables, and plan on using the stargazer package (linked here). However, stargazer relies on regression results being calculated via the statsmodels package.

I am having trouble incorporating my custom standard errors into statsmodels, and hence cannot export using stargazer. I have tried looking if there is a way to overwrite default standard errors in statsmodels, but have not been successful.

I’ve provided example below:

import pandas as pd
from sklearn import datasets
import statsmodels.api as sm
from stargazer.stargazer import Stargazer

#load data
diabetes = datasets.load_diabetes()
df = pd.DataFrame(diabetes.data)
df.columns = ['Age', 'Sex', 'BMI', 'ABP', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6']
df['target'] = diabetes.target

#run regressions with statsmodels
est = sm.OLS(endog=df['target'], exog=sm.add_constant(df[df.columns[0:4]])).fit()

#custom standard errors function, returns a K-by-1 vector where K is the number of predictors
#I return a vector of ones here for simplicity
def custom_standard_errors(endog, exog):
    return [1 for i in range(len(exog.columns))]
    
#export regression table with stargazer
stargazer = Stargazer([est])

The stargazer object is displayed below. My goal is to overwrite the standard errors in the parentheses with output from custom_standard_errors(). As such, every value in parentheses should be 1, in this example.

Stargazer output

Asked By: lithium123

||

Answers:

This required some digging, but I believe I have a working solution.

When you create an instance of the Stargazer class such as your object stargazer, most of the regression results are extracted from the est object which is of type ResultsWrapper (from statsmodels).

Three instance methods called extract_data, _extract_feature, and extract_model_data are called and extract_model_data does a lot of the heavy lifting: it specifically extracts features stored in statsmodels_map, which looks like the following:

statsmodels_map = {'p_values' : 'pvalues', ## ⭠ and this too
                   'cov_values' : 'params',
                   'cov_std_err' : 'bse', ## ⭠ we want to modify this 
                   'r2' : 'rsquared',
                   'r2_adj' : 'rsquared_adj',
                   'f_p_value' : 'f_pvalue',
                   'degree_freedom' : 'df_model',
                   'degree_freedom_resid' : 'df_resid',
                   'nobs' : 'nobs',
                   'f_statistic' : 'fvalue'
                   } 

What we can do is create a child class called SuperStargazer that inherits all of the instance methods from Stargazer, and then override the extract_model_data to set cov_std_err using your custom_standard_errors function. Although it’s not strictly necessary, we’ll set custom_standard_errors as an instance attribute of the SuperStargazer class, as this allows you to use different custom functions if you define different instances of the SuperStargazer class.

Update: we’ll also update the p-values as they are related to the new custom standard errors. This involves recalculating the t-values, and then applying the formula: p_value = 2*(1 - t.cdf(abs(t_value), dof))

from statsmodels.base.wrapper import ResultsWrapper
from statsmodels.regression.linear_model import RegressionResults
from scipy.stats import t
from math import sqrt
from collections import defaultdict
from enum import Enum
import numbers
import pandas as pd

class SuperStargazer(Stargazer):
    def __init__(self, models, custom_standard_errors, **kwargs):
        self.custom_standard_errors = custom_standard_errors
        super().__init__(models=models, **kwargs)
    
    def extract_model_data(self, model):
        # For features that are simple attributes of "model", establish the
        # mapping with internal name (TODO: adopt same names?):
        statsmodels_map = {# 'p_values' : 'pvalues',
                           'cov_values' : 'params',
                           # 'cov_std_err' : 'bse',
                           'r2' : 'rsquared',
                           'r2_adj' : 'rsquared_adj',
                           'f_p_value' : 'f_pvalue',
                           'degree_freedom' : 'df_model',
                           'degree_freedom_resid' : 'df_resid',
                           'nobs' : 'nobs',
                           'f_statistic' : 'fvalue'
                           }

        data = {}
        for key, val in statsmodels_map.items():
            data[key] = self._extract_feature(model, val)

        if isinstance(model, ResultsWrapper):
            data['cov_names'] = model.params.index.values
            endog, exog = model.model.data.orig_endog, model.model.data.orig_exog
            custom_std_err_data = self.custom_standard_errors(endog, exog)
            data['cov_std_err'] = pd.Series(
                index=exog.columns,
                data=custom_std_err_data
            )
            data['t_values'] = data['cov_values'] / data['cov_std_err']
            dof = len(endog) - 2
            data['p_values'] = pd.Series(
                index=data['t_values'].index,
                data=2 * (1 - t.cdf(abs(data['t_values']), dof))
            )
            
        else:
            # Simple RegressionResults, for instance as a result of
            # get_robustcov_results():
            data['cov_names'] = model.model.data.orig_exog.columns

            # These are simple arrays, not Series:
            for what in 'cov_values', 'cov_std_err':
                data[what] = pd.Series(data[what],
                                       index=data['cov_names'])

        data['conf_int_low_values'] = model.conf_int()[0]
        data['conf_int_high_values'] = model.conf_int()[1]
        data['resid_std_err'] = (sqrt(sum(model.resid**2) / model.df_resid)
                                 if hasattr(model, 'resid') else None)

        # Workaround for
        # https://github.com/statsmodels/statsmodels/issues/6778:
        if 'f_statistic' in data:
            data['f_statistic'] = (lambda x : x[0, 0] if getattr(x, 'ndim', 0)
                                   else x)(data['f_statistic'])

        return data

Then when we can create an instance of SuperStargazer, it will render the following table (in JupyterLab in my case, but you can also call stargazer.render_html() to store the html string for later use)

stargazer = SuperStargazer(
    models=[est], 
    custom_standard_errors=custom_standard_errors
)

enter image description here

Answered By: Derek O