Co variance matrix of a model in python

Question:

To find the co variance matrix of a fitted model in python (equivalent to vcov() (R fucntion) in python)

lmfit <- lm(formula = Y ~ X, data=Data_df)
lmpred <- predict(lmfit, newdata=Data_df, se.fit=TRUE, interval = "prediction")
std_er <- sqrt(((X0) %*% vcov(lmfit)) %*% t(X0))

trying to convert the above code in python. For which i need to find the co variance matrix of the fitted model ie, vcov.
I wont be able to use np.cov() as im trying to find the co variance matrix of the model.

i have already used statsmodels.regression.linear_model.OLSResults.cov_params(), But i m not getting the same values as in R.

Asked By: Anu Abraham

||

Answers:

Please provide specific details on what you’re using.

Assuming you’re using numpy arrays for your data, there’s numpy.cov estimator

Answered By: Slam

The scipy ODR code can independently calculate the parameter covariance matrix, here is an example extracted from the source code of my zunzun.com online curve fitter:

from scipy.optimize import curve_fit
import numpy as np
import scipy.odr
import scipy.stats

x = np.array([5.357, 5.797, 5.936, 6.161, 6.697, 6.731, 6.775, 8.442, 9.861])
y = np.array([0.376, 0.874, 1.049, 1.327, 2.054, 2.077, 2.138, 4.744, 7.104])

def f(x,b0,b1):
    return b0 + (b1 * x)


def f_wrapper_for_odr(beta, x): # parameter order for odr
    return f(x, *beta)

parameters, cov= curve_fit(f, x, y)

model = scipy.odr.odrpack.Model(f_wrapper_for_odr)
data = scipy.odr.odrpack.Data(x,y)
myodr = scipy.odr.odrpack.ODR(data, model, beta0=parameters,  maxit=0)
myodr.set_job(fit_type=2)
parameterStatistics = myodr.run()
df_e = len(x) - len(parameters) # degrees of freedom, error
cov_beta = parameterStatistics.cov_beta # parameter covariance matrix from ODR
sd_beta = parameterStatistics.sd_beta * parameterStatistics.sd_beta
ci = []
t_df = scipy.stats.t.ppf(0.975, df_e)
ci = []
for i in range(len(parameters)):
    ci.append([parameters[i] - t_df * parameterStatistics.sd_beta[i], parameters[i] + t_df * parameterStatistics.sd_beta[i]])

tstat_beta = parameters / parameterStatistics.sd_beta # coeff t-statistics
pstat_beta = (1.0 - scipy.stats.t.cdf(np.abs(tstat_beta), df_e)) * 2.0    # coef. p-values

for i in range(len(parameters)):
    print('parameter:', parameters[i])
    print('   conf interval:', ci[i][0], ci[i][1])
    print('   tstat:', tstat_beta[i])
    print('   pstat:', pstat_beta[i])
    print()

print('Covariance matrix:')    
print(cov_beta)
Answered By: James Phillips

This works for when vcov() returns a 1×1 dataframe. I solved my function in Python using:

fit = scipy.optimize.minimize(fun, x0=x, method = 'L-BFGS-B')

Then, I specified the hessian inverse return value as follows:

vcov = fit['hess_inv'].todense().ravel()

This gave me the same result ~(±1e-3) as stats4::vcov() in R for scenarios where vcov() returns a 1×1 data frame.

Answered By: Jesse Arzate