How can I code Vuong's statistical test in Python?

Question:

I need to implement Vuong’s test for non-nested models. Specifically, I have logistic-regression models that I would like to compare.

I have found implementations in R and STATA online, but unfortunately I work in Python and am not familiar with those frameworks/languages. Also unfortunate is that I have not been unable to find a Python implementation of the test (if someone knows of one that would be fantastic!). I asked on Cross Validated and was redirected to here.

After scouring the original paper and a few other pages that I found (references below) I think I’ve managed to port the R implementation from the pscl. Example code is below.

Is there someone who is fluent in both Python and R who could help confirm that the below code reproduces the R implementation? And, if not, help determine where I erred? In addition to helping with my immediate problem, perhaps this can help someone in the future.

References consulted:

Example code with current implementation and use for logistic regression fits of dummy data. Note that this is a stock dataset and the predictor variables were arbitrarily selected.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm
from sklearn.datasets import load_breast_cancer

def vuong_test(mod1, mod2, correction=True):
    '''
    mod1, mod2 - non-nested logitstic regression fit results from statsmodels
    '''
    # number of observations and check of models
    N = mod1.nobs
    N2 = mod2.nobs
    if N != N2:
        raise ValueError('Models do not have the same number of observations')
    # extract the log-likelihood for individual points with the models
    m1 = mod1.model.loglikeobs(mod1.params)
    m2 = mod2.model.loglikeobs(mod2.params)
    # point-wise log likelihood ratio
    m = m1 - m2
    # calculate the LR statistic
    LR = np.sum(m)
    # calculate the AIC and BIC correction factors -> these go to zero when df is same between models
    AICcor = mod1.df_model - mod2.df_model
    BICcor = np.log(N)*AICcor/2
    # calculate the omega^2 term
    omega2 = np.var(m)
    # calculate the Z statistic with and without corrections
    Zs = np.array([LR,LR-AICcor,LR-BICcor])
    Zs /= np.sqrt(N*omega2)
    # calculate the p-value
    ps = []
    msgs = []
    for Z in Zs:
        if Z>0:
            ps.append(1 - norm.cdf(Z))
            msgs.append('model 1 preferred over model 2')
        else:
            ps.append(norm.cdf(Z))
            msgs.append('model 2 preferred over model 1')
    # share information
    print('=== Vuong Test Results ===')
    labs = ['Uncorrected']
    if AICcor!=0:
        labs += ['AIC Corrected','BIC Corrected']
    for lab,msg,p,Z in zip(labs,msgs,ps,Zs):
        print('  -> '+lab)
        print('    -> '+msg)
        print('    -> Z: '+str(Z))
        print('    -> p: '+str(p))

# load sample data
X,y = load_breast_cancer( return_X_y=True, as_frame=True)
# create data for modeling
X1 = sm.add_constant( X.loc[:,('mean radius','perimeter error','worst symmetry')])
X2 = sm.add_constant( X.loc[:,('mean area','worst smoothness')])
# fit the models
mod1 = sm.Logit( y, X1).fit()
mod2 = sm.Logit( y, X2).fit()

# run Vuong's Test
vuong_test( mod1, mod2)

Output

Optimization terminated successfully.
         Current function value: 0.172071
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.156130
         Iterations 9
=== Vuong Test Results ===
  -> Uncorrected
    -> model 2 preferred over model 1
    -> Z: -0.7859538767172318
    -> p: 0.21594725450542168
  -> AIC Corrected
    -> model 2 preferred over model 1
    -> Z: -0.8726045081599004
    -> p: 0.19143934123980472
  -> BIC Corrected
    -> model 2 preferred over model 1
    -> Z: -1.0608044994241501
    -> p: 0.14438937850119132
Asked By: HeorotsHero

||

Answers:

In order to move forward, I learned enough R to be able to make my own comparison between R and python for Vuong’s Test as below.

In the end, my original implementation was close except for the subtle point of difference between how numpy and R calculate standard deviations by default. After correcting to now calculate the sample standard deviation (by changing np.var(m) to np.var(m,ddof=1)), I get a match between pscl and my own python code.

Updated Python implementation

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm
from sklearn.datasets import load_breast_cancer

def vuong_test(mod1, mod2, correction=True):
    '''
    mod1, mod2 - non-nested logitstic regression fit results from statsmodels
    '''
    # number of observations and check of models
    N = mod1.nobs
    N2 = mod2.nobs
    if N != N2:
        raise ValueError('Models do not have the same number of observations')
    # extract the log-likelihood for individual points with the models
    m1 = mod1.model.loglikeobs(mod1.params)
    m2 = mod2.model.loglikeobs(mod2.params)
    # point-wise log likelihood ratio
    m = m1 - m2
    # calculate the LR statistic
    LR = np.sum(m)
    # calculate the AIC and BIC correction factors -> these go to zero when df is same between models
    AICcor = mod1.df_model - mod2.df_model
    BICcor = np.log(N)*AICcor/2
    # calculate the omega^2 term
    omega2 = np.var(m, ddof=1)
    # calculate the Z statistic with and without corrections
    Zs = np.array([LR,LR-AICcor,LR-BICcor])
    Zs /= np.sqrt(N*omega2)
    # calculate the p-value
    ps = []
    msgs = []
    for Z in Zs:
        if Z>0:
            ps.append(1 - norm.cdf(Z))
            msgs.append('model 1 preferred over model 2')
        else:
            ps.append(norm.cdf(Z))
            msgs.append('model 2 preferred over model 1')
    # share information
    print('=== Vuong Test Results ===')
    labs = ['Uncorrected']
    if AICcor!=0:
        labs += ['AIC Corrected','BIC Corrected']
    for lab,msg,p,Z in zip(labs,msgs,ps,Zs):
        print('  -> '+lab)
        print('    -> '+msg)
        print('    -> Z: '+str(Z))
        print('    -> p: '+str(p))

# load sample data
X,y = load_breast_cancer( return_X_y=True, as_frame=True)
# create data for modeling
X1 = sm.add_constant( X.loc[:,('mean radius','perimeter error','worst symmetry')])
X2 = sm.add_constant( X.loc[:,('mean area','worst smoothness')])
# fit the models
mod1 = sm.Logit( y, X1).fit()
mod2 = sm.Logit( y, X2).fit()

# run Vuong's Test
vuong_test( mod1, mod2)

# save data for R test function
pd.concat( [X,y], axis=1).to_csv('breast_cancer_data.csv')

With output

Optimization terminated successfully.
         Current function value: 0.172071
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.156130
         Iterations 9
=== Vuong Test Results ===
  -> Uncorrected
    -> model 2 preferred over model 1
    -> Z: -0.7852629281206245
    -> p: 0.21614971334597216
  -> AIC Corrected
    -> model 2 preferred over model 1
    -> Z: -0.8718373831692781
    -> p: 0.19164854872682913
  -> BIC Corrected
    -> model 2 preferred over model 1
    -> Z: -1.0598719238597758
    -> p: 0.1446014350740505

Comparison R code

# load library
library(pscl)

# load breast cancer data set
bcdata <- read.csv('breast_cancer_data.csv')

# build logistic regression models
mod1 <- glm( target ~ mean.radius + perimeter.error + worst.symmetry, data=bcdata, family="binomial")
mod2 <- glm( target ~ mean.area + worst.smoothness, data=bcdata, family="binomial")

# compare the models
vuong( mod1, mod2)

With output

Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A p-value
Raw                  -0.7852629 model2 > model1 0.21615
AIC-corrected        -0.8718374 model2 > model1 0.19165
BIC-corrected        -1.0598719 model2 > model1 0.14460
Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
Answered By: HeorotsHero