Improving performance of Cronbach Alpha code python numpy

Question:

I wrote some code that calculates Cronbach's alpha, and it works. However, I am not very comfortable with lambda functions. Is there a way to shorten the code and make it more efficient by replacing the svar() function with a lambda and replacing some of the for loops with NumPy array operations?

import numpy as np


def svar(X):
    n = float(len(X))
    svar=(sum([(x-np.mean(X))**2 for x in X]) / n)* n/(n-1.)
    return svar


def CronbachAlpha(itemscores):
    itemvars = [svar(item) for item in itemscores]
    tscores = [0] * len(itemscores[0])
    for item in itemscores:
       for i in range(len(item)):
          tscores[i]+= item[i]
    nitems = len(itemscores)
    #print "total scores=", tscores, 'number of items=', nitems

    Calpha=nitems/(nitems-1.) * (1-sum(itemvars)/ svar(tscores))

    return Calpha

###########Test################
itemscores = [[ 4,14,3,3,23,4,52,3,33,3],
              [ 5,14,4,3,24,5,55,4,15,3]]
print("Cronbach alpha =", CronbachAlpha(itemscores))
Asked By: user3084006


Answers:

def CronbachAlpha(itemscores):
    itemscores = np.asarray(itemscores)  # uses the question's `import numpy as np`
    itemvars = itemscores.var(axis=1, ddof=1)
    tscores = itemscores.sum(axis=0)
    nitems = len(itemscores)

    return nitems / (nitems-1.) * (1 - itemvars.sum() / tscores.var(ddof=1))

NumPy arrays have a built-in var method. Specifying ddof=1 uses a denominator of N-1, which gives the sample variance. There is also a built-in sum method, so the explicit loops are not needed.
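
As a quick check of what ddof does, here is a minimal sketch comparing a hand-written sample variance with NumPy's var. The variable names (scores, manual, sample_var, population_var) are only illustrative and not part of the answer above; the data is the first item from the question.

```python
import numpy as np

# First item from the question's test data
scores = np.array([4, 14, 3, 3, 23, 4, 52, 3, 33, 3], dtype=float)
n = scores.size

# Hand-written sample variance: squared deviations divided by n - 1
manual = ((scores - scores.mean()) ** 2).sum() / (n - 1)

sample_var = scores.var(ddof=1)   # denominator n - 1
population_var = scores.var()     # default ddof=0, denominator n

print(np.isclose(manual, sample_var))
print(np.isclose(population_var * n / (n - 1), sample_var))
```

Both comparisons should agree up to floating-point rounding, which is why the svar() helper from the question can be replaced by var(ddof=1).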

Answered By: user2357112

As Julien Marrec mentioned, I suggest the following refactoring of CronbachAlpha:

def CronbachAlpha(itemscores):
    # cols are items, rows are observations
    itemscores = np.asarray(itemscores)
    itemvars = itemscores.var(axis=0, ddof=1)
    tscores = itemscores.sum(axis=1)
    nitems = itemscores.shape[1]  # number of columns; an ndarray has no .columns

    return (nitems / (nitems-1)) * (1 - (itemvars.sum() / tscores.var(ddof=1)))
Answered By: Oskar_U
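
The answers on this page assume different layouts: one treats each row as an item (as in the question's test data), the other treats each column as an item. The sketch below, with the hypothetical function names alpha_items_in_rows and alpha_items_in_cols, suggests that the two conventions give the same value once the data is transposed.

```python
import numpy as np


def alpha_items_in_rows(itemscores):
    # Each row is one item, each column is one observation
    itemscores = np.asarray(itemscores, dtype=float)
    itemvars = itemscores.var(axis=1, ddof=1)
    tscores = itemscores.sum(axis=0)
    nitems = itemscores.shape[0]
    return nitems / (nitems - 1) * (1 - itemvars.sum() / tscores.var(ddof=1))


def alpha_items_in_cols(itemscores):
    # Each column is one item, each row is one observation
    itemscores = np.asarray(itemscores, dtype=float)
    itemvars = itemscores.var(axis=0, ddof=1)
    tscores = itemscores.sum(axis=1)
    nitems = itemscores.shape[1]
    return nitems / (nitems - 1) * (1 - itemvars.sum() / tscores.var(ddof=1))


# Test data from the question: two items, ten observations
rows_are_items = [[4, 14, 3, 3, 23, 4, 52, 3, 33, 3],
                  [5, 14, 4, 3, 24, 5, 55, 4, 15, 3]]

alpha_rows = alpha_items_in_rows(rows_are_items)
alpha_cols = alpha_items_in_cols(np.transpose(rows_are_items))
print(alpha_rows, alpha_cols)
```

Whichever layout is used, it is worth checking that nitems really counts items and not observations, since mixing the two up still returns a number.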

Same as the other answers, just a bit more Pythonic. X is a data matrix — that is, the rows are samples, the columns are items. X may be a numpy array or pandas DataFrame.

def cronbach_alpha(X):
    num_items = X.shape[1]
    sum_of_item_variances = X.var(axis=0).sum()
    variance_of_sum_of_items = X.sum(axis=1).var()
    return num_items/(num_items - 1)*(1 - sum_of_item_variances/variance_of_sum_of_items)

(It is not necessary to specify ddof: the same factor 1/(N - ddof) appears in both the numerator and the denominator of the ratio, so it cancels, provided both variances use the same ddof.)
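
To illustrate the cancellation, here is a small sketch that adds a ddof parameter to the function. That parameter is an assumption made for this demonstration only and is not part of the answer's version; the data is the question's test data, transposed so that rows are samples.

```python
import numpy as np


def cronbach_alpha(X, ddof=0):
    # Rows are samples, columns are items; ddof is added here only for the demo
    X = np.asarray(X, dtype=float)
    num_items = X.shape[1]
    sum_of_item_variances = X.var(axis=0, ddof=ddof).sum()
    variance_of_sum_of_items = X.sum(axis=1).var(ddof=ddof)
    return num_items / (num_items - 1) * (1 - sum_of_item_variances / variance_of_sum_of_items)


# Question's data, transposed to shape (10 samples, 2 items)
X = np.array([[4, 14, 3, 3, 23, 4, 52, 3, 33, 3],
              [5, 14, 4, 3, 24, 5, 55, 4, 15, 3]]).T

alpha_population = cronbach_alpha(X, ddof=0)
alpha_sample = cronbach_alpha(X, ddof=1)
print(np.isclose(alpha_population, alpha_sample))
```

This also explains why the function behaves the same for a NumPy array (default ddof=0) and a pandas DataFrame (default ddof=1): each library applies its own default consistently to both variances.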

Answered By: Denziloe