How to incorporate an ANOVA into a "for loop" in python?

Question:

I am attempting to run an ANOVA on a number of variables from a list. However, I am having trouble indicating that the variable inside the ANOVA formula (lst) actually refers to the items of a list.

Here is what I attempted:

lst = ['Item1', 'Item2']

for item in lst:
    mod = ols('lst ~ Group', data= DF).fit()
    aov_table = sm.stats.anova_lm(mod, typ=2)
    print(aov_table)
Asked By: arkadiy


Answers:

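If you want to access the corresponding item in the for loop, you have to insert it into the formula string, for example with the str.format() method. As written, 'lst ~ Group' is just literal text, so the loop variable item is never used.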
For example:

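import statsmodels.api as sm
from statsmodels.formula.api import ols

lst = ['Item1', 'Item2']

for item in lst:
    # insert the column name into the formula, e.g. 'Item1 ~ Group'
    mod = ols('{} ~ Group'.format(item), data=DF).fit()
    aov_table = sm.stats.anova_lm(mod, typ=2)
    print(aov_table)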

The formula string will be Item1 ~ Group for the first item and Item2 ~ Group for the second.

For a more expansive case, where you have multiple dependent variables (DVs) and independent variables (IVs) that you want to test, you can take the Cartesian product of the two lists and fit one model per pair, like so:

import statsmodels.api as sm
from statsmodels.formula.api import ols
import itertools

dvs = ['a', 'b']
ivs = ['d', 'e', 'f', ]

Calling itertools.product(dvs, ivs) gives you every pairing of the DVs and IVs, like below:

a d
a e
a f
b d
b e
b f
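The pairing can be checked directly in plain Python. This small sketch reuses the dvs and ivs lists from above and also shows the formula string that each pair turns into.

```python
import itertools

dvs = ["a", "b"]
ivs = ["d", "e", "f"]

# product() pairs every DV with every IV, with the IV changing fastest
pairs = list(itertools.product(dvs, ivs))
formulas = [f"{dv} ~ C({iv})" for dv, iv in pairs]

for formula in formulas:
    print(formula)
```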

So your full loop will look like this:

aov = {} # collect the results in a dictionary
for dv, iv in itertools.product(dvs, ivs):
    model = ols('{} ~ C({})'.format(dv, iv), data=df).fit()
    aov[dv, iv] = sm.stats.anova_lm(model, typ=2) # use dv, iv as index

Finally, when you print aov, each (dv, iv) pair appears as a key of the dictionary, and you can retrieve each result individually by its key, for example the ANOVA of a on d:

aov[('a', 'd')].style.format(precision=3)  # rounds to 3 decimals; renders in Jupyter and needs Jinja2
Answered By: GSA