Why does Python say that a value does not exist when it specifically does?

Question:

SHORT DESCRIPTION:

The main issue is that whenever I run the following code, I get the error shown below it:

import statsmodels.api as sm
from statsmodels.formula.api import ols    
def onewayanaova (csv, vars, x="x-axis", y="y-axis"):
        df = pd.read_csv(csv, delimiter=",") 
        df_melt = pd.melt(df.reset_index(), id_vars=['index'], value_vars=vars)
        df_melt.columns = ['index', {x}, {y}]
        model = ols(f'{y} ~ C({x})', data=df_melt).fit()
        anova_table = sm.stats.anova_lm(model, typ=2)
        print("The One-Way Anova Test Values are:\n")
        print(anova_table)
onewayanaova("Book1.csv", ["a","b","c"])

The error is:

Traceback (most recent call last):
  File "pandas\_libs\hashtable_class_helper.pxi", line 5231, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'set'
Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas\_libs\hashtable_class_helper.pxi", line 5231, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'set'
Traceback (most recent call last):
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\compat.py", line 36, in call_and_wrap_exc
    return f(*args, **kwargs)
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\eval.py", line 165, in eval
    return eval(code, {}, VarLookupDict([inner_namespace]
  File "<string>", line 1, in <module>
NameError: name 'axis' is not defined

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\mghaf\Desktop\Python Codes\ReMan Edu\test.py", line 3, in <module>
    mn.onewayanaova("Book1.csv", ["a","b","c"])
  File "c:\Users\mghaf\Desktop\Python Codes\ReMan Edu\maincode.py", line 154, in onewayanaova
    model = ols(f'{y} ~ C({x})', data=df_melt).fit()
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\statsmodels\base\model.py", line 200, in from_formula
    tmp = handle_formula_data(data, None, formula, depth=eval_env,
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\statsmodels\formula\formulatools.py", line 63, in handle_formula_data
    result = dmatrices(formula, Y, depth, return_type='dataframe',
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\highlevel.py", line 309, in dmatrices
    (lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env,
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\highlevel.py", line 164, in _do_highlevel_design
    design_infos = _try_incr_builders(formula_like, data_iter_maker, eval_env,
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\highlevel.py", line 66, in _try_incr_builders
    return design_matrix_builders([formula_like.lhs_termlist,
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\build.py", line 693, in design_matrix_builders
    cat_levels_contrasts) = _examine_factor_types(all_factors,
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\build.py", line 443, in _examine_factor_types
    value = factor.eval(factor_states[factor], data)
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\eval.py", line 564, in eval
    return self._eval(memorize_state["eval_code"],
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\eval.py", line 547, in _eval
    return call_and_wrap_exc("Error evaluating factor",
  File "C:\Users\mghaf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\patsy\compat.py", line 43, in call_and_wrap_exc
    exec("raise new_exc from e")
  File "<string>", line 1, in <module>
patsy.PatsyError: Error evaluating factor: NameError: name 'axis' is not defined
    y-axis ~ C(x-axis)
             ^^^^^^^^^

I think the problem is the x and y defaults I set in def onewayanaova (csv, vars, x="x-axis", y="y-axis"):. Maybe I need to change them so I don't get the error?

If you want a more detailed description, read below.

LONG DESCRIPTION:

I am trying to do a One-Way ANOVA test. However, Python keeps raising a NameError, saying that one of my values is not defined.

I am running the following code:

import statsmodels.api as sm
from statsmodels.formula.api import ols    
def onewayanaova (csv, vars, x="x-axis", y="y-axis"):
        df = pd.read_csv(csv, delimiter=",") 
        df_melt = pd.melt(df.reset_index(), id_vars=['index'], value_vars=vars)
        df_melt.columns = ['index', {x}, {y}]
        model = ols(f'{y} ~ C({x})', data=df_melt).fit()
        anova_table = sm.stats.anova_lm(model, typ=2)
        print("The One-Way Anova Test Values are:\n")
        print(anova_table)

And:

import maincode as mn
mn.onewayanaova("Book1.csv", ["a","b","c"])

The first code is saved to a file named maincode.py, and the second code is saved to a file named test.py. "Book1.csv" is in the same folder as them. Running test.py gives exactly the same traceback as in the short description above, ending with:

patsy.PatsyError: Error evaluating factor: NameError: name 'axis' is not defined
    y-axis ~ C(x-axis)
             ^^^^^^^^^

The main problem that I see is that I named the X and Y variables x="x-axis", y="y-axis". But I do not understand why that gives me an error, as I made a very neat-looking boxplot with the same names (although I know that there X and Y are only used as the axis titles):

def boxplot (csv, vars, x="x-axis", y="y-axis"):
    #https://www.reneshbedre.com/blog/anova.html
    df = pd.read_csv(csv, delimiter=",") 
    df_melt = pd.melt(df.reset_index(), id_vars=['index'], value_vars=vars)
    df_melt.columns = ['index', x, y]
    ax = sns.boxplot(x=x, y=y, data=df_melt, color='#99c2a2')
    ax = sns.swarmplot(x=x, y=y, data=df_melt, color='#7d0013')
    plt.show()

However, when I run this code from someone else, it gives the output I want:

import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd
df = pd.read_csv("https://reneshbedre.github.io/assets/posts/anova/onewayanova.txt", sep="\t")
df_melt = pd.melt(df.reset_index(), id_vars=['index'], value_vars=['A', 'B', 'C', 'D'])
df_melt.columns = ['index', 'treatments', 'value']
model = ols('value ~ C(treatments)', data=df_melt).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

The output that I get with the above code:

                sum_sq    df         F    PR(>F)
C(treatments)  3010.95   3.0  17.49281  0.000026
Residual        918.00  16.0       NaN       NaN

The main issue is that I need to change the names in model = ols('value ~ C(treatments)', data=df_melt).fit() and df_melt.columns = ['index', 'treatments', 'value'], because most datasets do not use 'treatments' and 'value' as their column names. In case you are wondering, my .csv file contains:

  1. Column headers of a, b and c
  2. An equal number of values in each column

My main issue is:

Please try and help me understand why I cannot replace 'value ~ C(treatments)' with X and Y!

Source of the code: https://www.reneshbedre.com/blog/anova.html

Asked By: Alireza Ghaffarian

Answers:

In statsmodels formulae, you need to quote your variables (i.e. columns in your dataframe) when they contain special characters such as -. Have a look at the documentation: your term "x-axis" is interpreted as "x" - "axis", i.e. x minus axis. Quoting a variable can be done with the Q() transformation. Make sure to quote the variable name inside with a different kind of quote (single/double) than the one you use for the string:

model = ols(f'Q("{y}") ~ C(Q("{x}"))', data=df_melt).fit()
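
As a rough illustration of why the unquoted name fails: the traceback in the question shows patsy calling eval() on each formula term, with the dataframe columns acting as variables. The sketch below imitates that with plain Python and no statsmodels; the lookup dictionary and the variable names naive_formula and quoted_formula are made up for the demonstration.

```python
x, y = "x-axis", "y-axis"

# Without quoting, the f-string produces a formula whose terms contain "-".
naive_formula = f"{y} ~ C({x})"
print(naive_formula)  # y-axis ~ C(x-axis)

# A term such as x-axis is then evaluated like the Python expression
# "x - axis". Even if a name "x" could be found, "axis" cannot.
try:
    eval("x - axis", {}, {"x": 1})  # {"x": 1} is a stand-in for the column lookup
except NameError as exc:
    message = str(exc)
    print(message)  # name 'axis' is not defined

# With Q(), the column name is passed as a string and is never parsed as code.
quoted_formula = f'Q("{y}") ~ C(Q("{x}"))'
print(quoted_formula)  # Q("y-axis") ~ C(Q("x-axis"))
```

The outer string uses single quotes so that the double quotes around the column names survive inside the formula.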
Answered By: Rob

It seems that model = ols('value ~ C(treatments)', data=df_melt).fit() cannot have a variable substitute (as I had in model = ols(f'{y} ~ C({x})', data=df_melt).fit()). In my code this was also the case when I used model = ols(f'Q("{y}") ~ C(Q("{x}"))', data=df_melt).fit(), as mentioned by @Rob.

Therefore, to make it work and have my own names, I just have to rename the columns in df_melt.columns = ['index', 'treatments', 'value'] to match model = ols('value ~ C(treatments)', data=df_melt).fit() (where 'treatments' and 'value' are the same names in the two lines of code).

Answered By: Alireza Ghaffarian

Why does the above code not work for model = ols('value ~ A(treatments)', data=df_melt).fit()?

Answered By: Manya