Python / Pandas: Making a contingency table with multiple variables

Question:

My dataframe has 4 columns (one dependent variable and 3 independent).

Here’s a sample:

Sample data

My desired output is a contingency table, as follows:

Desired output

So far I can only get a contingency table for one independent variable, using the following code (my df is called ‘table’):

pd.crosstab(index=table['Dvar'],columns=table['Var1'])

I can’t seem to add any other variables to this. Is the only way to achieve this to make a separate contingency table for each variable (Var1 to Var3) and then merge/join them?

Asked By: YoungboyVBA

Answers:

This is not a good use case for crosstab, as you already have your contingency table (just not aggregated). Use a groupby.sum instead:

import pandas as pd

df = pd.DataFrame([[1,0,0,0],
                   [1,1,1,0],
                   [0,1,1,1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

out = df.groupby('Dvar', as_index=False).sum()

Output:

   Dvar  Var1  Var2  Var3
0     0     2     1     1
1     1     0     1     1
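
As a rough check on the question's idea of building one contingency table per variable and merging them, the sketch below does exactly that and compares it with the groupby sum. It assumes the Var columns are 0/1 indicators, as in the sample frame above; the names indicator_cols, per_var, merged and grouped are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 0, 0, 0],
                   [1, 1, 1, 0],
                   [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

indicator_cols = ['Var1', 'Var2', 'Var3']

# One crosstab per indicator column; keep only the count of 1s.
# reindex guards against a column that never takes the value 1.
per_var = {
    col: pd.crosstab(df['Dvar'], df[col]).reindex(columns=[0, 1], fill_value=0)[1]
    for col in indicator_cols
}
merged = pd.concat(per_var, axis=1)  # dict keys become the column names

# The groupby route from this answer, restricted to the indicator columns.
grouped = df.groupby('Dvar')[indicator_cols].sum()

# Both routes should give the same counts for 0/1 data.
same = (merged.to_numpy() == grouped.to_numpy()).all()
print(merged)
print(same)
```

For 0/1 data the two routes agree, so the per-variable merge is not needed; the groupby is simply shorter.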
Answered By: mozway

First of all, a contingency table is for showing the association between features.

If you want to see the association between the independent and dependent features, try this code:

pd.crosstab([table['Var1'],table['Var2'],table['Var3']],
            table['Dvar'], margins = False)
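
To see what that call produces, here is a small sketch. The asker's real data is not shown, so the three-row frame from the other answer stands in for table; that choice is an assumption. Passing a list of Series as the first argument gives a row MultiIndex with one row per observed combination of Var1, Var2 and Var3.

```python
import pandas as pd

# Stand-in data (assumption): the sample frame from the other answer.
table = pd.DataFrame([[1, 0, 0, 0],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

# Rows: every observed (Var1, Var2, Var3) combination; columns: Dvar values.
ct = pd.crosstab([table['Var1'], table['Var2'], table['Var3']],
                 table['Dvar'], margins=False)
print(ct)
```

Each cell counts the rows with that combination and that Dvar value, which is a different shape from the per-variable totals asked for in the question.
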

But, as you mention, to get your desired output, use pandas.DataFrame.groupby instead:

table.groupby('Dvar').sum()
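
A small sketch of how this call differs from the as_index=False variant, again using the three-row stand-in frame as assumed data: by default the group key becomes the index, whereas as_index=False keeps Dvar as an ordinary column.

```python
import pandas as pd

# Stand-in data (assumption): the asker's actual frame is not shown.
table = pd.DataFrame([[1, 0, 0, 0],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

by_index = table.groupby('Dvar').sum()                    # Dvar is the index
as_column = table.groupby('Dvar', as_index=False).sum()   # Dvar is a column

print(by_index)
print(as_column)
```

The counts are the same either way; calling reset_index() on the first result gives the second layout.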
Answered By: sameer aggarwal