Pandas – Calculate expected frequency table

Question:

Consider the following dataframe:

import pandas as pd

data = [[1, 2, 3, 4], [4, 3, 2, 1]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

What is the most efficient way to generate an expected frequency table, i.e. to replace each cell with (row total * column total) / (grand total)?

The final dataframe should be:

data = [[2.5, 2.5, 2.5, 2.5], [2.5, 2.5, 2.5, 2.5]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

Asked By: drec4s

Answers:

You can use the underlying numpy array and broadcasting:

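a = df.to_numpy()
# a.sum(0) holds the column totals; a.sum(1)[:, None] holds the row totals as a column vector
pd.DataFrame(a.sum(0) * a.sum(1)[:, None] / a.sum(),
             columns=df.columns, index=df.index)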

Output:

     A    B    C    D
0  2.5  2.5  2.5  2.5
1  2.5  2.5  2.5  2.5
Answered By: mozway
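
The broadcasting expression above amounts to an outer product of the row totals and the column totals, divided by the grand total. A minimal sketch of the same calculation with np.outer, assuming a dataframe of numeric counts (the helper name expected_frequencies is hypothetical, not part of pandas or numpy):

```python
import numpy as np
import pandas as pd


def expected_frequencies(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: (row total * column total) / grand total per cell."""
    a = df.to_numpy()
    row_totals = a.sum(axis=1)  # one value per row
    col_totals = a.sum(axis=0)  # one value per column
    # Outer product gives an (n_rows, n_cols) grid of row total * column total.
    expected = np.outer(row_totals, col_totals) / a.sum()
    return pd.DataFrame(expected, index=df.index, columns=df.columns)


df = pd.DataFrame([[1, 2, 3, 4], [4, 3, 2, 1]], columns=["A", "B", "C", "D"])
result = expected_frequencies(df)
print(result)  # every cell is 10 * 5 / 20 = 2.5 for this input
```

Naming the two sets of totals makes the shapes explicit, which may be easier to read than the `[:, None]` indexing.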
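
# Fill each row with its row total, then multiply by the column totals and divide by the grand total
df.apply(lambda ss: ss.map(lambda x: ss.sum()), axis=1) * df.sum() / df.sum().sum()

Output:
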
     A    B    C    D
0  2.5  2.5  2.5  2.5
1  2.5  2.5  2.5  2.5
Answered By: G.G
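
Because every row and column in the question's data sums to the same value, all expected cells come out as 2.5, which makes it hard to tell whether rows and columns are handled correctly. A small sanity check on asymmetric input, as a sketch only (the `obs` table is made-up data, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical observed counts, chosen so that each expected cell differs.
obs = pd.DataFrame([[10, 20], [30, 40]], columns=["A", "B"])

# Each row filled with its own row total (the apply/map step from the answer above).
row_totals = obs.apply(lambda ss: ss.map(lambda x: ss.sum()), axis=1)

# Multiplying by obs.sum() aligns on column labels, so each column is
# scaled by its column total; then divide by the grand total.
expected = row_totals * obs.sum() / obs.sum().sum()

# Row totals 30 and 70, column totals 40 and 60, grand total 100,
# so the expected values are [[12, 18], [28, 42]].
print(expected)
```

The expected table keeps the same row and column totals as the observed one, which is a quick property to check on real data.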