Pandas – Calculate expected frequency table

Question

Consider the following dataframe:

import pandas as pd

data = [[1, 2, 3, 4], [4, 3, 2, 1]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

What would be the most efficient way to generate an expected frequency table, i.e. one where each cell holds (row total * column total) / (grand total)?

For the data above, the result should be:

data = [[2.5, 2.5, 2.5, 2.5], [2.5, 2.5, 2.5, 2.5]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

Asked By: drec4s
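For reference, the quantity asked for can be written as the formula below. The symbols $O_{ij}$ (observed value in row $i$, column $j$), $R_i$, $C_j$ and $N$ are notation introduced here, not taken from the question.

$$E_{ij} = \frac{R_i \, C_j}{N}, \qquad R_i = \sum_j O_{ij}, \quad C_j = \sum_i O_{ij}, \quad N = \sum_{i,j} O_{ij}$$

For the sample data, both row totals are $R_0 = R_1 = 10$, every column total is $C_A = C_B = C_C = C_D = 5$, and $N = 20$, so every cell is $E_{ij} = 10 \cdot 5 / 20 = 2.5$.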


Answer 1

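You can use the underlying NumPy array and broadcasting:

a = df.to_numpy()  # underlying NumPy array, shape (n_rows, n_cols)
# column totals (shape (n_cols,)) times row totals as a column vector
# (shape (n_rows, 1)) broadcasts to an (n_rows, n_cols) table
pd.DataFrame((a.sum(0) * a.sum(1)[:, None]) / a.sum(),
             columns=df.columns, index=df.index)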

output:

     A    B    C    D
0  2.5  2.5  2.5  2.5
1  2.5  2.5  2.5  2.5

Answered By: mozway
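The broadcasting in Answer 1 can be made more explicit with `keepdims=True`, which keeps the row totals as an `(n_rows, 1)` column and the column totals as a `(1, n_cols)` row. The sketch below wraps this in a helper; the name `expected_frequencies` is hypothetical, and casting to `float` is an assumption to keep the result dtype predictable. A non-uniform table is included because the sample data gives 2.5 everywhere, which would hide a transposed or misaligned result.

```python
import numpy as np
import pandas as pd


def expected_frequencies(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: (row total * column total) / grand total per cell."""
    a = df.to_numpy(dtype=float)
    row_totals = a.sum(axis=1, keepdims=True)  # shape (n_rows, 1)
    col_totals = a.sum(axis=0, keepdims=True)  # shape (1, n_cols)
    # (n_rows, 1) * (1, n_cols) broadcasts to (n_rows, n_cols)
    expected = row_totals * col_totals / a.sum()
    return pd.DataFrame(expected, index=df.index, columns=df.columns)


df = pd.DataFrame([[1, 2, 3, 4], [4, 3, 2, 1]], columns=['A', 'B', 'C', 'D'])
print(expected_frequencies(df))

# Non-uniform table: row totals 3 and 7, column totals 4 and 6, grand total 10
small = pd.DataFrame([[1, 2], [3, 4]], columns=['x', 'y'])
# expected values: [[1.2, 1.8], [2.8, 4.2]]
print(expected_frequencies(small))
```

A useful sanity check on any expected frequency table is that its row and column totals match those of the observed table.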

Answer 2

# row total in every cell, times column totals (aligned on labels), over grand total
df.apply(lambda ss: ss.map(lambda x: ss.sum()), axis=1) * df.sum() / df.sum().sum()

out:

     A    B    C    D
0  2.5  2.5  2.5  2.5
1  2.5  2.5  2.5  2.5

Answered By: G.G
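The `apply`-based expression in Answer 2 calls a Python lambda for every row and every cell, so it is likely to be slower than a vectorized version on large frames. A minimal alternative sketch, assuming only pandas and NumPy, builds the same table from the two total Series with `np.outer` and then checks it against the `apply` version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [4, 3, 2, 1]], columns=['A', 'B', 'C', 'D'])

row_totals = df.sum(axis=1)   # Series indexed by df.index
col_totals = df.sum(axis=0)   # Series indexed by df.columns
grand_total = col_totals.sum()

# np.outer builds the (n_rows, n_cols) matrix of row_total * col_total
# without calling a Python function per row or per cell.
expected = pd.DataFrame(
    np.outer(row_totals, col_totals) / grand_total,
    index=df.index,
    columns=df.columns,
)

# Compare with the apply-based expression from Answer 2
via_apply = df.apply(lambda ss: ss.map(lambda x: ss.sum()), axis=1) * df.sum() / df.sum().sum()
print(np.allclose(expected.to_numpy(), via_apply.to_numpy()))
```

Because `np.outer` works on plain arrays, the index and column labels are reattached explicitly when building the result DataFrame.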
