Null Behavior in Pandas DataFrame Multiplication
Question:
I have a DataFrame of values to questions, vals, and a DataFrame of the weights to those questions, multiply_vals. Each record in the vals DataFrame corresponds to a single user.
import pandas as pd
vals = pd.DataFrame({'A1':[0,1], 'A2':[1,2], 'A3':[3,3],'A4':[4,2],'B1':[2,1]})
multiply_vals = pd.DataFrame({'Weights':[.5,.25,.75,1,.33]}, index=['A1','A2','A3','A4','B1'])
# vals
   A1  A2  A3  A4  B1
0   0   1   3   4   2
1   1   2   3   2   1
# multiply_vals
    Weights
A1     0.50
A2     0.25
A3     0.75
A4     1.00
B1     0.33
I want to multiply each row in vals by the corresponding weight in multiply_vals, but I get unexpected results with nulls.
Expected result:
    A1    A2    A3  A4    B1
0  0.0  0.25  2.25   4  0.66
1  0.5  0.50  2.25   2  0.33
What I tried:
I tried using mul/multiply, as well as combining it with transpose/T, but it returns nulls.
vals.mul(multiply_vals.T, axis=1)
         A1  A2  A3  A4  B1
0       NaN NaN NaN NaN NaN
1       NaN NaN NaN NaN NaN
Weights NaN NaN NaN NaN NaN
Unexpected Behavior:
If I make the exact same call but use .values, it works.
vals.mul(multiply_vals.T.values, axis=1)
    A1    A2    A3   A4    B1
0  0.0  0.25  2.25  4.0  0.66
1  0.5  0.50  2.25  2.0  0.33
Why does .values work?
Using pandas version '0.25.0'
Answers:
Define the second one as a Series, since it has only one column, then multiply. The .T below is a no-op on a Series, so vals * multiply_vals gives the same result:
import pandas as pd
vals = pd.DataFrame({'A1':[0,1], 'A2':[1,2], 'A3':[3,3],'A4':[4,2],'B1':[2,1]})
multiply_vals = pd.Series([.5,.25,.75,1,.33], index=['A1','A2','A3','A4','B1'])
vals*multiply_vals.T
    A1    A2    A3   A4    B1
0  0.0  0.25  2.25  4.0  0.66
1  0.5  0.50  2.25  2.0  0.33
You just need the values from multiply_vals:
vals * multiply_vals.values.T
    A1    A2    A3   A4    B1
0  0.0  0.25  2.25  4.0  0.66
1  0.5  0.50  2.25  2.0  0.33
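One caveat with the .values approach is that the array is applied purely by position, so it only gives the right answer when the weights happen to be stored in the same order as the columns of vals. The sketch below illustrates this with a hypothetical reordered copy of the weights (the name shuffled is made up for the example) and compares it with a label-based multiplication:

```python
import pandas as pd

vals = pd.DataFrame({'A1': [0, 1], 'A2': [1, 2], 'A3': [3, 3],
                     'A4': [4, 2], 'B1': [2, 1]})
multiply_vals = pd.DataFrame({'Weights': [.5, .25, .75, 1, .33]},
                             index=['A1', 'A2', 'A3', 'A4', 'B1'])

# Hypothetical: the same weights, stored in reverse row order (B1 ... A1).
shuffled = multiply_vals.sort_index(ascending=False)

# Positional: the array is applied left to right and the labels are ignored,
# so A1 is multiplied by the B1 weight, A2 by the A4 weight, and so on.
by_position = vals * shuffled.values.T

# Label-based: the Series index is matched against the columns of vals,
# so the row order of the weights no longer matters.
by_label = vals * shuffled['Weights']
```

Here by_label matches the expected result from the question, while by_position silently pairs each column with the wrong weight.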
Try this code:
import pandas as pd
vals = pd.DataFrame({'A1':[0,1], 'A2':[1,2], 'A3':[3,3],'A4':[4,2],'B1':[2,1]})
multiply_vals = pd.DataFrame({'Weights':[.5,.25,.75,1,.33]}, index=['A1','A2','A3','A4','B1'])
vals2 = vals.transpose()
vals2.columns =['0', '1']
df_join = pd.merge(vals2, multiply_vals, left_index=True, right_index=True)
df_join['0 weighted'] = df_join['0']*df_join['Weights']
df_join['1 weighted'] = df_join['1']*df_join['Weights']
df_final = df_join[['0 weighted', '1 weighted']]
df_final = df_final.transpose()
df_final.head()
DataFrame.mul and DataFrame.multiply don’t behave as you expected because they align the two operands on their row and column labels before multiplying element-wise; any cell whose labels have no match in the other operand becomes NaN. This alignment is very useful for other purposes.
Converting the weights to a NumPy array with vals.mul(multiply_vals.T.values, axis=1) or vals * multiply_vals.values.T drops the labels, so the multiplication is done by position, which solves the original problem.
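To make the alignment visible, here is a minimal sketch using the DataFrames from the question. It first reproduces the all-NaN result and then multiplies by the 'Weights' column selected as a Series, which is one way to keep the label matching and still get the expected numbers. The variable names all_nan and weighted are only for illustration.

```python
import pandas as pd

vals = pd.DataFrame({'A1': [0, 1], 'A2': [1, 2], 'A3': [3, 3],
                     'A4': [4, 2], 'B1': [2, 1]})
multiply_vals = pd.DataFrame({'Weights': [.5, .25, .75, 1, .33]},
                             index=['A1', 'A2', 'A3', 'A4', 'B1'])

# DataFrame times DataFrame: both axes are aligned. The column labels match,
# but the row labels (0 and 1 versus 'Weights') never do, so the result has
# three rows and every cell is NaN.
all_nan = vals.mul(multiply_vals.T)

# Selecting the column gives a Series indexed by A1 ... B1. With axis=1 that
# index is matched against the column labels of vals.
weighted = vals.mul(multiply_vals['Weights'], axis=1)
print(weighted)
```

Because the Series carries the question labels, this version keeps working if the weights are stored in a different order than the columns.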
However, if you want DataFrame.mul to work with a DataFrame of weights, you could do this:
Starting with the same DataFrames…
vals = pd.DataFrame({'A1':[0,1], 'A2':[1,2], 'A3':[3,3],'A4':[4,2],'B1':[2,1]})
multiply_vals = pd.DataFrame({'Weights':[.5,.25,.75,1,.33]}, index=['A1','A2','A3','A4','B1'])
We need to reshape multiply_vals so that its shape and labels match those of vals.
# copying the rows, in a somewhat silly exercise
multiply_vals_reshaped = pd.concat([multiply_vals.T, multiply_vals.T], axis=0)
# matching the index of vals
multiply_vals_reshaped.reset_index(drop=True, inplace=True)
# multiply_vals_reshaped
    A1    A2    A3   A4    B1
0  0.5  0.25  0.75  1.0  0.33
1  0.5  0.25  0.75  1.0  0.33
vals.mul(multiply_vals_reshaped) now behaves as expected:
    A1    A2    A3   A4    B1
0  0.0  0.25  2.25  4.0  0.66
1  0.5  0.50  2.25  2.0  0.33
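The reshaping above hard-codes two copies of the weight row. As a rough sketch of the same idea for any number of users, the row can be repeated len(vals) times and relabelled with the index of vals. This is only an illustration of the alignment rule and not a recommendation over the simpler approaches:

```python
import pandas as pd

vals = pd.DataFrame({'A1': [0, 1], 'A2': [1, 2], 'A3': [3, 3],
                     'A4': [4, 2], 'B1': [2, 1]})
multiply_vals = pd.DataFrame({'Weights': [.5, .25, .75, 1, .33]},
                             index=['A1', 'A2', 'A3', 'A4', 'B1'])

# One copy of the transposed weight row per row of vals.
multiply_vals_reshaped = pd.concat([multiply_vals.T] * len(vals),
                                   ignore_index=True)

# Reuse the row labels of vals so that both axes align exactly.
multiply_vals_reshaped.index = vals.index

result = vals.mul(multiply_vals_reshaped)
print(result)
```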