Element-wise weighted average of multiple dataframes
Question:
Let’s say we have 3 dataframes (df1, df2, df3). I know I can get an element-wise average of the three dataframes with
list_of_dfs = [df1, df2, df3]
sum(list_of_dfs)/len(list_of_dfs)
But I need to get a weighted average of the three dataframes, with weights defined in an array "W"
W = np.array([0.2, 0.3, 0.5])
So df1 will get a 20% weight, df2 30% and df3 50%.
Unfortunately, the actual number of dataframes is much larger than 3; otherwise I could simply do the following:
df1*W[0] + df2*W[1] + df3*W[2]
Any help? Thanks
Answers:
You can get the sum with sum and zip:
sum(w*d for w, d in zip(W, list_of_dfs))
For the average, divide by the sum of weights if it’s not already equal to 1:
sum(w*d for w, d in zip(W, list_of_dfs))/sum(W)
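The sum/zip idea can be wrapped in a small helper that works for any number of DataFrames. This is only a sketch: the name weighted_average, the length check, and the two tiny frames a and b are illustrative assumptions, not part of the question.

```python
import numpy as np
import pandas as pd

def weighted_average(dfs, weights):
    """Element-wise weighted average of equally shaped, aligned DataFrames."""
    if len(dfs) != len(weights):
        raise ValueError("need exactly one weight per DataFrame")
    # sum() starts from 0, and 0 + DataFrame is a valid pandas operation
    total = sum(w * d for w, d in zip(weights, dfs))
    # Dividing by the weight total makes the result correct even when
    # the weights do not add up to 1
    return total / sum(weights)

# Tiny illustrative frames (hypothetical data, not from the question)
a = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
b = pd.DataFrame([[10.0, 20.0], [30.0, 40.0]])

result = weighted_average([a, b], np.array([1, 3]))
# Top-left cell: (1*1 + 3*10) / (1 + 3) = 7.75
```

Because the weights here sum to 4 rather than 1, the division step is what turns the weighted sum into an average.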
Or, with numpy (assuming the DataFrames are aligned):
out = pd.DataFrame(np.average(np.dstack(list_of_dfs), axis=2, weights=W),
                   index=df1.index, columns=df1.columns)
Example output (weighted average):
     0    1    2    3    4
0  1.9  2.0  4.0  4.6  4.4
1  6.4  3.5  3.9  2.7  6.2
2  6.3  2.6  4.2  5.1  5.0
3  8.4  5.6  4.4  6.3  3.3
4  1.9  3.9  6.2  6.9  2.8
Used input:
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.randint(0, 10, size=(5, 5)))
df2 = pd.DataFrame(np.random.randint(0, 10, size=(5, 5)))
df3 = pd.DataFrame(np.random.randint(0, 10, size=(5, 5)))
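As a sanity check, the two approaches can be compared on the same input. The sketch below rebuilds the three random frames in a list; the names via_sum, via_numpy, stacked and same are illustrative. One difference worth keeping in mind: pandas arithmetic aligns on index and column labels, while the NumPy route is purely positional, which is why it assumes the DataFrames are already aligned.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
# Same three frames as above, generated in the same order
list_of_dfs = [pd.DataFrame(np.random.randint(0, 10, size=(5, 5)))
               for _ in range(3)]
W = np.array([0.2, 0.3, 0.5])

# pandas route: arithmetic aligns on index and column labels
via_sum = sum(w * d for w, d in zip(W, list_of_dfs)) / W.sum()

# NumPy route: positional only, so the labels are reattached by hand
first = list_of_dfs[0]
stacked = np.dstack([d.to_numpy() for d in list_of_dfs])  # shape (5, 5, 3)
via_numpy = pd.DataFrame(np.average(stacked, axis=2, weights=W),
                         index=first.index, columns=first.columns)

# True when both routes agree up to floating-point rounding
same = np.allclose(via_sum.to_numpy(), via_numpy.to_numpy())
```

For a long list of large frames, stacking builds one extra 3-D array in memory, whereas the sum/zip version accumulates one frame at a time.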