# Weighted average of a dictionary – Pandas

## Question:

I have the following column in a data-frame (it is an example):

First row is: `'{"100":10,"50":3,"-90":2}'`.

Second row is: `'{"100":70,"50":3,"-90":2,"-40":3}'`.

I want to calculate a weighted average where the dictionary’s keys are the values and the dictionary’s values are the weights of the weighted average.

The final value of the first row should be `64.666`, which is `(100*10+50*3-90*2)/(10+3+2)`, and that of the second row should be `87.82`.
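For reference, the quantity described here is the ordinary weighted mean. Writing $x_i$ for the integer keys and $w_i$ for the matching dictionary values (notation introduced for this note, not taken from the original post), the first row works out as:

``````latex
\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}
        = \frac{100 \cdot 10 + 50 \cdot 3 + (-90) \cdot 2}{10 + 3 + 2}
        = \frac{970}{15} \approx 64.667
``````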

For each dictionary there might be hundreds of keys/values and the column might have thousands of rows. How can I code it efficiently? Preferably vectorially.

Use regular expressions to grab all the keys and values, explode it into a different dataframe and then make the calculations.

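``````
import pandas as pd

df = pd.DataFrame({'col1' : ['{"100":10,"50":3,"-90":2}', '{"100":70,"50":3,"-90":2,"-40":3}']})

# Find every "key":value pair; each row becomes a list of (key, value) tuples
df2 = df.col1.str.findall(r'"(?P<key>-?\d+)":(?P<value>-?\d+)').to_frame()
# One (key, value) tuple per row, keeping the original row index
df2 = df2.explode('col1')

df2[['value', 'weight']] = [(int(a), int(b)) for a,b in df2.col1.to_list()]
df2['prod'] = df2.value*df2.weight

# Sum products and weights per original row, then divide
df['weighted_avg'] = df2.groupby(level = 0)['prod'].sum() / df2.groupby(level = 0)['weight'].sum()
``````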
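A possible variation on the regex approach above, sketched under the assumption that every key and weight is an integer: `Series.str.extractall` returns the named groups directly as columns, one row per match, so the explode step and the list comprehension are not needed. The column names `value`, `weight`, and `prod` are chosen here for illustration.

``````python
import pandas as pd

df = pd.DataFrame({'col1': ['{"100":10,"50":3,"-90":2}',
                            '{"100":70,"50":3,"-90":2,"-40":3}']})

# One row per "key":value match; the first index level is the original row label.
pairs = df['col1'].str.extractall(r'"(?P<value>-?\d+)":(?P<weight>-?\d+)').astype(int)
pairs['prod'] = pairs['value'] * pairs['weight']

# Sum products and weights per original row, then divide.
sums = pairs.groupby(level=0)[['prod', 'weight']].sum()
df['weighted_avg'] = sums['prod'] / sums['weight']

print(df['weighted_avg'])
``````

Because the result is grouped on the original index, the assignment back to `df` lines up row by row.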

You can use `json.loads` and `pandas.Series.apply`.

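``````
import json

def cal_avg(dct):
    # Keys are the values to average; dictionary values are their weights.
    return sum(int(k)*v for k,v in dct.items()) / sum(dct.values())

# Parse each JSON string into a dict, then compute the weighted average per row.
df['dct'].apply(json.loads).apply(cal_avg)
``````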

Output:

``````
0    64.666667
1    87.820513
Name: dct, dtype: float64
``````

Input DataFrame:

``````
import pandas as pd

df = pd.DataFrame({
    'dct': [
        '{"100":10,"50":3,"-90":2}',
        '{"100":70,"50":3,"-90":2,"-40":3}'
    ]
})
``````
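The per-row arithmetic can also be handed to `numpy.average`, which accepts a `weights` argument. The following is a minimal sketch, assuming the column holds valid JSON strings with integer keys; the helper name `weighted_avg` is made up for this example.

``````python
import json

import numpy as np
import pandas as pd

def weighted_avg(text):
    """Weighted average of one JSON string: keys are values, values are weights."""
    dct = json.loads(text)
    values = np.array([int(k) for k in dct], dtype=float)
    weights = np.array(list(dct.values()), dtype=float)
    return np.average(values, weights=weights)

df = pd.DataFrame({
    'dct': [
        '{"100":10,"50":3,"-90":2}',
        '{"100":70,"50":3,"-90":2,"-40":3}'
    ]
})

result = df['dct'].apply(weighted_avg)
print(result)
``````

This still loops over rows through `apply`; only the arithmetic inside each row is vectorized.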

Try this simple for loop:

``````
dataset = [{"100": 10, "50": 3, "-90": 2}, {"100": 70, "50": 3, "-90": 2, "-40": 3}]

for data in dataset:
    weighted_value = 0
    weight = 0
    # Each key is a value to average; each dictionary value is its weight
    for key, value in data.items():
        weighted_value += int(key) * value
        weight += value
    weighted_average = weighted_value / weight
    print(weighted_average)
``````

Performance-wise, I don't think this can be optimized much further.
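Whether a plain loop is already fast enough depends on the data, so it may be worth timing the candidates on something shaped like your own column. The sketch below is one way to do that with `timeit`; the synthetic data (500 rows of 100 pairs each), the seed, and the function names are all assumptions made for illustration, and the timings it prints will vary by machine.

``````python
import json
import random
import timeit

import pandas as pd

random.seed(0)
# Hypothetical synthetic data: 500 rows, 100 distinct integer keys per row.
rows = [
    {str(k): random.randint(1, 100) for k in random.sample(range(-500, 500), 100)}
    for _ in range(500)
]
df = pd.DataFrame({'dct': [json.dumps(r) for r in rows]})

def loop_avg(text):
    """Plain Python loop over one parsed dictionary."""
    weighted_value = 0
    weight = 0
    for key, value in json.loads(text).items():
        weighted_value += int(key) * value
        weight += value
    return weighted_value / weight

def with_apply():
    return df['dct'].apply(loop_avg)

def with_extractall():
    # json.dumps writes a space after the colon, hence the \s* in the pattern.
    pairs = df['dct'].str.extractall(
        r'"(?P<value>-?\d+)":\s*(?P<weight>-?\d+)'
    ).astype(int)
    pairs['prod'] = pairs['value'] * pairs['weight']
    sums = pairs.groupby(level=0)[['prod', 'weight']].sum()
    return sums['prod'] / sums['weight']

for func in (with_apply, with_extractall):
    seconds = timeit.timeit(func, number=3)
    print(f'{func.__name__}: {seconds:.3f} s for 3 runs')
``````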
