Python: How to sum dict values with shared key
Question:
I have a list of JSON-style dicts (key-value pairs) and need to sum the values of one key across the dicts in the list that share the same other key and value.
For example,
obj=[{'A': 1, 'X': 5}, {'B' : 5, 'X': 2 },{'A': 1, 'X': 8}]
If the A key above matches, I would like to sum the X values, e.g. 5 + 8 = 13. I expect the duplicate A entry to be removed and only the X values to be summed, giving the output below.
obj=[{'A': 1, 'X': 13}, {'B' : 5, 'X': 2 }]
I have tried something like the code below, but it is not working.
>>> for i in range(0, len(obj)):
...     for z in range(0, len(obj)):
...         if obj[i] == obj[z]:
...             print(obj[i]['A'])
Answers:
Convert the key-value pairs to tuples (except for "X"), and then use each tuple as the key in a new dict to add up the values for "X". After that, it’s just reformatting to get the answer.
d = dict.fromkeys(((k, v) for el in obj for k, v in el.items() if k != "X"), 0)
for k, v in d.keys():
    for item in obj:
        # "k in item" rather than item.get(k), so falsy values such as 0 still match
        if k in item and item[k] == v:
            d[(k, v)] += item["X"]
ans = []
for k, v in d.items():
    curr = {}
    curr[k[0]] = k[1]
    curr["X"] = v
    ans.append(curr)
ans
# [{'A': 1, 'X': 13}, {'B': 5, 'X': 2}]
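As a variation on the same tuple-key idea, the totals can also be accumulated in a single pass instead of rescanning obj for every key. This is only a minimal sketch, assuming every dict contains the "X" key; the function name sum_by_shared_key and its value_key parameter are made up for illustration.

```python
def sum_by_shared_key(records, value_key="X"):
    """Sum value_key across records that share the same other (key, value) pair."""
    # (key, value) -> running total; dicts preserve insertion order (Python 3.7+)
    totals = {}
    for record in records:
        for key, value in record.items():
            if key == value_key:
                continue
            totals[(key, value)] = totals.get((key, value), 0) + record[value_key]
    # Rebuild the original shape: one dict per distinct (key, value) pair
    return [{key: value, value_key: total} for (key, value), total in totals.items()]


obj = [{'A': 1, 'X': 5}, {'B': 5, 'X': 2}, {'A': 1, 'X': 8}]
result = sum_by_shared_key(obj)
print(result)  # [{'A': 1, 'X': 13}, {'B': 5, 'X': 2}]
```

Each dict is visited once, so the work grows linearly with the length of the list.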
Here’s what I came up with. It sorts the list, then uses itertools.groupby to group by the key, then builds a new dictionary from each group.
import itertools

obj = [{'A': 1, 'X': 5}, {'B': 5, 'X': 2}, {'A': 1, 'X': 8}]
# Sort by the first (key, value) pair of each dict
sorted_list = sorted(obj, key=lambda x: next(iter(x.items())))
res = []
for key, group in itertools.groupby(sorted_list, key=lambda x: next(iter(x.items()))):
    d = next(group).copy()  # copy so the dicts in obj are not modified
    for o in group:
        d['X'] += o['X']
    res.append(d)
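The sort step matters because itertools.groupby only merges adjacent items with equal keys. A small sketch of the difference, where first_item is a hypothetical helper standing in for the lambda used as the key function:

```python
import itertools

obj = [{'A': 1, 'X': 5}, {'B': 5, 'X': 2}, {'A': 1, 'X': 8}]


def first_item(d):
    # First (key, value) pair of the dict, e.g. ('A', 1)
    return next(iter(d.items()))


# Without sorting, the two 'A' dicts are not adjacent and form separate groups
unsorted_keys = [key for key, _ in itertools.groupby(obj, key=first_item)]

# After sorting, equal keys sit next to each other and are grouped together
sorted_keys = [key for key, _ in itertools.groupby(sorted(obj, key=first_item), key=first_item)]

print(unsorted_keys)  # [('A', 1), ('B', 5), ('A', 1)]
print(sorted_keys)    # [('A', 1), ('B', 5)]
```

Note that this relies on the grouping key being the first key in each dict, and on the (key, value) pairs being comparable with each other.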
If it’s a large(ish) dataset, pandas might provide some efficiency gains and save some of the nested iteration.
For example:
- Read the obj list into a DataFrame
- Only the columns need to be iterated
- Create a view for each column exposing the non-null values
- Append a dict containing the column value and the summed 'X' values
import pandas as pd

l = []
df = pd.DataFrame(obj, dtype=object)
for col in df:
    if col == 'X':
        continue
    # Keep only the rows where this column has a value (it is NaN where the key was missing)
    tmp = df.loc[~df[col].isnull(), [col, 'X']]
    l.append({col: tmp[col].iloc[0],
              'X': tmp['X'].sum()})
print(l)
Output:
[{'A': 1, 'X': 13}, {'B': 5, 'X': 2}]