Grouping an array of objects by key in Python
Question:
Suppose I have an array of objects.
arr = [
    {'grade': 'A', 'name': 'James'},
    {'grade': 'B', 'name': 'Tom'},
    {'grade': 'A', 'name': 'Zelda'}
]
I want this result:
{
    'A': [
        {'grade': 'A', 'name': 'James'},
        {'grade': 'A', 'name': 'Zelda'}
    ],
    'B': [ {'grade': 'B', 'name': 'Tom'} ]
}
Answers:
Using dict.setdefault, we can do this:
import json
gradeList = [
    {"grade": 'A', "name": 'James'},
    {"grade": 'B', "name": 'Tom'},
    {"grade": 'A', "name": 'Zelda'}
]
gradeDict = {}
for d in gradeList:
    # Create the list for this grade on first sight, then append to it
    gradeDict.setdefault(d["grade"], []).append(d)
print(json.dumps(gradeDict, indent=4))
Output:
{
    "A": [
        {
            "grade": "A",
            "name": "James"
        },
        {
            "grade": "A",
            "name": "Zelda"
        }
    ],
    "B": [
        {
            "grade": "B",
            "name": "Tom"
        }
    ]
}
Use a dict and setdefault:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
arr2 = {}
for d in arr:
    t = arr2.setdefault(d['grade'], [])
    t.append(d)
>>> arr2
{'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}],
'B': [{'grade': 'B', 'name': 'Tom'}]}
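To make the quoted description concrete, here is a small sketch of what setdefault returns in each case. The names groups, first and second are only for illustration and are not part of the answer above.

```python
groups = {}

# Key is missing: the default list is inserted and returned.
first = groups.setdefault('A', [])
first.append('James')

# Key is present: the existing list is returned, the new default is ignored.
second = groups.setdefault('A', [])
second.append('Zelda')

print(first is second)  # True
print(groups)           # {'A': ['James', 'Zelda']}

# With no default given, None is inserted and returned.
print(groups.setdefault('B'))  # None
```

Because the returned list is the one stored in the dict, appending to it updates the dict in place, which is what the loop above relies on.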
I would use a pd.DataFrame and do it like this:
import pandas as pd
df = pd.DataFrame(arr)
for index, group in df.groupby('grade'):
    print(group)
Instead of print(group), you can write each group to whatever structure you need; I suppose it does not have to be a dict like the one you described.
I would do a simple loop like this:
arr = [{'grade': 'A', 'name': 'James'}, {'grade': 'B', 'name': 'Tom'}, {'grade': 'A', 'name': 'Zelda'}]
grouped_grades = {}
for item in arr:
    if item['grade'] not in grouped_grades:
        grouped_grades[item['grade']] = []
    grouped_grades[item['grade']].append(item)
print(grouped_grades)
Output:
{'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}], 'B': [{'grade': 'B', 'name': 'Tom'}]}
I think the easiest way is to use defaultdict. If you need an ordinary dict afterwards, pass the result to the constructor, like dict(output).
from collections import defaultdict
output = defaultdict(list)
for item in arr:
    output[item['grade']].append(item)
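As a sketch of how the defaultdict approach might be wrapped for reuse, the helper below groups by any key and returns a plain dict. The function name group_by and its parameters are hypothetical, not something from the standard library.

```python
from collections import defaultdict

def group_by(items, key):
    """Group dicts in `items` by the value stored under `key` (hypothetical helper)."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    # Convert to a plain dict so that missing keys raise KeyError again
    return dict(groups)

arr = [
    {'grade': 'A', 'name': 'James'},
    {'grade': 'B', 'name': 'Tom'},
    {'grade': 'A', 'name': 'Zelda'},
]

result = group_by(arr, 'grade')
print(result)
# {'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}], 'B': [{'grade': 'B', 'name': 'Tom'}]}
```

Returning dict(groups) avoids a side effect of defaultdict: merely reading a missing key such as groups['C'] would otherwise create an empty list under that key.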
You can use itertools.groupby (sort by the same key first):
>>> import itertools
>>> keyfunc = lambda item: item['grade']
>>> {k: list(v) for k, v in itertools.groupby(sorted(arr, key=keyfunc), keyfunc)}
{'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}], 'B': [{'grade': 'B', 'name': 'Tom'}]}
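The sorted call in that one-liner matters because itertools.groupby only groups consecutive items with equal keys. A minimal sketch of the difference, using the question's data (the names unsorted_groups and sorted_groups are only for illustration):

```python
import itertools

arr = [
    {'grade': 'A', 'name': 'James'},
    {'grade': 'B', 'name': 'Tom'},
    {'grade': 'A', 'name': 'Zelda'},
]
keyfunc = lambda item: item['grade']

# Without sorting, 'A' appears in two separate runs, and the later run
# overwrites the earlier one in the dict comprehension.
unsorted_groups = {k: list(v) for k, v in itertools.groupby(arr, keyfunc)}
print(unsorted_groups)
# {'A': [{'grade': 'A', 'name': 'Zelda'}], 'B': [{'grade': 'B', 'name': 'Tom'}]}

# Sorting by the same key first puts equal keys next to each other.
sorted_groups = {
    k: list(v)
    for k, v in itertools.groupby(sorted(arr, key=keyfunc), keyfunc)
}
print(sorted_groups)
# {'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}], 'B': [{'grade': 'B', 'name': 'Tom'}]}
```

The sort makes this approach O(n log n), whereas the setdefault and defaultdict loops do a single pass over the list.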