How to groupby two columns of a dataframe and convert other columns into dict with column header as key
Question:
Dataframe:
id id_2 salary title allowance name
0420 13.28 100000 director No Tom
0420 13.28 70000 developer Yes Sam
0110 13.12 120000 director No Dave
0110 13.12 75000 developer Yes shaun
Groupby id and id_2 and convert the rest of columns to dict with column header
I wrote a loop for this, I think don’t think it is python way, Please let me know how to do it with pandas.
Required output:
[{
"id": 420,
"id_2": 13.28,
"attributes":[
{ "salary": 100000,
"title":"director",
"allowance":"No",
"name": "Tom"
},
{ "salary": 70000,
"title": "developer",
"allowance":"Yes",
"name": "Sam"
}
]
},
{
"id": 110,
"id_2": 13.12,
"attributes":[
{ "salary": 120000,
"title":"director",
"allowance":"No",
"name": "Dave"
},
{ "salary": 75000,
"title": "developer",
"allowance":"Yes",
"name": "shaun"
}
]
}
]
Answers:
- There is not a one-liner pandas argument that will provide a
list
of dicts
in the shape you’re requesting.
- Use
.groupby
to select the groups
g
is the tuple
that represents the values used to groupby
d
is the dataframe for the groupby value, g
- Use
.iterrows
to iterate through the rows of each group
- Returns
index
represented by the first _
, because it’s not needed
- Returns
data
, from which the labels in groupby_list
are dropped, then convert the remainder to a dict
using .to_dict()
, and append it to the list
, att_list
- After looping through all rows of the group, assign
att_list
as the value to group['attributes']
- Once each group has been iterated through, append the
dict
, group
, to dict_list
.
dict_list
can be converted back to a dataframe with the following:
df = pd.json_normalize(dict_list, 'attributes', meta=groupby_list)
dict_list = list()
groupby_list = ['id', 'id_2']
for g, d in df.groupby(groupby_list):
group = dict(zip(groupby_list, g))
att_list = list()
for _, data in d.iterrows():
data = data.drop(labels=groupby_list)
att_list.append(data.to_dict())
group['attributes'] = att_list
dict_list.append(group)
dict_list
:
[{'attributes': [{'allowance': 'No',
'name': 'Dave',
'salary': 120000,
'title': 'director'},
{'allowance': 'Yes',
'name': 'shaun',
'salary': 75000,
'title': 'developer'}],
'id': 110,
'id_2': 13.12},
{'attributes': [{'allowance': 'No',
'name': 'Tom',
'salary': 100000,
'title': 'director'},
{'allowance': 'Yes',
'name': 'Sam',
'salary': 70000,
'title': 'developer'}],
'id': 420,
'id_2': 13.28}]
df1.groupby(['id','id_2']).apply(lambda dd:dd.iloc[:,2:].to_dict('r'))
.to_frame("attributes").reset_index().to_dict('r')
out:
[{'id': 110,
'id_2': 13.12,
'attributes': [{'salary': 120000,
'title': 'director',
'allowance': 'No',
'name': 'Dave'},
{'salary': 75000,
'title': 'developer',
'allowance': 'Yes',
'name': 'shaun'}]},
{'id': 420,
'id_2': 13.28,
'attributes': [{'salary': 100000,
'title': 'director',
'allowance': 'No',
'name': 'Tom'},
{'salary': 70000,
'title': 'developer',
'allowance': 'Yes',
'name': 'Sam'}]}]
Dataframe:
id id_2 salary title allowance name
0420 13.28 100000 director No Tom
0420 13.28 70000 developer Yes Sam
0110 13.12 120000 director No Dave
0110 13.12 75000 developer Yes shaun
Groupby id and id_2 and convert the rest of columns to dict with column header
I wrote a loop for this, I think don’t think it is python way, Please let me know how to do it with pandas.
Required output:
[{
"id": 420,
"id_2": 13.28,
"attributes":[
{ "salary": 100000,
"title":"director",
"allowance":"No",
"name": "Tom"
},
{ "salary": 70000,
"title": "developer",
"allowance":"Yes",
"name": "Sam"
}
]
},
{
"id": 110,
"id_2": 13.12,
"attributes":[
{ "salary": 120000,
"title":"director",
"allowance":"No",
"name": "Dave"
},
{ "salary": 75000,
"title": "developer",
"allowance":"Yes",
"name": "shaun"
}
]
}
]
- There is not a one-liner pandas argument that will provide a
list
ofdicts
in the shape you’re requesting. - Use
.groupby
to select the groupsg
is thetuple
that represents the values used to groupbyd
is the dataframe for the groupby value,g
- Use
.iterrows
to iterate through the rows of each group- Returns
index
represented by the first_
, because it’s not needed - Returns
data
, from which the labels ingroupby_list
are dropped, then convert the remainder to adict
using.to_dict()
, and append it to thelist
,att_list
- After looping through all rows of the group, assign
att_list
as the value togroup['attributes']
- Returns
- Once each group has been iterated through, append the
dict
,group
, todict_list
. dict_list
can be converted back to a dataframe with the following:df = pd.json_normalize(dict_list, 'attributes', meta=groupby_list)
dict_list = list()
groupby_list = ['id', 'id_2']
for g, d in df.groupby(groupby_list):
group = dict(zip(groupby_list, g))
att_list = list()
for _, data in d.iterrows():
data = data.drop(labels=groupby_list)
att_list.append(data.to_dict())
group['attributes'] = att_list
dict_list.append(group)
dict_list
:
[{'attributes': [{'allowance': 'No',
'name': 'Dave',
'salary': 120000,
'title': 'director'},
{'allowance': 'Yes',
'name': 'shaun',
'salary': 75000,
'title': 'developer'}],
'id': 110,
'id_2': 13.12},
{'attributes': [{'allowance': 'No',
'name': 'Tom',
'salary': 100000,
'title': 'director'},
{'allowance': 'Yes',
'name': 'Sam',
'salary': 70000,
'title': 'developer'}],
'id': 420,
'id_2': 13.28}]
df1.groupby(['id','id_2']).apply(lambda dd:dd.iloc[:,2:].to_dict('r'))
.to_frame("attributes").reset_index().to_dict('r')
out:
[{'id': 110,
'id_2': 13.12,
'attributes': [{'salary': 120000,
'title': 'director',
'allowance': 'No',
'name': 'Dave'},
{'salary': 75000,
'title': 'developer',
'allowance': 'Yes',
'name': 'shaun'}]},
{'id': 420,
'id_2': 13.28,
'attributes': [{'salary': 100000,
'title': 'director',
'allowance': 'No',
'name': 'Tom'},
{'salary': 70000,
'title': 'developer',
'allowance': 'Yes',
'name': 'Sam'}]}]