How to convert a set of pandas dataframe columns to a list of dictionaries and save it into a new column?
Question:
I have a dataframe:
col1 col2 col3
0 1 a W
1 1 b X
2 2 c Y
3 2 d Z
I need to convert it into something like this, grouping the rows by their col1 values:
col1 col2 col3 dict_col
0 1 a W [{'col2': 'a', 'col3': 'W'}, {'col2': 'b', 'col3': 'X'}]
1 2 c Y [{'col2': 'c', 'col3': 'Y'}, {'col2': 'd', 'col3': 'Z'}]
This is what I tried:
import pandas as pd
data = {
    'col1': [1, 1, 2, 2],
    'col2': ['a', 'b', 'c', 'd'],
    'col3': ['W', 'X', 'Y', 'Z']}
df = pd.DataFrame(data)
print(df)
cols_to_use = ['col2', 'col3']
df['dict_col'] = df[cols_to_use].apply(
    lambda x: {'col2': x['col2'], 'col3': x['col3']},
    axis=1)
print(df)
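The attempt above builds the right dict for each row but never groups them, so dict_col ends up holding one dict per row. A minimal sketch that keeps that idea and adds the missing grouping step, assuming pandas 1.x or later: it swaps the row-wise apply for to_dict('records') (the same dicts, built in one call), collects them into one list per col1 value with agg(list), and joins those lists onto the first col2/col3 of each group.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 2, 2],
    'col2': ['a', 'b', 'c', 'd'],
    'col3': ['W', 'X', 'Y', 'Z'],
})
cols_to_use = ['col2', 'col3']

# Step 1 (what the attempt already does): one dict per row
df['dict_col'] = df[cols_to_use].to_dict('records')

# Step 2: collect the per-row dicts into one list per col1 value
g = df.groupby('col1')
lists = g['dict_col'].agg(list)

# Step 3: keep the first col2/col3 of each group, attach the lists,
# and turn the col1 group keys back into a regular column
out = g[cols_to_use].first().join(lists).reset_index()
print(out)
```

The join works because both g[cols_to_use].first() and lists are indexed by the col1 group keys.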
Answers:
You can use groupby and then apply a function to each group:
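import pandas as pd
def group_rows(sub_df):
    # Turn every row of the group into a dict (column name -> value)
    dict_col = []
    for _, row in sub_df.iterrows():
        dict_col.append(dict(row))
    # Keep col2/col3 from the group's first row, plus the full list of row dicts
    return pd.Series({
        "col2": dict_col[0]["col2"],
        "col3": dict_col[0]["col3"],
        "dict_col": dict_col
    })
df.groupby("col1").apply(group_rows)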
This gives:
col2 col3 dict_col
col1
1 a W [{'col1': 1, 'col2': 'a', 'col3': 'W'}, {'col1': 1, 'col2': 'b', 'col3': 'X'}]
2 c Y [{'col1': 2, 'col2': 'c', 'col3': 'Y'}, {'col1': 2, 'col2': 'd', 'col3': 'Z'}]
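The output above differs from the requested one in two ways: col1 becomes the index, and each dict also carries col1. A minimal sketch of the same group_rows idea that matches the requested layout, assuming it is acceptable to select col2/col3 explicitly before applying; it also replaces the iterrows loop with to_dict('records'). Selecting the columns this way should also avoid the DeprecationWarning that recent pandas versions (2.2 and later, as far as I know) emit when apply receives the grouping column.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 2, 2],
    'col2': ['a', 'b', 'c', 'd'],
    'col3': ['W', 'X', 'Y', 'Z'],
})

def group_rows(sub_df):
    # sub_df only holds col2/col3 because of the column selection below
    records = sub_df.to_dict('records')
    return pd.Series({
        'col2': records[0]['col2'],
        'col3': records[0]['col3'],
        'dict_col': records,
    })

# [['col2', 'col3']] keeps col1 out of each group (and out of the dicts);
# reset_index() turns the col1 group keys back into a regular column
out = df.groupby('col1')[['col2', 'col3']].apply(group_rows).reset_index()
print(out)
```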
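Another option is to build each group's records with to_dict('records') and concatenate them with the first row of each group: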
g = df.groupby('col1')
df1 = g.apply(lambda x: x.drop(columns='col1').to_dict('records')).rename('dict_col')
out = pd.concat([g.first(), df1], axis=1).reset_index()
Alternatively, drop col1 first and group by the col1 Series, so the lambda no longer needs to drop it:
g = df.drop(columns='col1').groupby(df['col1'])
df1 = g.apply(lambda x: x.to_dict('records')).rename('dict_col')
out = pd.concat([g.first(), df1], axis=1).reset_index()
Output:
>>> out
col1 col2 col3 dict_col
0 1 a W [{'col2': 'a', 'col3': 'W'}, {'col2': 'b', 'co...
1 2 c Y [{'col2': 'c', 'col3': 'Y'}, {'col2': 'd', 'co...
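The trailing "..." in each dict_col cell above comes from pandas truncating wide cells for display (the display.max_colwidth option, 50 characters by default); the lists themselves are complete. A quick way to check, reusing the alternative's code: read a single cell directly, or lift the width limit temporarily with pd.option_context.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 2, 2],
    'col2': ['a', 'b', 'c', 'd'],
    'col3': ['W', 'X', 'Y', 'Z'],
})

# The alternative from the answer: group the remaining columns by df['col1']
g = df.drop(columns='col1').groupby(df['col1'])
df1 = g.apply(lambda x: x.to_dict('records')).rename('dict_col')
out = pd.concat([g.first(), df1], axis=1).reset_index()

# Print one cell in full instead of the truncated frame display
print(out.at[0, 'dict_col'])

# Or remove the column-width limit for this print only
with pd.option_context('display.max_colwidth', None):
    print(out)
```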