pandas apply groupby and agg, and update orig columns
Question:
I have a dataframe df:
Group1 Group2 Val
0 1 Q 2
1 1 Q 3
2 2 R 8
3 4 Y 9
I want to update df with the list of values per group, so the new df will be:
Group1 Group2 Val new
0 1 Q 2 [2, 3]
1 1 Q 3 [2, 3]
2 2 R 8 [8]
3 4 Y 9 [9]
What is the best way to do so?
Answers:
You can't use groupby.transform directly, so use groupby.agg and map (or merge if you have several groupers):
df['new'] = df['Group1'].map(df.groupby('Group1')['Val'].agg(list))
Output:
Group1 Group2 Val new
0 1 Q 2 [2, 3]
1 1 Q 3 [2, 3]
2 2 R 8 [8]
3 4 Y 9 [9]
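For anyone who wants to run this end to end, here is a minimal self-contained sketch of the agg-then-map idea. The frame is rebuilt from the question's sample data, and the intermediate name lists_per_group is only illustrative, not something from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group1": [1, 1, 2, 4],
    "Group2": ["Q", "Q", "R", "Y"],
    "Val": [2, 3, 8, 9],
})

# Step 1: collapse each group to one list (a Series indexed by Group1).
lists_per_group = df.groupby("Group1")["Val"].agg(list)

# Step 2: look up every row's key in that Series, so each row
# receives the list belonging to its group.
df["new"] = df["Group1"].map(lists_per_group)

print(df)
```

Splitting it into two steps makes it easier to inspect the per-group Series before it is broadcast back to the rows.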
Using several columns to group:
cols = ['Group1', 'Group2']
df['new'] = df.merge(df.groupby(cols)['Val'].agg(list),
                     left_on=cols, right_index=True, how='left')['Val_y']
Example:
Group1 Group2 Val new
0 1 Q 2 [2, 3]
1 1 Q 3 [2, 3] # used Q here as example
2 2 R 8 [8]
3 4 Y 9 [9]
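As a possible variation on the merge above, DataFrame.join with on= can do the same lookup against the grouped result's MultiIndex. This is only a sketch under the assumption that the sample data matches the question. Renaming the aggregated Series to "new" is my own choice to avoid the Val_x / Val_y suffixes.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group1": [1, 1, 2, 4],
    "Group2": ["Q", "Q", "R", "Y"],
    "Val": [2, 3, 8, 9],
})

cols = ["Group1", "Group2"]

# One list per (Group1, Group2) pair, indexed by a MultiIndex.
# Renaming it gives the joined column its final name directly.
lists_per_group = df.groupby(cols)["Val"].agg(list).rename("new")

# join(on=cols) matches the two key columns against the MultiIndex
# and keeps the original row order and index of df.
out = df.join(lists_per_group, on=cols)

print(out)
```

Because the joined Series has a name that does not clash with an existing column, no suffix handling is needed afterwards.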
If you do want to use transform, here is how to do it. It may or may not be a little slower than the map approach in the other answer:
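# wrap each value in a one-element list, then add the lists up per group;
# the string 'sum' is passed because newer pandas warns about the builtin sum
df['new'] = df.assign(
    Vals=df['Val'].values.reshape(-1, 1).tolist()
).groupby('Group1')['Vals'].transform('sum')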
print(df)
Group1 Group2 Val new
0 1 Q 2 [2, 3]
1 1 T 3 [2, 3]
2 2 R 8 [8]
3 4 Y 9 [9]
The transient Vals column looks like:
0 [2]
1 [3]
2 [8]
3 [9]
Name: Vals, dtype: object
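The reason this works is that adding Python lists concatenates them, so "summing" the one-element lists within a group rebuilds the group's full list. The sketch below illustrates that idea with plain Python sum(..., []) inside agg, followed by map. It is not the transform call from the answer. The names vals and per_group are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group1": [1, 1, 2, 4],
    "Group2": ["Q", "Q", "R", "Y"],
    "Val": [2, 3, 8, 9],
})

# Same content as the transient Vals column: one single-element list per row.
vals = pd.Series([[v] for v in df["Val"]], index=df.index, name="Vals")

# List addition is concatenation, which is the whole trick.
assert [2] + [3] == [2, 3]

# Concatenate the single-element lists within each Group1 value.
# sum(s, []) starts from an empty list and adds each row's list to it.
per_group = vals.groupby(df["Group1"]).agg(lambda s: sum(s, []))

# Broadcast the per-group lists back to the rows.
df["new"] = df["Group1"].map(per_group)

print(df)
```

Repeated list concatenation copies the growing list each time, so for large groups agg(list) is likely the cheaper way to get the same result.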