How to apply different functions for different columns in pandas dataframe?
Question:
I have a table with shipments:
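booking_order | driver_name | tonnage | stop_name | parcels | kgs |
---|---|---|---|---|---|
2794 | John | 3 | Warsaw | 200 | 180 |
2794 | John | 3 | Radom | 300 | 270 |
2794 | John | 3 | Krakow | 150 | 135 |
3005 | Mark | 5 | Gdansk | 500 | 450 |
3005 | Frank | 5 | Gdynia | 400 | 360 |
3005 | Frank | 5 | Sopot | 123 | 10.7 |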
The task is to group all rows by booking_order, show the unique values of driver_name, tonnage and stop_name within each group (joined with different delimiters), and sum parcels and kgs.
The needed result is below:
booking_order | driver_name | tonnage | stop_name | parcels | kgs |
---|---|---|---|---|---|
2794 | John | 3 | Warsaw>Radom>Krakow | 650 | 585 |
3005 | Mark, Frank | 5 | Gdansk>Gdynia>Sopot | 1023 | 820.7 |
I could only do the grouping, but I don't know how to apply different methods to different columns correctly:
import pandas as pd
excel=pd.read_excel('source.xlsx')
result=excel.groupby('booking_order').agg(lambda x: list(x)).reset_index()
result.to_excel('result1.xlsx')
Answers:
Use groupby and agg with a dict of aggregation functions:
aggfunc = {'driver_name': lambda x: ','.join(x.unique()),
           'tonnage': 'first',
           'stop_name': '>'.join,
           'parcels': 'sum',
           'kgs': 'sum'}
result = excel.groupby('booking_order', as_index=False).agg(aggfunc)
Output:
>>> result
   booking_order driver_name  tonnage            stop_name  parcels    kgs
0           2794        John        3  Warsaw>Radom>Krakow      650  585.0
1           3005  Mark,Frank        5  Gdansk>Gdynia>Sopot     1023  820.7
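For anyone who wants to try this without source.xlsx, here is a minimal sketch that builds the question's table inline. The helper name join_unique is hypothetical (not part of pandas). It also makes two small changes to the dict above, both assumptions about what the asker wants: drivers are joined with ", " to match the needed result, and stop names are deduplicated too, since the question asks for unique values.

```python
import pandas as pd

# Data copied from the question, built inline instead of reading source.xlsx.
shipments = pd.DataFrame({
    "booking_order": [2794, 2794, 2794, 3005, 3005, 3005],
    "driver_name": ["John", "John", "John", "Mark", "Frank", "Frank"],
    "tonnage": [3, 3, 3, 5, 5, 5],
    "stop_name": ["Warsaw", "Radom", "Krakow", "Gdansk", "Gdynia", "Sopot"],
    "parcels": [200, 300, 150, 500, 400, 123],
    "kgs": [180, 270, 135, 450, 360, 10.7],
})


def join_unique(sep):
    """Hypothetical helper: build an aggregator that joins a group's distinct values."""
    def aggregator(values):
        # Series.unique() keeps values in order of first appearance;
        # astype(str) guards against non-string cells.
        return sep.join(values.astype(str).unique())
    return aggregator


aggfunc = {
    "driver_name": join_unique(", "),  # distinct drivers, comma separated
    "tonnage": "first",                # tonnage is constant within a booking here
    "stop_name": join_unique(">"),     # distinct stops, in visiting order
    "parcels": "sum",
    "kgs": "sum",
}

result = shipments.groupby("booking_order", as_index=False).agg(aggfunc)
print(result)
```

With this data, booking 3005 gets the driver_name "Mark, Frank". Note that 'first' silently hides any booking whose rows disagree on tonnage, so swap in join_unique if that can happen.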
Use named aggregation:
result = (excel.groupby(['booking_order'], as_index=False)
               .agg(driver_name=('driver_name', lambda x: ','.join(x.unique())),
                    tonnage=('tonnage', 'first'),
                    stop_name=('stop_name', '>'.join),
                    parcels=('parcels', 'sum'),
                    kgs=('kgs', 'sum')))
print(result)
   booking_order driver_name  tonnage            stop_name  parcels    kgs
0           2794        John        3  Warsaw>Radom>Krakow      650  585.0
1           3005  Mark,Frank        5  Gdansk>Gdynia>Sopot     1023  820.7
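The keyword form is mostly useful when the output columns should be named differently from the inputs, or when one input column needs more than one aggregation. A minimal sketch on a subset of the question's data follows; the output names route, total_parcels and largest_drop are made up for illustration.

```python
import pandas as pd

# Small subset of the question's data, built inline for illustration.
shipments = pd.DataFrame({
    "booking_order": [2794, 2794, 3005, 3005],
    "stop_name": ["Warsaw", "Radom", "Gdansk", "Gdynia"],
    "parcels": [200, 300, 500, 400],
})

# Each keyword is an output column name; each tuple is (input column, function).
summary = shipments.groupby("booking_order", as_index=False).agg(
    route=("stop_name", ">".join),     # renamed output column
    total_parcels=("parcels", "sum"),  # first aggregation of "parcels"
    largest_drop=("parcels", "max"),   # second aggregation of the same column
)
print(summary)
```

A plain dict cannot express the last two lines directly, because a dict key can appear only once.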
You can do your aggregations separately.
import pandas as pd
df = pd.DataFrame({"A": ["a", "b", "a", "b"], "B": [1,2,3,4], "C": [5,6,7,8]})
group_obj = df.groupby("A")
grouper_b = group_obj.B.mean()
grouper_c = group_obj.C.max()
result = pd.concat([grouper_b, grouper_c], axis=1)
Or pass the options in func to agg. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.agg.html
df.groupby("A").agg({"B": "min", "C": "max"})
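As a quick sanity check, the two approaches should give the same frame when they use the same functions. This is a small sketch on the toy data above, using mean for B and max for C in both places; the variable names separate and combined are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a", "b"], "B": [1, 2, 3, 4], "C": [5, 6, 7, 8]})
groups = df.groupby("A")

# Approach 1: aggregate each column on its own, then align the results on the group key.
separate = pd.concat([groups["B"].mean(), groups["C"].max()], axis=1)

# Approach 2: a single agg call with a column -> function mapping.
combined = groups.agg({"B": "mean", "C": "max"})

# Both frames are indexed by "A" and have the columns B and C.
pd.testing.assert_frame_equal(separate, combined)
print(combined)
```

The single agg call is usually easier to read. The separate form can help when one of the aggregations needs extra steps before it is joined back.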