How to apply different functions to different columns in a pandas DataFrame?

Question:

I have a table with shipments:

booking_order driver_name tonnage stop_name parcels kgs
2794 John 3 Warsaw 200 180
2794 John 3 Radom 300 270
2794 John 3 Krakow 150 135
3005 Mark 5 Gdansk 500 450
3005 Frank 5 Gdynia 400 360
3005 Frank 5 Sopot 123 10.7

The task is to group all rows by booking_order, show driver_name, tonnage and stop_name as unique values joined into one string (with a different delimiter per column), and sum parcels and kgs.

Needed result is below:

booking_order driver_name tonnage stop_name parcels kgs
2794 John 3 Warsaw>Radom>Krakow 650 585
3005 Mark, Frank 5 Gdansk>Gdynia>Sopot 1023 920.7

I could only do the grouping, but I don't know how to apply different methods to different columns correctly:

import pandas as pd
excel=pd.read_excel('source.xlsx')
result=excel.groupby('booking_order').agg(lambda x: list(x)).reset_index()
result.to_excel('result1.xlsx')
Asked By: bluekit46

Answers:

Use groupby and agg with a dict of aggregation functions:

aggfunc = {'driver_name': lambda x: ','.join(x.unique()),
           'tonnage': 'first',
           'stop_name': '>'.join,
           'parcels': 'sum',
           'kgs': 'sum'}
result = excel.groupby('booking_order', as_index=False).agg(aggfunc)

Output:

>>> result
   booking_order driver_name  tonnage            stop_name  parcels    kgs
0           2794        John        3  Warsaw>Radom>Krakow      650  585.0
1           3005  Mark,Frank        5  Gdansk>Gdynia>Sopot     1023  820.7
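
For readers without the original source.xlsx, here is a minimal sketch that rebuilds the question's table in memory and applies the same dict of aggregation functions. The in-memory DataFrame is an assumption standing in for the Excel file; the column values are copied from the question.

```python
import pandas as pd

# In-memory copy of the question's table (stands in for pd.read_excel('source.xlsx')).
excel = pd.DataFrame({
    "booking_order": [2794, 2794, 2794, 3005, 3005, 3005],
    "driver_name": ["John", "John", "John", "Mark", "Frank", "Frank"],
    "tonnage": [3, 3, 3, 5, 5, 5],
    "stop_name": ["Warsaw", "Radom", "Krakow", "Gdansk", "Gdynia", "Sopot"],
    "parcels": [200, 300, 150, 500, 400, 123],
    "kgs": [180, 270, 135, 450, 360, 10.7],
})

# One entry per column: a string names a built-in reduction,
# a callable receives each group's values as a Series.
aggfunc = {
    "driver_name": lambda x: ",".join(x.unique()),  # unique drivers, first-seen order
    "tonnage": "first",                             # first tonnage value in the group
    "stop_name": ">".join,                          # every stop, in row order
    "parcels": "sum",
    "kgs": "sum",
}

result = excel.groupby("booking_order", as_index=False).agg(aggfunc)
print(result)
```

Columns that are not listed in the dict are dropped from the result, so every column you want to keep needs an entry.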
Answered By: Corralien

Use named aggregation, where each output column is given as a (column, function) tuple:

result=(excel.groupby(['booking_order'], as_index=False)
             .agg(driver_name=('driver_name',lambda x: ','.join(x.unique())),
                  tonnage=('tonnage','first'),
                  stop_name=('stop_name','>'.join),
                  parcels=('parcels','sum'),
                  kgs=('kgs','sum')))
print(result)
   booking_order driver_name  tonnage            stop_name  parcels    kgs
0           2794        John        3  Warsaw>Radom>Krakow      650  585.0
1           3005  Mark,Frank        5  Gdansk>Gdynia>Sopot     1023  820.7
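
The question asks for unique values with a different delimiter per column. One possible way to avoid repeating the lambda is a small factory function, sketched below. The name join_unique is a hypothetical helper introduced here for illustration; it is not part of pandas. The DataFrame is again an in-memory copy of the question's table.

```python
import pandas as pd

def join_unique(sep):
    """Return an aggregator that joins a group's unique values with `sep`."""
    def _join(values):
        # drop_duplicates keeps the first occurrence, so first-seen order is preserved
        return sep.join(values.drop_duplicates().astype(str))
    return _join

# In-memory copy of the question's table (assumed stand-in for source.xlsx).
excel = pd.DataFrame({
    "booking_order": [2794, 2794, 2794, 3005, 3005, 3005],
    "driver_name": ["John", "John", "John", "Mark", "Frank", "Frank"],
    "tonnage": [3, 3, 3, 5, 5, 5],
    "stop_name": ["Warsaw", "Radom", "Krakow", "Gdansk", "Gdynia", "Sopot"],
    "parcels": [200, 300, 150, 500, 400, 123],
    "kgs": [180, 270, 135, 450, 360, 10.7],
})

result = (excel.groupby("booking_order", as_index=False)
               .agg(driver_name=("driver_name", join_unique(", ")),
                    tonnage=("tonnage", "first"),
                    stop_name=("stop_name", join_unique(">")),
                    parcels=("parcels", "sum"),
                    kgs=("kgs", "sum")))
print(result)
```

Unlike '>'.join, this version also collapses a stop that appears more than once within a booking, which may or may not be what you want for a route.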
Answered By: jezrael

You can do your aggregations separately and then concatenate the results.

import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a", "b"], "B": [1,2,3,4], "C": [5,6,7,8]})

group_obj = df.groupby("A")

grouper_b = group_obj.B.mean()
grouper_c = group_obj.C.max()

result = pd.concat([grouper_b, grouper_c], axis=1)

Or pass agg a dict that maps each column to its aggregation function. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.agg.html

df.groupby("A").agg({"B": "min", "C": "max"})
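
As a quick check that the two approaches agree, the sketch below computes the same pair of aggregations (mean of B, max of C) both ways on the toy frame above and compares the results. The variable names separate and combined are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a", "b"], "B": [1, 2, 3, 4], "C": [5, 6, 7, 8]})

group_obj = df.groupby("A")

# Approach 1: aggregate each column on its own, then join on the group index.
separate = pd.concat([group_obj["B"].mean(), group_obj["C"].max()], axis=1)

# Approach 2: one agg call with a column -> function mapping.
combined = df.groupby("A").agg({"B": "mean", "C": "max"})

print(combined)
print(separate.equals(combined))
```

The dict form is usually shorter, while the separate form can be handy when one of the aggregations needs extra intermediate steps.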
Answered By: Ken Jiiii