Pandas groupby.agg with condition

Question

I have a pandas data frame similar to this:

name	sales	profit	profit_flag
Joe	200	100	True
Joe	300	150	False
Mark	200	100	True
Mark	300	150	True
Judy	300	150	False

The actual data frame has 100 columns.

The idea is: I want to group by name, and aggregate all the columns. However, certain columns depend on a flag. In this case, sales will be aggregated no matter what, but profit should be included in the aggregation only if profit_flag is True.

It should look like this if we use sum:

name	sales	profit
Joe	500	100
Judy	300	Nan
Mark	500	250

Is there anyway I can do this from one line using df.groupby('name').agg()?

Right now I’m using:

grouped = pd.DataFrame()
grouped['sales'] = df.groupby('name').sales.sum()
grouped['profit'] = df[df.profit_flag].groupby('name').profit.sum()

I’m getting the correct result, but since the actual data frame has many more columns, I wanted to know if I could somehow write something like this to avoid the clutter:

grouped = df.groupby('name').agg({
          'sales': 'sum',
          'profit' 'sum' #if profit_flag })

Is this even possible or should I just group ‘flag dependent columns’ in separate statemetns?

Asked By: Samba

||

Source

Answer 1

You can mask the values prior to aggregation:

(df.assign(profit=lambda d: d['profit'].where(d['profit_flag']))
   .groupby('name', as_index=False)[['sales', 'profit']].sum(min_count=1)
)

Output:

   name  sales  profit
0   Joe    500   100.0
1  Judy    300     NaN
2  Mark    500   250.0

Answered By: mozway

Pandas groupby.agg with condition

Question:

Answers: