How to convert max values of a multiple groupby dataframe to nan?

Question

I have this df:

          CODE  MONTH  PP
0         100007  01  22.1
1         100007  01  20
2         100007  01  5
3         100007  01  10
4         100007  01  12
         ...  ..  ..
10542747  155217  02  11
10542748  155217  02  12
10542749  155217  02  15
10542750  155217  02  18
10542751  155217  02  3

[10542752 rows x 3 columns]

I want to group the df by df['CODE'] and df['MONTH'], and then set the maximum 'PP' value within each group to NaN.

So I wrote this code:

grouped_df=pd.DataFrame()
for i, data in df.groupby(['CODE','MONTH']):

    data.loc[data['PP']==data['PP'].max(), 'PP']=np.nan
    grouped_df=grouped_df.append(data)

But it takes too long to run, about 15 minutes, probably because the df has 10542752 rows. Is there a way to make this code faster?

Thanks in advance

Asked By: Javier

Answer 1

No need for the loop; perform boolean indexing directly, using groupby.transform('max') as the reference:

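# per-row maximum of PP within each (CODE, MONTH) group, aligned to df's index
m = df.groupby(['CODE','MONTH'])['PP'].transform('max')

# rows whose PP equals their group's maximum are set to NaN
df.loc[df['PP'].eq(m), 'PP'] = np.nan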

Answered By: mozway
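
To try the transform approach above on something small, here is a minimal sketch. The sample values are made up to mirror the rows shown in the question, and the name group_max is only illustrative. Note that every row tied for the group maximum is set to NaN, not just the first one.

```python
import numpy as np
import pandas as pd

# Small made-up sample in the same shape as the question's df (assumption).
df = pd.DataFrame({
    "CODE": [100007] * 5 + [155217] * 5,
    "MONTH": ["01"] * 5 + ["02"] * 5,
    "PP": [22.1, 20, 5, 10, 12, 11, 12, 15, 18, 3],
})

# Maximum of PP within each (CODE, MONTH) group, broadcast back to every row.
group_max = df.groupby(["CODE", "MONTH"])["PP"].transform("max")

# Rows equal to their group's maximum become NaN (ties included).
df.loc[df["PP"].eq(group_max), "PP"] = np.nan

print(df)
```

Because transform returns a Series aligned to the original index, the comparison and assignment are done in one vectorized pass instead of one pass per group.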

Answer 2

Using mask (values where the condition is True are replaced, by default with NaN):

df['PP'] = df['PP'].mask(df.groupby(['CODE','MONTH'])['PP'].transform('max').eq(df['PP']), np.nan)
df

            CODE  MONTH    PP
0         100007      1   NaN
1         100007      1  20.0
2         100007      1   5.0
3         100007      1  10.0
4         100007      1  12.0
10542747  155217      2  11.0
10542748  155217      2  12.0
10542749  155217      2  15.0
10542750  155217      2   NaN
10542751  155217      2   3.0

Answered By: Naveed
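
The mask variant can also be wrapped in a small helper that leaves the input untouched. This is only a sketch: mask_group_max, its parameters, and the sample data are hypothetical names and values, not part of either answer. The sample includes a tie to show that all rows sharing the group maximum are masked.

```python
import pandas as pd


def mask_group_max(frame, keys, col):
    """Return a copy of frame with each group's maximum in col set to NaN."""
    out = frame.copy()
    group_max = out.groupby(keys)[col].transform("max")
    # Series.mask replaces values where the condition is True;
    # with no replacement given, the default is NaN.
    out[col] = out[col].mask(out[col].eq(group_max))
    return out


# Made-up sample with a tie for the maximum in the first group (assumption).
sample = pd.DataFrame({
    "CODE": [1, 1, 1, 2, 2],
    "MONTH": ["01", "01", "01", "02", "02"],
    "PP": [5.0, 9.0, 9.0, 4.0, 7.0],
})

result = mask_group_max(sample, ["CODE", "MONTH"], "PP")
print(result)
```

Working on a copy costs extra memory on a frame with about ten million rows, so assigning in place as in the answers may be preferable there.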
