Apply groupby only in rows where a condition is met

Question

How do you apply a groupby in a dataframe only when a condition is met in a row and still keep all other rows that do not meet said condition?

For instance, let’s say I have a dataframe:

Number	Version	Binary	Random column
1	0	True	a
1	1	True	b
1	2	True	c
2	0	False	a
2	1	False	b

With this dataframe, I would like to group by Number all rows whose Binary value is True and keep only the row with the latest version (that is, the largest Version), while keeping every row that does not meet the condition unchanged.

That said, it should give a result like this:

Number	Version	Binary	Random column
1	2	True	c
2	0	False	a
2	1	False	b

Asked By: AvocadoToast

Answer 1

Here is a way using pd.concat() and idxmax():

(pd.concat([df.loc[df.loc[df['Binary']].groupby('Number')['Version'].idxmax()],
            df.loc[~df['Binary']]],
           ignore_index=True))

Output:

   Number  Version  Binary Random column
0       1        2    True             c
1       2        0   False             a
2       2        1   False             b
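
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The frame is rebuilt from the question's table; the names latest_idx and result are only illustrative.

```python
import pandas as pd

# Example frame rebuilt from the table in the question.
df = pd.DataFrame({
    "Number": [1, 1, 1, 2, 2],
    "Version": [0, 1, 2, 0, 1],
    "Binary": [True, True, True, False, False],
    "Random column": ["a", "b", "c", "a", "b"],
})

# Index labels of the highest-Version row per Number,
# computed among the Binary == True rows only.
latest_idx = df.loc[df["Binary"]].groupby("Number")["Version"].idxmax()

# Put the selected True rows first, followed by every untouched False row.
result = pd.concat(
    [df.loc[latest_idx], df.loc[~df["Binary"]]],
    ignore_index=True,
)
print(result)
```

Dropping ignore_index=True and calling .sort_index() on the result should keep the original row labels and row order instead of renumbering from 0.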

Answered By: rhug123

Answer 2

I would use boolean indexing here:

# is the row the max version of its (Number, Binary) group?
m1 = df['Version'].eq(df.groupby(['Number', 'Binary'])['Version'].transform('max'))
# is the binary not True?
m2 = ~df['Binary']

# keep if either condition above is met
df[m1|m2]

Output:

   Number  Version  Binary Random column
2       1        2    True             c
3       2        0   False             a
4       2        1   False             b
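
To check how the mask behaves with more than one True number, here is a small sketch on a hypothetical, slightly larger frame (Number 3 is my addition, not part of the question). It assumes the intent is "latest version per number", so the maximum is taken per (Number, Binary) pair.

```python
import pandas as pd

# Hypothetical data: Numbers 1 and 3 are True, Number 2 is False.
df = pd.DataFrame({
    "Number": [1, 1, 1, 2, 2, 3, 3],
    "Version": [0, 1, 2, 0, 1, 0, 1],
    "Binary": [True, True, True, False, False, True, True],
    "Random column": list("abcabde"),
})

# Largest Version within each (Number, Binary) pair, broadcast back to every row.
group_max = df.groupby(["Number", "Binary"])["Version"].transform("max")

is_latest = df["Version"].eq(group_max)  # row holds the max Version of its pair
is_false = ~df["Binary"]                 # row does not meet the condition

# Keep the latest True row per Number and every False row.
result = df[is_latest | is_false]
print(result)
```

One difference from idxmax() worth knowing: if two rows of the same number tie for the largest Version, this mask keeps both, whereas idxmax() keeps only the first.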

Alternative

If the per-group operation is expensive, you can also apply it only to the group that matches the condition and return the other group unchanged:

(df.groupby('Binary', group_keys=False, sort=False)
   # True group: keep the latest version per Number; False group: unchanged
   .apply(lambda g: g.loc[g.groupby('Number')['Version'].idxmax()]
          if g.name else g)
 )
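
If you would rather not depend on how groupby().apply() treats the grouping column (this has changed between pandas versions), the same "only touch the matching group" idea can be sketched with an explicit loop. This swaps in a different technique from the apply call above, and the variable names are just illustrative.

```python
import pandas as pd

# Example frame rebuilt from the table in the question.
df = pd.DataFrame({
    "Number": [1, 1, 1, 2, 2],
    "Version": [0, 1, 2, 0, 1],
    "Binary": [True, True, True, False, False],
    "Random column": ["a", "b", "c", "a", "b"],
})

pieces = []
for is_true, group in df.groupby("Binary", sort=False):
    if is_true:
        # Only the matching group is reduced: latest Version per Number.
        group = group.loc[group.groupby("Number")["Version"].idxmax()]
    # The non-matching group is appended unchanged.
    pieces.append(group)

# Reassemble and restore the original row order.
result = pd.concat(pieces).sort_index()
print(result)
```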

Answered By: mozway
