Delete all rows before the first appearance of a condition in a pandas data frame

Question

I have the following data frame:

import pandas as pd

df = pd.DataFrame({"Person":[1,1,2,2,3,3,3,3],
                   "Bank":["B1","B2","B9","B2","B6","B1","B1","B5"]})

   Person Bank
0       1   B1
1       1   B2
2       2   B9
3       2   B2
4       3   B6
5       3   B1
6       3   B1
7       3   B5

For each person, I want to drop all the rows that come before the first time B1 appears. That is, I want to keep the first row where Bank == "B1" and every row after it.

This is what I want to get:

   Person Bank
0       1   B1
1       1   B2
5       3   B1
6       3   B1
7       3   B5

If B1 never appears for a person, drop all of that person's rows. If there are rows before the first appearance of B1, drop them.

Asked By: Arturo Sbr

Answer 1

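You can do this with transform: idxmax gives the index of each person's first B1, and any drops the people who never have B1. This relies on the index increasing within each person, as it does with the default index.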

s=(df['Bank']=='B1').groupby(df['Person'])

df[(df.index>=(s.transform('idxmax')))&s.transform('any')]
Out[305]: 
   Person Bank
0       1   B1
1       1   B2
5       3   B1
6       3   B1
7       3   B5
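
To see why both conditions are needed, it can help to look at the two transformed Series separately. The following is a minimal sketch using the question's data; the names is_b1, first_b1 and has_b1 are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"Person": [1, 1, 2, 2, 3, 3, 3, 3],
                   "Bank": ["B1", "B2", "B9", "B2", "B6", "B1", "B1", "B5"]})

# Boolean flag per row, grouped by person.
is_b1 = df["Bank"].eq("B1")
grouped = is_b1.groupby(df["Person"])

# Index label of the first True in each group, broadcast to every row.
# For a group with no True at all, idxmax falls back to the group's first label.
first_b1 = grouped.transform("idxmax")

# True for every row of a person who has at least one B1.
has_b1 = grouped.transform("any")

# Keep rows at or after the first B1, but only for people who have one.
result = df[(df.index >= first_b1) & has_b1]
print(result)
```

For person 2, first_b1 points at that person's first row even though no B1 exists, so without has_b1 all of person 2's rows would be kept.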

Answered By: BENY

Answer 2

Using where + ffill

m = df['Bank'].where(df['Bank'] == 'B1').groupby(df['Person']).ffill()

df[m.notnull()]

   Person Bank
0       1   B1
1       1   B2
5       3   B1
6       3   B1
7       3   B5

This works by making everything after the first occurrence in a group a non-null value. This is done in two steps:

1) Mask everything that isn’t B1. where keeps the values that satisfy the condition and replaces the rest with NaN.

df['Bank'].where(df['Bank'] == 'B1')

0     B1
1    NaN
2    NaN
3    NaN
4    NaN
5     B1
6     B1
7    NaN
Name: Bank, dtype: object

2) Fill forward per group. This is the key step: within each group, every value after the first occurrence of B1 is filled with a non-null string, so notnull keeps it. Rows before the first B1, and groups with no B1 at all, stay NaN.

>>> m
0     B1
1     B1
2    NaN
3    NaN
4    NaN
5     B1
6     B1
7     B1
Name: Bank, dtype: object

Once we have the valid mask, it’s trivial to filter the DataFrame where the mask is not null.
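
The two steps can be wrapped in a small reusable function. This is only a sketch: drop_before_first and its parameter names are hypothetical and not part of pandas.

```python
import pandas as pd

def drop_before_first(df, group_col, value_col, target):
    """Keep, per group, the first row equal to `target` and all rows after it."""
    # Step 1: keep only the target values, everything else becomes NaN.
    masked = df[value_col].where(df[value_col] == target)
    # Step 2: forward-fill within each group, so rows after the first
    # target become non-null while earlier rows stay NaN.
    filled = masked.groupby(df[group_col]).ffill()
    return df[filled.notna()]

df = pd.DataFrame({"Person": [1, 1, 2, 2, 3, 3, 3, 3],
                   "Bank": ["B1", "B2", "B9", "B2", "B6", "B1", "B1", "B5"]})

out = drop_before_first(df, "Person", "Bank", "B1")
print(out)
```

Groups that never contain the target have nothing to fill forward from, so all of their rows stay NaN and are dropped.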

Answered By: user3483203

Answer 3

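Using cumsum and casting the result to bool with astype(bool): the running count of B1 per person is 0 (False) before the first B1 and positive (True) from then on.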

df[df.groupby('Person').Bank.transform(lambda s: s.eq('B1').cumsum().astype(bool))]

   Person Bank
0       1   B1
1       1   B2
5       3   B1
6       3   B1
7       3   B5
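
The same idea can be written without a lambda by grouping the boolean Series directly. This is a sketch of a variant rather than the code above; the name seen_b1 is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Person": [1, 1, 2, 2, 3, 3, 3, 3],
                   "Bank": ["B1", "B2", "B9", "B2", "B6", "B1", "B1", "B5"]})

# Running count of B1 rows within each person; it is 0 until the first B1.
seen_b1 = df["Bank"].eq("B1").groupby(df["Person"]).cumsum().gt(0)

# Keep rows where at least one B1 has been seen so far for that person.
result = df[seen_b1]
print(result)
```

Comparing the count with gt(0) yields a boolean mask, which plays the same role as astype(bool).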

Answered By: rafaelc
