Filter rows by ID based on another column's value, while keeping all records of each matching ID

Question:

I’d prefer to explain it graphically, as it’s hard for me to sum it up in the title.

Given a dataframe like this one below:

id        type
1         new
2         new
2         new repeater
2         repeater
3         repeater
4         new
4         new repeater
5         new repeater
5         repeater
6         new

I would like to filter it so that it returns only the ids that appear with type exactly new at least once. Once an id fulfils this condition, I want all the remaining records belonging to that id to stay in the resulting DataFrame. In other words, it should look as follows:

id        type
1         new
2         new
2         new repeater
2         repeater
4         new
4         new repeater
6         new
Asked By: teogj


Answers:

Use GroupBy.cummax on a boolean mask that tests for the first match of the condition within each id, then filter with boolean indexing:

# Flag rows where type is exactly 'new', then carry that flag forward within each id
df = df[df['type'].eq('new').groupby(df['id']).cummax()]
print(df)
   id          type
0   1           new
1   2           new
2   2  new repeater
3   2      repeater
5   4           new
6   4  new repeater
9   6           new
Answered By: jezrael
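
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and spells out the mask step by step. One thing worth noting: cummax only keeps rows from the first 'new' onward within each id, which matches the sample because 'new' always comes first there. If the rows could arrive in another order, a group-wide check with transform('any') might be safer. That variant and the small reordered frame df2 are my own assumptions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 3, 4, 4, 5, 5, 6],
    "type": ["new", "new", "new repeater", "repeater", "repeater",
             "new", "new repeater", "new repeater", "repeater", "new"],
})

# Row-level flag: True only where type is exactly 'new'.
is_new = df["type"].eq("new")

# Approach from the answer: running maximum of the flag within each id,
# so the first 'new' row and every later row of that id are kept.
mask_cummax = is_new.groupby(df["id"]).cummax()

# Assumed alternative: True for every row of an id that has 'new' anywhere,
# regardless of row order.
mask_any = is_new.groupby(df["id"]).transform("any")

result = df[mask_any]
print(result)  # ids kept: 1, 2, 2, 2, 4, 4, 6

# Hypothetical reordered data (not from the question): 'new' comes last.
df2 = pd.DataFrame({"id": [7, 7], "type": ["repeater", "new"]})
is_new2 = df2["type"].eq("new")
kept_cummax = df2[is_new2.groupby(df2["id"]).cummax()]        # keeps only the 'new' row
kept_any = df2[is_new2.groupby(df2["id"]).transform("any")]   # keeps both rows
```

On the question's data both masks select the same rows, so the choice only matters when 'new' is not guaranteed to be the first row of its id.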