Isolate rows containing IDs in a column based on another column value, yet keeping all the records of original ID

Question

I’d prefer to explain it graphically, as it’s hard for me to sum it up in the title.

Given a dataframe like this one below:

id        type
1         new
2         new
2         new repeater
2         repeater
3         repeater
4         new
4         new repeater
5         new repeater
5         repeater
6         new

I would like to filter it so that it returns only the ids that have at least one row whose type is exactly "new". Once an id meets this condition, I want its remaining records to stay in the resulting DataFrame as well. In other words, the output should look like this:

id        type
1         new
2         new
2         new repeater
2         repeater
4         new
4         new repeater
6         new
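A minimal sketch of this filter, assuming pandas and reading the requirement as "keep every row of any id that has at least one row whose type is exactly new, regardless of where that row sits within the id". It collects the qualifying ids first and then selects their rows with Series.isin. The names new_ids and result are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 3, 4, 4, 5, 5, 6],
    "type": ["new", "new", "new repeater", "repeater", "repeater",
             "new", "new repeater", "new repeater", "repeater", "new"],
})

# ids having at least one row whose type is exactly "new"
# ("new repeater" does not count, so id 5 is excluded)
new_ids = df.loc[df["type"].eq("new"), "id"].unique()

# keep every row belonging to one of those ids, in original order
result = df[df["id"].isin(new_ids)]
print(result)
```

On the sample data this keeps ids 1, 2, 4 and 6 with all their rows, matching the expected output above.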

Asked By: teogj

Answer 1

Use GroupBy.cummax on a boolean mask to flag every row from the first match of the condition onward within each id, then filter with boolean indexing:

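import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 2, 3, 4, 4, 5, 5, 6],
    'type': ['new', 'new', 'new repeater', 'repeater', 'repeater',
             'new', 'new repeater', 'new repeater', 'repeater', 'new'],
})

# True from the first 'new' row onward within each id; keep those rows
df = df[df['type'].eq('new').groupby(df['id']).cummax()]
print(df)
   id          type
0   1           new
1   2           new
2   2  new repeater
3   2      repeater
5   4           new
6   4  new repeater
9   6           new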

Answered By: jezrael
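To see what the cummax mask does row by row, the sketch below prints it next to a groupby transform("any") mask for comparison. It assumes pandas and uses hypothetical data: id 7 is invented for illustration and has a "repeater" row before its "new" row. cummax is order-sensitive (it only turns True at the first "new" within an id), whereas transform("any") marks every row of an id that has any "new" row.

```python
import pandas as pd

# Hypothetical data: id 7 has a "repeater" row BEFORE its "new" row
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 7, 7],
    "type": ["new", "new", "repeater", "repeater", "repeater", "new"],
})

is_new = df["type"].eq("new")

# cummax per id: True from the first "new" row onward within each id
from_first_new = is_new.groupby(df["id"]).cummax()

# transform("any"): True for every row of an id that has any "new" row
any_new = is_new.groupby(df["id"]).transform("any")

print(pd.DataFrame({"id": df["id"], "type": df["type"],
                    "cummax": from_first_new, "any": any_new}))
```

Here id 7 keeps only its "new" row under cummax but both rows under transform("any"). In the question's sample every "new" row comes first for its id, so both masks select the same rows there.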