Isolate rows containing IDs in a column based on another column value, yet keeping all the records of original ID
Question:
I’d prefer to explain it graphically, as it’s hard for me to sum it up in the title.
Given a DataFrame like the one below:
id  type
1   new
2   new
2   new repeater
2   repeater
3   repeater
4   new
4   new repeater
5   new repeater
5   repeater
6   new
I would like to filter it so that it keeps only the ids that appear in the type column as new at least once. Once an id meets this condition, all of its remaining records should stay in the resulting DataFrame too. In other words, the result should look like this:
id  type
1   new
2   new
2   new repeater
2   repeater
4   new
4   new repeater
6   new
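A sketch of the sample data and of the rule taken literally: keep every row whose id has type 'new' in at least one row. The variable names are illustrative, and a default 0-based index is assumed; with it, the kept rows carry the original index labels 0, 1, 2, 3, 5, 6 and 9.

```python
import pandas as pd

# Sample data from the question (default 0-based RangeIndex assumed)
df = pd.DataFrame({
    'id':   [1, 2, 2, 2, 3, 4, 4, 5, 5, 6],
    'type': ['new', 'new', 'new repeater', 'repeater', 'repeater',
             'new', 'new repeater', 'new repeater', 'repeater', 'new'],
})

# ids that have type exactly 'new' in at least one row
ids_with_new = df.loc[df['type'].eq('new'), 'id'].unique()

# Keep every row belonging to one of those ids, in the original order
result = df[df['id'].isin(ids_with_new)]
print(result)
```

Because eq('new') is an exact comparison, rows typed 'new repeater' do not count as 'new', which is why ids 3 and 5 are dropped.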
Answers:
Use GroupBy.cummax on a boolean mask that tests the condition, so each id is flagged from its first match onward, then filter with boolean indexing:
# Mask is True from the first 'new' row of each id onward
df = df[df['type'].eq('new').groupby(df['id']).cummax()]
print(df)
   id          type
0   1           new
1   2           new
2   2  new repeater
3   2      repeater
5   4           new
6   4  new repeater
9   6           new
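To see what the one-liner does step by step, here is a sketch that shows the intermediate masks. eq('new') flags rows whose type is exactly 'new', and cummax within each id keeps the flag True from that id's first 'new' row onward. This means rows of an id that come before its first 'new' are dropped. The question's sample never hits that case; the extra id 7 below (a 'repeater' row followed by a 'new' row) is hypothetical and added only to illustrate it. Grouping the same mask with transform('any') keeps every row of such an id instead.

```python
import pandas as pd

# Question's sample data plus a hypothetical id 7 whose 'new' row is not first
df = pd.DataFrame({
    'id':   [1, 2, 2, 2, 3, 4, 4, 5, 5, 6, 7, 7],
    'type': ['new', 'new', 'new repeater', 'repeater', 'repeater',
             'new', 'new repeater', 'new repeater', 'repeater', 'new',
             'repeater', 'new'],
})

# Exact match, so 'new repeater' rows are False
is_new = df['type'].eq('new')

# True from the first 'new' of each id onward (cumulative max per group)
from_first_new = is_new.groupby(df['id']).cummax()

# True for every row of an id that has at least one 'new'
any_new = is_new.groupby(df['id']).transform('any')

print(df.assign(from_first_new=from_first_new, any_new=any_new))

by_cummax = df[from_first_new]  # drops row 10 (id 7, 'repeater')
by_any = df[any_new]            # keeps both rows of id 7
```

On the question's own data both masks select the same rows, because every matching id starts with its 'new' row. The title asks to keep all records of a matching id, which corresponds to the transform('any') mask if 'new' is not always the first row of its id.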