Pandas Intersection after grouping based on common errors between each group
Question:
I have the following dataframe:
I want to find the intersection based on ID, i.e. the IDs that consistently have errors in all the Runs. Every ID repeats in every Run. I tried grouping the data by Run first, following a similar question, but the code doesn't return an intersection.
filter=all_data_df.groupby('Run')['Error'].transform('nunique') == all_data_df['Error'].nunique()
df = all_data_df.loc[filter]
The result is the same dataframe I started with.
How can this be fixed?
I am expecting to obtain a result where only ID 234534 consistently has errors.
Answers:
You can use boolean indexing to keep only the Ids that have errors in all runs:
# Below, I'm using `df` instead of `all_data_df`
consistent_errors = df["Error"].eq("Yes").groupby(df["Id"]).transform("all")
out = df.loc[consistent_errors]
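Since the original dataframe isn't shown in the post, here is a minimal sketch of the same approach on made-up data. The sample values and the column names Run, Id and Error (with "Yes"/"No" entries) are assumptions for illustration, so adjust them to match your real frame.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe is not shown in the question.
df = pd.DataFrame({
    "Run": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Id": [234534, 111111, 222222] * 3,
    "Error": ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "No"],
})

# Row-level flag: True where this row reports an error
has_error = df["Error"].eq("Yes")

# Group the flags by Id and broadcast back to every row
# whether ALL rows of that Id have an error
consistent_errors = has_error.groupby(df["Id"]).transform("all")

# Keep only the rows of Ids that fail in every run
out = df.loc[consistent_errors]
print(out)
```

The attempt in the question most likely returns everything because it groups by Run and counts distinct Error values: if every run contains both "Yes" and "No", each group's nunique equals the overall nunique, so the mask is True for every row. Grouping by Id and checking "all" asks the question that was actually intended.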