How to filter rows based on the value in two columns and a different value in a third
Question:
The following code generates a dataframe to illustrate my question
import pandas as pd
data = [[1152, '1', '10'], [1154, '1', '4'],
[1152, '1', '10'], [1155, '2', '10'],
[1152, '1', '4'], [1155, '2', '10']]
df = pd.DataFrame(data, columns =['Cow', 'Lact', 'Procedure'])
This generates the following
Cow Lact Procedure
0 1152 1 10
1 1154 1 4
2 1152 1 10
3 1155 2 10
4 1152 1 4
5 1155 2 10
I want to identify the rows where Cow and Lact are the same but procedure is different. The output I am looking for is
Cow Lact Procedure
0 1152 1 10
1 1152 1 10
2 1152 1 4
I figure it will require a groupby and filter function but not sure how to put it together.
Thanks
Answers:
Use groupby.transform('nunique')
and boolean indexing:
df[df.groupby(['Cow', 'Lact'])['Procedure'].transform('nunique').gt(1)]
Output:
Cow Lact Procedure
0 1152 1 10
2 1152 1 10
4 1152 1 4
The following code generates a dataframe to illustrate my question
import pandas as pd
data = [[1152, '1', '10'], [1154, '1', '4'],
[1152, '1', '10'], [1155, '2', '10'],
[1152, '1', '4'], [1155, '2', '10']]
df = pd.DataFrame(data, columns =['Cow', 'Lact', 'Procedure'])
This generates the following
Cow Lact Procedure
0 1152 1 10
1 1154 1 4
2 1152 1 10
3 1155 2 10
4 1152 1 4
5 1155 2 10
I want to identify the rows where Cow and Lact are the same but procedure is different. The output I am looking for is
Cow Lact Procedure
0 1152 1 10
1 1152 1 10
2 1152 1 4
I figure it will require a groupby and filter function but not sure how to put it together.
Thanks
Use groupby.transform('nunique')
and boolean indexing:
df[df.groupby(['Cow', 'Lact'])['Procedure'].transform('nunique').gt(1)]
Output:
Cow Lact Procedure
0 1152 1 10
2 1152 1 10
4 1152 1 4