Exclude groups containing just one row instance
Question:
I have a pandas table like below with groupby
applied to get Groups 0, 1 and 2 as follows:
Group 0
Group 1
Group 2
Count
A
X1
577.5000
6
894.8700
2
X2
2697.3100
48
2697.3100
1
B3
2697.3100
30
B
RO2
92.46
3
C
C12
34.2700
9
39.2700
3
D
X55
4.1251
9
How do I filter out rows having just one group variation i.e. Below is how the final DF should look like:
Group 0
Group 1
Group 2
Count
A
X1
577.5000
6
894.8700
2
X2
2697.3100
48
2697.3100
1
B3
2697.3100
30
C
C12
34.2700
9
39.2700
3
Answers:
You can keep groups where at least one value from Group 0
is duplicated:
out = df.loc[df.index.get_level_values('Group 0').duplicated(keep=False)]
print(out)
# Output
Count
Group 0 Group 1 Group 2
A X1 577.50 6
894.87 2
X2 2697.31 48
2697.31 1
B3 2697.31 30
C C12 34.27 9
39.27 3
Use groupby.size
and boolean indexing:
Only considering the first group:
out = df[df.groupby(['Group 0'])['Group 0'].size().gt(1)]
Considering all groups:
out = df[df.groupby(['Group 0', 'Group 1', 'Group 2'])['Group 0'].size().gt(1)]
I have a pandas table like below with groupby
applied to get Groups 0, 1 and 2 as follows:
Group 0 | Group 1 | Group 2 | Count |
---|---|---|---|
A | X1 | 577.5000 | 6 |
894.8700 | 2 | ||
X2 | 2697.3100 | 48 | |
2697.3100 | 1 | ||
B3 | 2697.3100 | 30 | |
B | RO2 | 92.46 | 3 |
C | C12 | 34.2700 | 9 |
39.2700 | 3 | ||
D | X55 | 4.1251 | 9 |
How do I filter out rows having just one group variation i.e. Below is how the final DF should look like:
Group 0 | Group 1 | Group 2 | Count |
---|---|---|---|
A | X1 | 577.5000 | 6 |
894.8700 | 2 | ||
X2 | 2697.3100 | 48 | |
2697.3100 | 1 | ||
B3 | 2697.3100 | 30 | |
C | C12 | 34.2700 | 9 |
39.2700 | 3 |
You can keep groups where at least one value from Group 0
is duplicated:
out = df.loc[df.index.get_level_values('Group 0').duplicated(keep=False)]
print(out)
# Output
Count
Group 0 Group 1 Group 2
A X1 577.50 6
894.87 2
X2 2697.31 48
2697.31 1
B3 2697.31 30
C C12 34.27 9
39.27 3
Use groupby.size
and boolean indexing:
Only considering the first group:
out = df[df.groupby(['Group 0'])['Group 0'].size().gt(1)]
Considering all groups:
out = df[df.groupby(['Group 0', 'Group 1', 'Group 2'])['Group 0'].size().gt(1)]