In pandas, find the row per group with the smallest value greater than a given value
Question:
I have a dataframe which looks like this:
import pandas as pd

df = pd.DataFrame(
    {
        'A': [
            'C1', 'C1', 'C1', 'C1',
            'C2', 'C2', 'C2', 'C2',
            'C3', 'C3', 'C3', 'C3'
        ],
        'B': [1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0]
    }
)
df
Out[40]:
     A  B
0   C1  1
1   C1  4
2   C1  8
3   C1  9
4   C2  1
5   C2  3
6   C2  8
7   C2  9
8   C3  1
9   C3  4
10  C3  7
11  C3  0
For each group in A, I want to find the row with the smallest value of B greater than 5.
My resulting dataframe should look like this:
     A  B
2   C1  8
6   C2  8
10  C3  7
I have tried this, but it does not give me the whole row:
df[df.B >= 4].groupby('A')['B'].min()
What do I need to change?
Answers:
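Use idxmin instead of min to get the index label of each group's minimum, then pass those labels to loc. Note that the filter is B > 5, matching the threshold stated in the question, rather than B >= 4: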
df.loc[df[df.B > 5].groupby('A')['B'].idxmin()]
Output:
     A  B
2   C1  8
6   C2  8
10  C3  7
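For reference, the idxmin approach can be put together as a self-contained sketch on the question's data. The helper name smallest_above and its threshold parameter are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['C1'] * 4 + ['C2'] * 4 + ['C3'] * 4,
    'B': [1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0],
})

def smallest_above(frame, threshold):
    """Per group in A, return the full row whose B is the smallest value above threshold.

    Hypothetical helper for illustration only.
    """
    # Keep only rows strictly above the threshold.
    above = frame[frame['B'] > threshold]
    # idxmin returns the index label of the minimum B within each group.
    labels = above.groupby('A')['B'].idxmin()
    # loc uses those labels to pull the complete rows from the original frame.
    return frame.loc[labels]

result = smallest_above(df, 5)
print(result)
```

A group that has no value above the threshold does not survive the filter, so it will simply be missing from the result.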
Alternatively, sort by B with sort_values and then call drop_duplicates on A, which keeps the first (smallest) row of each group:
df[df.B > 5].sort_values('B').drop_duplicates('A')
Output:
     A  B
10  C3  7
2   C1  8
6   C2  8
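The output above comes back ordered by B rather than by the original row order. If that matters, one option is to append sort_index to the chain. This is only a sketch on the question's data; the kind='mergesort' argument is my own addition, chosen because it is a stable sort, so rows that tie on B keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['C1'] * 4 + ['C2'] * 4 + ['C3'] * 4,
    'B': [1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0],
})

result = (
    df[df['B'] > 5]
    .sort_values('B', kind='mergesort')  # smallest B first; stable for ties
    .drop_duplicates('A')                # keep the first (smallest) row per group
    .sort_index()                        # restore the original row order
)
print(result)
```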
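Another way: filter B for values greater than five, group by A, and take the minimum of B in each group. Note that this returns only the group key and the minimum with a fresh index, so the original index labels (and any other columns) are lost.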
df[df.B.gt(5)].groupby('A')['B'].min().reset_index()
Output:
    A  B
0  C1  8
1  C2  8
2  C3  7
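To see how the min approach differs from idxmin once the frame has more than two columns, here is a small sketch. The extra column C is hypothetical and exists only to make the difference visible.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['C1'] * 4 + ['C2'] * 4 + ['C3'] * 4,
    'B': [1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0],
    'C': list('abcdefghijkl'),  # hypothetical extra column, not in the question
})

filtered = df[df['B'].gt(5)]

# min() aggregates B only: the result has A and B, and a new 0..n-1 index.
by_min = filtered.groupby('A')['B'].min().reset_index()

# idxmin() + loc returns the complete original rows, including C.
by_idxmin = df.loc[filtered.groupby('A')['B'].idxmin()]

print(by_min)
print(by_idxmin)
```

So the min version is fine when only the per-group minimum is needed, while idxmin with loc is the one to reach for when the whole row is wanted.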