In pandas, find the row per group with the smallest value greater than a threshold

Question:

I have a dataframe which looks like this:

import pandas as pd

df = pd.DataFrame(
    {
        'A':
            [
                'C1', 'C1', 'C1', 'C1',
                'C2', 'C2', 'C2', 'C2',
                'C3', 'C3', 'C3', 'C3'
            ],
        'B':
            [
                1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0
            ]
    }
)


     A  B
0   C1  1
1   C1  4
2   C1  8
3   C1  9
4   C2  1
5   C2  3
6   C2  8
7   C2  9
8   C3  1
9   C3  4
10  C3  7
11  C3  0

For each group in A, I want to find the row with the smallest value of B greater than 5.

My resulting dataframe should look like this:

     A  B
2   C1  8
6   C2  8
10  C3  7

I have tried this, but it does not give me the whole row:

df[df.B >= 4].groupby('A')['B'].min()

What do I need to change?

Asked By: idt_tt


Answers:

Use idxmin instead of min to get the index label of each group's minimum, then select those rows with loc:

df.loc[df[df.B > 5].groupby('A')['B'].idxmin()]

Output:

     A  B
2   C1  8
6   C2  8
10  C3  7
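The idxmin approach can be wrapped in a small reusable function. The name smallest_above and its parameters are hypothetical, chosen only for illustration; the sketch assumes A has an ordinary object dtype, so a group with no value above the threshold simply drops out of the result.

```python
import pandas as pd


def smallest_above(df, group_col, value_col, threshold):
    """Hypothetical helper (not a pandas API): for each group, return the full
    row holding the smallest value_col strictly greater than threshold."""
    # Keep only candidate rows; groups with no value above threshold vanish here.
    candidates = df[df[value_col] > threshold]
    # idxmin gives the index label of each group's minimum (first one on ties).
    labels = candidates.groupby(group_col)[value_col].idxmin()
    # Look those labels up in the original frame to get every column back.
    return df.loc[labels]


df = pd.DataFrame({
    'A': ['C1'] * 4 + ['C2'] * 4 + ['C3'] * 4,
    'B': [1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0],
})

result = smallest_above(df, 'A', 'B', 5)
print(result)

# With a threshold of 8, C3 has no qualifying value, so only C1 and C2 remain.
high = smallest_above(df, 'A', 'B', 8)
print(high)
```

The first call returns rows 2, 6 and 10 as in the output above; the second returns only rows 3 and 7.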

Alternatively, you can use sort_values followed by drop_duplicates:

df[df.B > 5].sort_values('B').drop_duplicates('A')

Output:

     A  B
10  C3  7
2   C1  8
6   C2  8
Answered By: Quang Hoang
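The sort_values output comes back ordered by B rather than by the original rows. A minimal sketch, assuming you want the original row order and the same tie-breaking as idxmin (the earliest row wins): sort with kind='stable' so equal B values keep their original order, then restore the row order with sort_index.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['C1'] * 4 + ['C2'] * 4 + ['C3'] * 4,
    'B': [1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0],
})

result = (
    df[df.B > 5]
    # A stable sort keeps equal B values in their original order,
    # so on ties drop_duplicates keeps the earliest row, like idxmin.
    .sort_values('B', kind='stable')
    # The first occurrence of each A is now its smallest B above 5.
    .drop_duplicates('A')
    # Put the rows back in their original order (2, 6, 10).
    .sort_index()
)
print(result)
```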

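Another way: filter for B greater than 5, group by A, and take the minimum of B in each group. Since the frame has only columns A and B, reset_index turns the result back into full rows, although the original row labels (2, 6, 10) are replaced by 0, 1, 2: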

df[df.B.gt(5)].groupby('A')['B'].min().reset_index()

Output:

  A  B
0  C1  8
1  C2  8
2  C3  7
Answered By: wwnde
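To see what min() loses when the frame has more columns, the sketch below adds a hypothetical column C that is not in the question. min() returns only the group key and the minimum, while idxmin() plus loc recovers every column of the matching rows.

```python
import pandas as pd

# Hypothetical extra column 'C' (not in the question), one letter per row.
df = pd.DataFrame({
    'A': ['C1'] * 4 + ['C2'] * 4 + ['C3'] * 4,
    'B': [1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0],
    'C': list('abcdefghijkl'),
})

above = df[df.B.gt(5)]

# min() yields one number per group: only A and B survive reset_index.
mins = above.groupby('A')['B'].min().reset_index()

# idxmin() yields row labels, so loc returns whole rows, including C.
rows = df.loc[above.groupby('A')['B'].idxmin()]

print(mins)
print(rows)
```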