Pandas groupby filter only last two rows

Question:

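I am working on a pandas manipulation and want to keep, for each group in column "A", only the last two rows when sorted by column "B" (i.e. the two rows with the largest B values per A).

How can I do this without reset_index and filter, i.e. inside the groupby?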

import pandas as pd
df = pd.DataFrame({
    'A': list('aaabbbbcccc'),
    'B': [0,1,2,5,7,2,1,4,1,0,2],
    'V': range(10,120,10)
})

df

My attempt

df.groupby(['A','B'])['V'].sum()

Required output

A  B
a  
   1     20
   2     30
b  
   5     40
   7     50
c  
   2    110
   4     80
Asked By: user15929966
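The attempt above can be kept entirely inside groupby. Below is a minimal sketch (variable names `summed` and `out` are illustrative only):

- `groupby(['A', 'B'])` sorts its keys by default, so within each A the largest B values come last.
- A second `groupby(level='A').tail(2)` on the aggregated Series keeps exactly those entries.

One caveat: this sums V for any repeated (A, B) pair. That does not happen in this sample data, but it would change the result on data with duplicates.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('aaabbbbcccc'),
    'B': [0, 1, 2, 5, 7, 2, 1, 4, 1, 0, 2],
    'V': range(10, 120, 10),
})

# groupby sorts the (A, B) keys, so within each A the largest B values come last
summed = df.groupby(['A', 'B'])['V'].sum()

# keep the last two entries of each A group (level 'A' of the MultiIndex)
out = summed.groupby(level='A').tail(2)
print(out)
```

The result keeps the (A, B) MultiIndex, matching the layout of the required output.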


Answers:

Try:

df.sort_values(['A', 'B']).groupby(['A']).tail(2)

Output:

    A  B    V
1   a  1   20
2   a  2   30
3   b  5   40
4   b  7   50
10  c  2  110
7   c  4   80
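A short sketch of why this works. `GroupBy.tail(2)` keeps the last two rows of each group in the frame's current row order, so sorting by B first puts the two largest B values at the end of each A group. The final `sort_index()` is an optional addition, not part of the answer, that restores the original row order. The names `ordered` and `top2` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('aaabbbbcccc'),
    'B': [0, 1, 2, 5, 7, 2, 1, 4, 1, 0, 2],
    'V': range(10, 120, 10),
})

# sort so that, within each A, rows are in ascending B order
ordered = df.sort_values(['A', 'B'])

# tail(2) takes the last two rows per group in the current order,
# i.e. the two rows with the largest B for each A
top2 = ordered.groupby('A').tail(2)

# optional: restore the original row order
print(top2.sort_index())
```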
Answered By: Scott Boston

IIUC, you want to get the rows with the two highest B values per A.

You can compute a descending rank per group and keep those ≤ 2.

df[df.groupby('A')['B'].rank('first', ascending=False).le(2)]

Output:

    A  B    V
1   a  1   20
2   a  2   30
3   b  5   40
4   b  7   50
7   c  4   80
10  c  2  110
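To see what the boolean mask is built from, here is a small sketch that shows the per-group rank as a helper column. `rank_in_A` is a hypothetical name used only for illustration.

With `method='first'`, ties are broken by order of appearance, so each group gets distinct ranks 1, 2, 3, ... and at most two rows per group pass `le(2)`. Because the mask indexes `df` directly, the original row order is kept.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('aaabbbbcccc'),
    'B': [0, 1, 2, 5, 7, 2, 1, 4, 1, 0, 2],
    'V': range(10, 120, 10),
})

# rank B within each A group: the largest B gets rank 1, ties broken by row order
ranks = df.groupby('A')['B'].rank('first', ascending=False)
print(df.assign(rank_in_A=ranks))

# keep rows ranked 1 or 2 within their group
out = df[ranks.le(2)]
print(out)
```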
Answered By: mozway
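def function1(dd: pd.DataFrame):
    # sort this A group by B, keep its last two rows (the two largest B)
    # and drop the first column; assumes dd still includes column A at position 0
    return dd.sort_values('B').iloc[-2:, 1:]

# the result is indexed by (A, original row label); droplevel(1) drops the row label
df.groupby(['A']).apply(function1).droplevel(1)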

out

  B    V
A        
a  1   20
a  2   30
b  5   40
b  7   50
c  2  110
c  4   80
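An alternative sketch that avoids `apply`. This technique does not appear in the answers above. `SeriesGroupBy.nlargest(2)` returns the two largest B values per A, indexed by (A, original row label). The last index level can then be used with `loc` to select the full rows. Rows come out in descending B order within each group. The names `largest` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('aaabbbbcccc'),
    'B': [0, 1, 2, 5, 7, 2, 1, 4, 1, 0, 2],
    'V': range(10, 120, 10),
})

# two largest B per A; result is indexed by (A, original row label)
largest = df.groupby('A')['B'].nlargest(2)

# the last index level holds the original row labels; use them to pick full rows
out = df.loc[largest.index.get_level_values(-1)]
print(out)
```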
Answered By: G.G