Pandas: Select top n groups

Question:

Suppose I have a dataframe like

a   b
i1  t1
i1  t2
i2  t3
i2  t1
i2  t3
i3  t2

I want to group df by "a" and then select the two largest groups. Specifically, I want the result to contain all rows of those groups:

a   b
i2  t3
i2  t1
i2  t3
i1  t1
i1  t2

I tried:

df.groupby("a").head(2)   

But this seems to select the first two rows of each group instead.

Asked By: Ahmad


Answers:

Example

import pandas as pd

data = {'a': {0: 'i1', 1: 'i1', 2: 'i2', 3: 'i2', 4: 'i2', 5: 'i3'},
        'b': {0: 't1', 1: 't2', 2: 't3', 3: 't1', 4: 't3', 5: 't2'}}
df = pd.DataFrame(data)

Code

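# value_counts() sorts by count (descending), so the first 2 labels are the largest groups
lst = df['a'].value_counts()[:2].index
# keep only the rows whose 'a' value belongs to one of those groups
out = df[df['a'].isin(lst)]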

out

    a   b
0  i1  t1
1  i1  t2
2  i2  t3
3  i2  t1
4  i2  t3
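
As a cross-check on the value_counts approach above, the same selection can be sketched with groupby, which is closer to how the question was phrased. This is only an alternative sketch, not the answerer's code; the names sizes, top, and out_groupby are illustrative, and ties between equally sized groups are resolved by nlargest's default keep='first'.

```python
import pandas as pd

data = {'a': {0: 'i1', 1: 'i1', 2: 'i2', 3: 'i2', 4: 'i2', 5: 'i3'},
        'b': {0: 't1', 1: 't2', 2: 't3', 3: 't1', 4: 't3', 5: 't2'}}
df = pd.DataFrame(data)

# number of rows in each group of 'a'
sizes = df.groupby('a').size()
# labels of the 2 largest groups, largest first
top = sizes.nlargest(2).index
# rows belonging to those groups, in their original order
out_groupby = df[df['a'].isin(top)]
print(out_groupby)
```

Rows come back in their original order here; only the group selection changes.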

If you also want the groups sorted by size (largest first), use the following code:

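lst = df['a'].value_counts()[:2].index
# map each top group label to its rank (0 = largest group)
m = pd.Series(range(0, 2), index=lst)
# filter to the top groups, then order the rows by group rank
out = df[df['a'].isin(lst)].sort_values('a', key=lambda x: m[x])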

out

    a   b
2   i2  t3
3   i2  t1
4   i2  t3
0   i1  t1
1   i1  t2
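
For repeated use, the two steps above could be wrapped in a small helper. The following is a minimal sketch, assuming a hypothetical function name top_n_groups that is not part of pandas. It uses Series.map for the rank lookup and kind="stable" so that rows keep their original relative order within each group.

```python
import pandas as pd

def top_n_groups(df, col, n):
    """Return the rows of the n largest groups of `col`, largest group first."""
    # group labels ordered by size, descending; keep the first n
    top = df[col].value_counts().head(n).index
    # rank of each kept label: 0 for the largest group, 1 for the next, ...
    rank = pd.Series(range(len(top)), index=top)
    subset = df[df[col].isin(top)]
    # a stable sort preserves the original row order inside each group
    return subset.sort_values(col, key=lambda s: s.map(rank), kind="stable")

data = {'a': {0: 'i1', 1: 'i1', 2: 'i2', 3: 'i2', 4: 'i2', 5: 'i3'},
        'b': {0: 't1', 1: 't2', 2: 't3', 3: 't1', 4: 't3', 5: 't2'}}
df = pd.DataFrame(data)

result = top_n_groups(df, 'a', 2)
print(result)
```
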
Answered By: Panda Kim