Pandas top n values in each group

Question:

I have a dataframe like this:

item      date       hour     value
  a         4         12       123
  a         6         11        54
  b         1          7       146
  c         8          1        97
  a         9          5        10
  c         4          5       114
  b         1          7       200
...       ...        ...       ...

and I want to keep the top 10 rows of each item by value (discarding the rest is fine), regardless of any other column. The rows are not sorted.

Since my example input doesn’t contain enough rows to get 10 for every item, the expected output would be something like this if I wanted the top 1:

item      date       hour     value
  a         4         12       123
  c         4          5       114
  b         1          7       200
...       ...        ...       ...

I’ve seen this answer but I’m not sure how to tell pandas to take value for the calculation.

Asked By: Javier

Answers:

You can sort_values by both ['item', 'value'] in descending order and then take the first rows of each group with groupby.head:

df.sort_values(['item', 'value'], ascending=False).groupby('item').head(10)
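As a minimal sketch of the sort-then-head approach, here it is applied to the sample rows from the question. The frame is rebuilt by hand from the question's table, and n = 1 is used instead of 10 because the sample is too small; the names df and top1 are only illustrative.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "item":  ["a", "a", "b", "c", "a", "c", "b"],
    "date":  [4, 6, 1, 8, 9, 4, 1],
    "hour":  [12, 11, 7, 1, 5, 5, 7],
    "value": [123, 54, 146, 97, 10, 114, 200],
})

# Sort descending by item, then by value, so the largest values of each
# item come first; head(1) then keeps the first row of every group.
top1 = df.sort_values(["item", "value"], ascending=False).groupby("item").head(1)

print(top1)
```

All columns (date, hour) are kept, and each item contributes its single largest value: 123 for a, 200 for b and 114 for c. Because of the descending sort, the rows come out grouped by item in reverse alphabetical order rather than in their original order.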

Or with nlargest (note that this returns only item, the original row index and value, not the other columns):

df.groupby('item').value.nlargest(10).reset_index()
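A small sketch of what the nlargest variant returns on the question's sample rows, again with n = 1. The step that looks the full rows back up through level_1 is an addition for illustration, not part of the answer above; it assumes the frame has a default, unnamed index, which is why reset_index names that column level_1.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "item":  ["a", "a", "b", "c", "a", "c", "b"],
    "date":  [4, 6, 1, 8, 9, 4, 1],
    "hour":  [12, 11, 7, 1, 5, 5, 7],
    "value": [123, 54, 146, 97, 10, 114, 200],
})

# nlargest on the grouped 'value' column yields a Series indexed by
# (item, original row label); reset_index turns both levels into columns.
top = df.groupby("item")["value"].nlargest(1).reset_index()
print(top.columns.tolist())

# Assumption: the original index is unnamed, so its column is 'level_1'.
# Those labels can be used to recover the complete rows, including date and hour.
full_rows = df.loc[top["level_1"]]
print(full_rows)
```

If only the values per item matter, the first result is enough; if the other columns are needed, the lookup above or the sort_values approach keeps them.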
Answered By: yatu