Ranking order per group in Pandas

Question:

Consider a dataframe with three columns: group_ID, item_ID and value. Say we have 10 item_IDs in total.

I need to rank each item_ID (1 to 10) within each group_ID based on value, and then look at the mean rank (and other stats) across groups (e.g. the IDs with the highest value across groups would get ranks closer to 1). How can I do this in Pandas?

This answer does something very close with qcut, but not exactly the same.


A data example would look like:

      group_ID   item_ID  value
0   0S00A1HZEy        AB     10
1   0S00A1HZEy        AY      4
2   0S00A1HZEy        AC     35
3   0S03jpFRaC        AY     90
4   0S03jpFRaC        A5      3
5   0S03jpFRaC        A3     10
6   0S03jpFRaC        A2      8
7   0S03jpFRaC        A4      9
8   0S03jpFRaC        A6      2
9   0S03jpFRaC        AX      0

which would result in:

      group_ID   item_ID   rank
0   0S00A1HZEy        AB      2
1   0S00A1HZEy        AY      3
2   0S00A1HZEy        AC      1
3   0S03jpFRaC        AY      1
4   0S03jpFRaC        A5      5
5   0S03jpFRaC        A3      2
6   0S03jpFRaC        A2      4
7   0S03jpFRaC        A4      3
8   0S03jpFRaC        A6      6
9   0S03jpFRaC        AX      7

Answers:

rank accepts several arguments; it looks like rank(method="dense", ascending=False), applied after a groupby, gives the results you want:

>>> df["rank"] = df.groupby("group_ID")["value"].rank(method="dense", ascending=False)
>>> df
     group_ID item_ID  value  rank
0  0S00A1HZEy      AB     10     2
1  0S00A1HZEy      AY      4     3
2  0S00A1HZEy      AC     35     1
3  0S03jpFRaC      AY     90     1
4  0S03jpFRaC      A5      3     5
5  0S03jpFRaC      A3     10     2
6  0S03jpFRaC      A2      8     4
7  0S03jpFRaC      A4      9     3
8  0S03jpFRaC      A6      2     6
9  0S03jpFRaC      AX      0     7

But note that if you're not using a global ranking scheme, the mean rank across groups isn't very meaningful: unless a group contains duplicate values (and therefore duplicate ranks), all you're measuring is how many elements the group has.
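
As a rough sketch of the full workflow the question asks for (rank within each group, then summarize per item), something like the following could work. The data is copied from the question; the column name pct_rank and the variable item_stats are made up for illustration, and using pct=True to normalize ranks by group size is just one possible way to make ranks comparable across groups of different sizes, not something the answer above prescribes.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "group_ID": ["0S00A1HZEy"] * 3 + ["0S03jpFRaC"] * 7,
    "item_ID": ["AB", "AY", "AC", "AY", "A5", "A3", "A2", "A4", "A6", "AX"],
    "value": [10, 4, 35, 90, 3, 10, 8, 9, 2, 0],
})

grouped = df.groupby("group_ID")["value"]

# rank() returns floats; cast to int here because there are no NaN values
df["rank"] = grouped.rank(method="dense", ascending=False).astype(int)

# Assumption: a rank scaled by group size, so groups of different sizes are comparable
df["pct_rank"] = grouped.rank(pct=True, ascending=False)

# Per-item statistics of the within-group rank across all groups
item_stats = df.groupby("item_ID")["rank"].agg(["mean", "min", "max", "count"])
print(item_stats)
```

Here AY appears in both groups (rank 3 in one and rank 1 in the other), so its mean rank is 2; every other item appears only once, so its statistics just repeat its single rank.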

Answered By: DSM

Alternatively, you can sort the dataframe by value and cumulatively count the position of each value within its group. Unlike a dense rank, this gives tied values distinct ranks.

df['rank'] = df.sort_values(by=['group_ID', 'value']).groupby('group_ID').cumcount(ascending=False) + 1
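
To see how the cumcount approach differs from a dense rank when a group contains ties, here is a small sketch on made-up data (the frame tied and its column names are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical group with a tie on the highest value
tied = pd.DataFrame({
    "group_ID": ["g1", "g1", "g1"],
    "value": [5, 5, 3],
})

# Position-based rank: every row gets its own rank, even tied ones
tied["cumcount_rank"] = (
    tied.sort_values(by=["group_ID", "value"])
        .groupby("group_ID")
        .cumcount(ascending=False) + 1
)

# Dense rank: tied values share a rank and no rank numbers are skipped
tied["dense_rank"] = (
    tied.groupby("group_ID")["value"]
        .rank(method="dense", ascending=False)
        .astype(int)
)
print(tied)
```

The two rows with value 5 share dense rank 1 but receive ranks 1 and 2 (in some order) from cumcount, so the two methods only agree when values are unique within each group.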

If you want to ordinally rank values in each group, you can pass pd.qcut to transform. This is especially useful if the groups are the same size, if the ranks are meaningful across groups, or if there are a lot of duplicates in each group.

q = 10 # how many buckets to put the values in
df['rank'] = df.groupby('group_ID')['value'].transform(pd.qcut, q=q, labels=False, duplicates='drop')

# for descending order (larger values get smaller rank numbers)
df['rank'] = q - df.groupby('group_ID')['value'].transform(pd.qcut, q=q, labels=False, duplicates='drop')

For the data in the OP, the ordinal ranking is the same as with groupby.rank.
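
One way to check that claim on the question's data is to compare the ordering implied by the qcut buckets with the ordering from groupby.rank. This is only a sketch: the column name bucket and the variable same_order are made up here, and the bucket labels themselves are not expected to equal the dense ranks, only to sort the rows the same way.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "group_ID": ["0S00A1HZEy"] * 3 + ["0S03jpFRaC"] * 7,
    "item_ID": ["AB", "AY", "AC", "AY", "A5", "A3", "A2", "A4", "A6", "AX"],
    "value": [10, 4, 35, 90, 3, 10, 8, 9, 2, 0],
})

q = 10  # number of quantile buckets

# Bucket label per row (0 = lowest values), computed within each group
df["bucket"] = df.groupby("group_ID")["value"].transform(
    pd.qcut, q=q, labels=False, duplicates="drop"
)

# Dense rank per group, with the largest value ranked 1
df["rank"] = df.groupby("group_ID")["value"].rank(method="dense", ascending=False)

# Ranking the buckets the same way should reproduce the rank column
bucket_rank = df.groupby("group_ID")["bucket"].rank(method="dense", ascending=False)
same_order = bool((bucket_rank == df["rank"]).all())
print(same_order)
```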

Answered By: cottontail