Pandas: Count Higher Ranks For Current Experiment Participants In Later Experiments (Part 1)
Question:
Learning Experiments
In a series of learning experiments, I would like to count the number of participants in each experiment that improved their performance in subsequent experiments (Rank 1 is highest). In addition, I would also like to count the number of participants in each experiment that subsequently reached the top rank.
Here is a short, sanitized version of the learning experiment CSV file that I have loaded into a pandas DataFrame (df_learning).
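Experiment | Subject | Rank |
---|---|---|
A | Alpha | 1 |
A | Bravo | 2 |
A | Charlie | 3 |
A | Delta | 4 |
A | Echo | 5 |
B | Alpha | 1 |
B | Charlie | 2 |
B | Echo | 3 |
B | Foxtrot | 4 |
B | Golf | 5 |
B | India | 6 |
B | Juliet | 7 |
C | Juliet | 1 |
C | Bravo | 2 |
C | Charlie | 3 |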
Please advise?
Answers:
You can use groupby.cummax, then boolean indexing:
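# df is the question's df_learning; this assumes rows are in experiment order (A, B, C)
# flag rows where a subject's rank is better (lower) than their worst rank so far
m = df['Rank'].sub(df.groupby('Subject')['Rank'].cummax()).lt(0)
improved_rank = df.loc[m, 'Subject'].unique()
# output: ['Charlie', 'Echo', 'Juliet']
reached_top_rank = df.loc[m & df['Rank'].eq(1), 'Subject'].unique()
# output: ['Juliet']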
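The snippet above returns the subjects who improved at some point, not a count per experiment. If per-experiment counts are what is needed, here is a minimal sketch under one reading of the question: a participant in an experiment counts as improved if they reach a strictly better (lower) rank in any later experiment, and as having reached the top if that later best rank is 1. It assumes the rows are sorted in experiment order; best_later, improved_later, reached_top_later and counts are names made up for illustration.

```python
import pandas as pd

# Sample data from the question, rows in experiment order (A, B, C).
df_learning = pd.DataFrame({
    "Experiment": list("AAAAA") + list("BBBBBBB") + list("CCC"),
    "Subject": ["Alpha", "Bravo", "Charlie", "Delta", "Echo",
                "Alpha", "Charlie", "Echo", "Foxtrot", "Golf", "India", "Juliet",
                "Juliet", "Bravo", "Charlie"],
    "Rank": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3],
})

# Walk the rows backwards so a cumulative minimum per subject covers
# the current row and every later experiment.
rev = df_learning.iloc[::-1]
later_min = rev.groupby("Subject")["Rank"].cummin()

# Shift within each subject to drop the current row, leaving the best rank
# in strictly later experiments (NaN when the subject never appears again).
best_later = later_min.groupby(rev["Subject"]).shift().sort_index()

# Comparisons against NaN are False, so subjects with no later row are not counted.
improved_later = best_later < df_learning["Rank"]
reached_top_later = improved_later & (best_later == 1)

# Summing booleans per experiment gives the counts.
counts = (
    df_learning
    .assign(Improved=improved_later, ReachedTop=reached_top_later)
    .groupby("Experiment")[["Improved", "ReachedTop"]]
    .sum()
)
print(counts)
```

With the sample data, Improved is 2 for A (Charlie and Echo), 1 for B (Juliet) and 0 for C, while ReachedTop is 1 only for B (Juliet). Alpha is not counted under this reading because their rank was already 1; drop the improved_later condition from reached_top_later if such participants should be included.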