Pandas: Count Higher Ranks For Current Experiment Participants In Later Experiments (Part 1)

Question:

Learning Experiments

In a series of learning experiments, I would like to count the number of participants in each experiment that improved their performance in subsequent experiments (Rank 1 is highest). In addition, I would also like to count the number of participants in each experiment that subsequently reached the top rank.

Here is a short, sanitized version of the learning experiment CSV file that I have loaded into a pandas DataFrame (df_learning).

Experiment Subject Rank
A Alpha 1
A Bravo 2
A Charlie 3
A Delta 4
A Echo 5
B Alpha 1
B Charlie 2
B Echo 3
B Foxtrot 4
B Golf 5
B India 6
B Juliet 7
C Juliet 1
C Bravo 2
C Charlie 3

Please advise?

Asked By: matekus


Answers:

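You can use groupby.cummax to track each subject's worst rank so far, then boolean indexing to keep the rows where the current rank is strictly better (df is the question's df_learning, with rows in experiment order):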

m = df['Rank'].sub(df.groupby('Subject')['Rank'].cummax()).lt(0)

improved_rank = df.loc[m, 'Subject'].unique()

output: ['Charlie', 'Echo', 'Juliet']
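
To see why the mask works, it can help to lay the running maximum next to each rank. The sketch below rebuilds the question's sample data and assumes the rows are already in experiment order, as in the question. The column name WorstSoFar is introduced here purely for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data from the question, in experiment order.
df = pd.DataFrame(
    {
        "Experiment": list("AAAAABBBBBBBCCC"),
        "Subject": [
            "Alpha", "Bravo", "Charlie", "Delta", "Echo",
            "Alpha", "Charlie", "Echo", "Foxtrot", "Golf", "India", "Juliet",
            "Juliet", "Bravo", "Charlie",
        ],
        "Rank": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3],
    }
)

# Worst (numerically largest) rank each subject has had up to and including this row.
# WorstSoFar is a hypothetical helper column, added only to make the mask visible.
df["WorstSoFar"] = df.groupby("Subject")["Rank"].cummax()

# True where the current rank is strictly better than an earlier rank of the same subject.
# Equivalent to the answer's Rank.sub(cummax).lt(0).
m = df["Rank"].lt(df["WorstSoFar"])

print(df[m])
```

Note that the comparison is against the worst earlier rank, not the immediately preceding one. Charlie's move from rank 3 to rank 2 in experiment B is flagged, while the return to rank 3 in experiment C is not.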

reached_top_rank = df.loc[m & df['Rank'].eq(1), 'Subject'].unique()

output: ['Juliet']
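
The snippets above list the subjects overall, whereas the question asks for counts per experiment. One possible way to get those counts is sketched below; it is not part of the original answer and rests on several assumptions. It assumes experiment labels sort in the order the experiments were run, that "improved" means a strictly better rank in any later experiment, and that "reached the top rank" means holding rank 1 in any later experiment, even for a subject who was already at rank 1. The names BestLater, Improved and ReachedTop are hypothetical.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "Experiment": list("AAAAABBBBBBBCCC"),
        "Subject": [
            "Alpha", "Bravo", "Charlie", "Delta", "Echo",
            "Alpha", "Charlie", "Echo", "Foxtrot", "Golf", "India", "Juliet",
            "Juliet", "Bravo", "Charlie",
        ],
        "Rank": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3],
    }
)

# Assumption: experiment labels sort in chronological order.
df = df.sort_values(["Subject", "Experiment"])

# Best (lowest) rank each subject achieves in any LATER experiment:
# walk the rows backwards, take a running minimum per subject,
# then shift by one so the current experiment is excluded.
rev = df.iloc[::-1]
df["BestLater"] = rev.groupby("Subject")["Rank"].transform(
    lambda s: s.cummin().shift()
)

# Comparisons with NaN (no later experiment) evaluate to False.
df["Improved"] = df["BestLater"] < df["Rank"]
df["ReachedTop"] = df["BestLater"] == 1

# Number of participants per experiment meeting each condition.
counts = df.groupby("Experiment")[["Improved", "ReachedTop"]].sum()
print(counts)
```

On the sample data this gives 2 improved and 1 reached-top for experiment A, 1 and 1 for B, and 0 and 0 for C. To exclude subjects who were already at rank 1, the ReachedTop condition could be combined with df["Rank"] > 1.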

Answered By: mozway