Create chronology column in pandas DataFrame

Question:

I have a DataFrame with two key columns: name and timestamp.


import pandas as pd

df = pd.DataFrame({'name':['tom','tom','tom','bert','bert','sam'],
                   'timestamp':[15,13,14,23,22,14]})

I would like to add a third column, chronology, that ranks each row's timestamp within its name group, so that the result looks like this:

df_final = pd.DataFrame({'name':['tom','tom','tom','bert','bert','sam'], 
                         'timestamp':[15,13,14,23,22,14], 
                         'chronology':[3,1,2,2,1,1]})

I understand that I can sort with df = df.sort_values(['name', 'timestamp']), but how do I create the chronology column?

Asked By: econben

Answers:

You can use groupby().cumcount() if timestamps are unlikely to repeat within a name:

df['chronology'] = df.sort_values('timestamp').groupby('name').cumcount().add(1)

or groupby().rank():

df['chronology'] = df.groupby('name')['timestamp'].rank().astype(int)

Output:

   name  timestamp  chronology
0   tom         15           3
1   tom         13           1
2   tom         14           2
3  bert         23           2
4  bert         22           1
5   sam         14           1
Answered By: Quang Hoang
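
Both approaches above assume that timestamps do not tie within a name. As a minimal sketch of what happens when they do, the following uses hypothetical data with a repeated timestamp (the data and variable names are illustrative, not from the question). With the default method='average', tied rows share a fractional rank, so a cast to int would truncate it; method='first' or method='dense' return whole-number ranks instead.

```python
import pandas as pd

# Hypothetical data: tom has two rows sharing timestamp 13.
df = pd.DataFrame({'name': ['tom', 'tom', 'tom', 'bert'],
                   'timestamp': [15, 13, 13, 22]})

grouped = df.groupby('name')['timestamp']

# Default method='average': tied rows share a fractional rank (1.5 here),
# so casting to int would silently truncate it.
avg_rank = grouped.rank()

# method='first' breaks ties by order of appearance and yields distinct
# whole-number ranks, which makes the integer cast safe.
first_rank = grouped.rank(method='first').astype(int)

# method='dense' gives tied rows the same rank, with no gaps afterwards.
dense_rank = grouped.rank(method='dense').astype(int)
```

Which method fits depends on whether tied rows should get distinct positions ('first') or share one ('dense').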

GroupBy.rank() does exactly what you need. From the documentation:

GroupBy.rank(method='average', ascending=True, na_option='keep', pct=False, axis=0)

Provide the rank of values within each group.

Try this code:

df['chronology'] = df.groupby(by=['name']).timestamp.rank().astype(int)

Result:

   name  timestamp  chronology
   tom         15           3
   tom         13           1
   tom         14           2
  bert         23           2
  bert         22           1
   sam         14           1
Answered By: Massifox
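
The ascending and na_option parameters in the signature quoted above also affect the result. The sketch below is illustrative only and assumes hypothetical data with one missing timestamp: ascending=False numbers the latest row in each group as 1, and because the default na_option='keep' leaves a NaN rank for a missing timestamp, the nullable 'Int64' dtype is used here in place of a plain int cast.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one of tom's timestamps is missing.
df = pd.DataFrame({'name': ['tom', 'tom', 'tom', 'bert'],
                   'timestamp': [15, np.nan, 14, 22]})

grouped = df.groupby('name')['timestamp']

# ascending=False numbers the latest timestamp in each group as 1.
newest_first = grouped.rank(ascending=False)

# With the default na_option='keep', a missing timestamp gets a NaN rank,
# so a plain astype(int) would raise; the nullable 'Int64' dtype keeps
# the missing rank as <NA> instead.
df['chronology'] = grouped.rank(method='first').astype('Int64')
```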