Create chronology column in pandas DataFrame
Question:
I have a DataFrame with two essential columns: name and timestamp.
import pandas as pd
df = pd.DataFrame({'name': ['tom', 'tom', 'tom', 'bert', 'bert', 'sam'],
                   'timestamp': [15, 13, 14, 23, 22, 14]})
I would like to create a third column, chronology, that ranks the timestamps within each name in chronological order, so that the final result looks like this:
df_final = pd.DataFrame({'name': ['tom', 'tom', 'tom', 'bert', 'bert', 'sam'],
                         'timestamp': [15, 13, 14, 23, 22, 14],
                         'chronology': [3, 1, 2, 2, 1, 1]})
I know I can sort with df = df.sort_values(['name', 'timestamp']), but how do I create the chronology column?
Answers:
You can use groupby().cumcount() if the timestamps are unlikely to repeat within a name:
df['chronology'] = df.sort_values('timestamp').groupby('name').cumcount().add(1)
or groupby().rank():
df['chronology'] = df.groupby('name')['timestamp'].rank().astype(int)
Output:
   name  timestamp  chronology
0   tom         15           3
1   tom         13           1
2   tom         14           2
3  bert         23           2
4  bert         22           1
5   sam         14           1
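For reference, here is a minimal self-contained sketch that runs both approaches on the sample data from the question and checks that they agree. The variable names by_cumcount and by_rank are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['tom', 'tom', 'tom', 'bert', 'bert', 'sam'],
    'timestamp': [15, 13, 14, 23, 22, 14],
})

# Approach 1: sort by timestamp, then number the rows within each name.
# The result keeps the original index labels, so assigning it back to df
# aligns each value with its original row.
by_cumcount = df.sort_values('timestamp').groupby('name').cumcount().add(1)

# Approach 2: rank the timestamps within each name.
by_rank = df.groupby('name')['timestamp'].rank().astype(int)

df['chronology'] = by_rank
print(df)

# Both approaches agree when no timestamp repeats within a name.
print(by_cumcount.sort_index().equals(by_rank))
```

The two results can differ once a name has repeated timestamps, because cumcount() always hands out distinct numbers while rank() treats ties according to its method argument.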
The function GroupBy.rank() does exactly what you need. From the documentation:
GroupBy.rank(method='average', ascending=True, na_option='keep', pct=False, axis=0)
Provide the rank of values within each group.
Try this code:
df['chronology'] = df.groupby(by=['name']).timestamp.rank().astype(int)
Result:
name  timestamp  chronology
 tom         15           3
 tom         13           1
 tom         14           2
bert         23           2
bert         22           1
 sam         14           1
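One thing to watch with rank() is tied timestamps. With the default method='average', ties receive a fractional rank, which astype(int) then truncates. The sketch below uses made-up data (the tied frame is hypothetical, not from the question) to compare the default with method='first' and method='dense'.

```python
import pandas as pd

# Hypothetical data: tom has two rows with the same timestamp.
tied = pd.DataFrame({
    'name': ['tom', 'tom', 'tom', 'bert'],
    'timestamp': [13, 13, 15, 22],
})

grouped = tied.groupby('name')['timestamp']

# Default: tied values share the average of the ranks they span.
average = grouped.rank()

# Ties are broken by order of appearance, so every row gets a distinct rank.
first = grouped.rank(method='first').astype(int)

# Ties share a rank, and the next rank follows without a gap.
dense = grouped.rank(method='dense').astype(int)

print(tied.assign(average=average, first=first, dense=dense))
```

If every row needs its own position, method='first' is probably the closest match to the cumcount() approach; if equal timestamps should share a position, method='dense' or method='min' may fit better.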