Create column that orders ID by first Start Date
Question:
Imagine I have the following dataframe:
ID Start Date
1 1990-01-01
1 1990-01-01
1 1991-01-01
2 1991-01-01
2 1990-01-01
3 2002-01-01
3 2000-01-01
4 1991-01-01
What would be the best way to create a column named Order that, for each unique ID in the ID column, assigns 1 to the earliest Start Date and adds 1 for each subsequent Start Date (ties can be ordered arbitrarily), resulting in the following dataframe:
ID Start Date Order
1 1990-01-01 2
1 1990-01-01 3
1 1989-01-01 1
2 1991-01-01 2
2 1990-01-01 1
3 2002-01-01 2
3 2000-01-01 1
4 1991-01-01 1
Answers:
Use groupby.rank:
df['Start Date'] = pd.to_datetime(df['Start Date'])
df['Order'] = df.groupby('ID')['Start Date'].rank(method='first', ascending=False).astype(int)
Output:
ID Start Date Order
0 1 1990-01-01 2
1 1 1990-01-01 3
2 1 1991-01-01 1
3 2 1991-01-01 1
4 2 1990-01-01 2
5 3 2002-01-01 1
6 3 2000-01-01 2
7 4 1991-01-01 1
With ascending=True, so that the earliest Start Date within each ID gets 1:
ID Start Date Order
0 1 1990-01-01 1
1 1 1990-01-01 2
2 1 1991-01-01 3
3 2 1991-01-01 2
4 2 1990-01-01 1
5 3 2002-01-01 2
6 3 2000-01-01 1
7 4 1991-01-01 1
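For reference, here is a self-contained sketch of the ascending variant that rebuilds the question's input frame and can be run as is. The Order_alt column is an assumption added for illustration and is not part of the original answer: it sorts by date with a stable sort and numbers the rows within each ID, which should give the same result as rank with method='first'.

```python
import pandas as pd

# Input data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 4],
    "Start Date": [
        "1990-01-01", "1990-01-01", "1991-01-01", "1991-01-01",
        "1990-01-01", "2002-01-01", "2000-01-01", "1991-01-01",
    ],
})
df["Start Date"] = pd.to_datetime(df["Start Date"])

# method="first" breaks ties by order of appearance, so every row in an ID
# gets a distinct rank; ascending=True gives 1 to the earliest date
df["Order"] = (
    df.groupby("ID")["Start Date"]
      .rank(method="first", ascending=True)
      .astype(int)
)

# Alternative (illustrative, not from the original answer): stable sort by
# date, then number rows within each ID; assignment realigns on the index
df["Order_alt"] = (
    df.sort_values("Start Date", kind="stable").groupby("ID").cumcount() + 1
)

print(df)
```

rank returns floats, so the astype(int) cast is only safe when Start Date has no missing values; rows with NaT would need to be handled first.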