How to order and index the column in a dataframe?

Question:

I am looking for a way to rank the values in a data frame column and store that rank in a new column.
For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'YYYYMM': [202206, 202207, 202206, 202209, 202206, 202207]})
   YYYYMM
0  202206
1  202207
2  202206
3  202209
4  202206
5  202207

Then I tried to order it using NumPy:

df['order'] = np.argsort(df['YYYYMM'])
   YYYYMM  order
0  202206      0
1  202207      2
2  202206      4
3  202209      1
4  202206      5
5  202207      3

However, I want equal values to share the same order, like this:

   YYYYMM  ORDER
0  202206      0
1  202207      1
2  202206      0
3  202209      2
4  202206      0
5  202207      1

What should I do to achieve it? Thank you.

Asked By: aukk123


Answers:

Use Series.rank with method='dense', convert to integers and subtract 1:

df['order'] = df['YYYYMM'].rank(method='dense').astype(int).sub(1)
print(df)
   YYYYMM  order
0  202206      0
1  202207      1
2  202206      0
3  202209      2
4  202206      0
5  202207      1
Answered By: jezrael
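
To make the steps visible, here is a minimal self-contained sketch that splits the one-liner apart. The intermediate name `dense` is only for illustration and is not part of the answer above. It also shows why `np.argsort` did not work in the question: `argsort` returns the positions that would sort the column, not a rank for each row.

```python
import pandas as pd

df = pd.DataFrame({'YYYYMM': [202206, 202207, 202206, 202209, 202206, 202207]})

# Dense ranking: equal values share one rank, ranks are consecutive
# and start at 1. The result is a float64 Series.
dense = df['YYYYMM'].rank(method='dense')

# Cast to int and shift by one so the smallest value gets order 0.
df['order'] = dense.astype(int).sub(1)

print(dense.tolist())
print(df)
```

Because `rank` returns floats, the integer conversion is needed either before or after subtracting 1; both orders give the same result on this data.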

Use rank with method='dense', subtract 1 (because ranks start at 1), and convert to integer:

df['order'] = df['YYYYMM'].rank(method='dense').sub(1).astype(int)

Output:

   YYYYMM  order
0  202206      0
1  202207      1
2  202206      0
3  202209      2
4  202206      0
5  202207      1
Answered By: mozway
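
As a possible alternative that neither answer above uses, `pd.factorize` with `sort=True` should give the same zero-based codes directly as integers, without the float-to-int conversion. This is a sketch under the assumption that the column has no missing values; the names `codes` and `uniques` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'YYYYMM': [202206, 202207, 202206, 202209, 202206, 202207]})

# factorize maps each distinct value to an integer code; sort=True makes
# the codes follow the sorted order of the distinct values.
codes, uniques = pd.factorize(df['YYYYMM'], sort=True)
df['order'] = codes

print(list(uniques))
print(df)
```

Here `uniques` holds the sorted distinct values, so `uniques[code]` recovers the original `YYYYMM` for any code.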