Pandas: add a column of numbers to denote month recency

Question:

I have a simple dataframe to which I want to add a column of numbers indicating how recent each month is, e.g. the most recent month has the highest "score" and the oldest has the lowest.

The clumsy lines below work for this simple dataframe, but they do not scale to large ones:

import pandas as pd
from io import StringIO

csvfile = StringIO("""
Town,Department,Staff,Month,Project,Score
East,Produce,Ethan,1987-08,A814,27
East,Produce,Ethan,1987-09,A848,27
East,Produce,Ethan,1987-10,A736,29
East,Meat,Harry,1987-07,A813,26""")

df = pd.read_csv(csvfile, sep = ',', engine='python')

def condition(s):
    if (s['Month'] == '1987-10'):
        return 4
    if (s['Month'] == '1987-09'):
        return 3
    if (s['Month'] == '1987-08'):
        return 2
    if (s['Month'] == '1987-07'):
        return 1
    else:
        return ''

df["Month score"] = df.apply(condition, axis=1)

print (df)


I have another, larger dataframe that covers 24 months or more, and the same month appears in many rows. What is a good way to write this?

Asked By: Mark K

||

Answers:

If possible use Series.rank:

df['score'] = df['Month'].rank(method='dense').astype(int)
print (df)
   Town Department  Staff    Month Project  Score  score
0  East    Produce  Ethan  1987-08    A814     27      2
1  East    Produce  Ethan  1987-09    A848     27      3
2  East    Produce  Ethan  1987-10    A736     29      4
3  East       Meat  Harry  1987-07    A813     26      1
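
Since the question mentions that months repeat across rows in the larger dataframe, here is a minimal sketch of how dense ranking treats duplicates. The data and the "Month score" column name are made up for illustration; it also relies on YYYY-MM strings sorting in chronological order.

```python
import pandas as pd

# Hypothetical data: the same month appears in several rows.
df = pd.DataFrame({
    "Staff": ["Ethan", "Harry", "Ethan", "Harry", "Ethan"],
    "Month": ["1987-08", "1987-08", "1987-10", "1987-07", "1987-10"],
})

# method="dense" gives tied months the same rank and leaves no gaps
# between consecutive ranks, however many rows share a month.
df["Month score"] = df["Month"].rank(method="dense").astype(int)

print(df)
```

Rows sharing a month get the same score, and the score only reflects the order of the months that are present, not the number of calendar months between them.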
Answered By: jezrael

This seems to work; there is no need for a separate month score:

df['Month'] = pd.to_datetime(df['Month'])
df = df.sort_values('Month', ascending=False)  # sort_values returns a new dataframe

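Or, if you really need a score column (named 'Month score' here, because the sample data already has a Score column that would otherwise be overwritten):

df['Month score'] = pd.to_datetime(df['Month'])
df = df.sort_values('Month score', ascending=False)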
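
If an integer score is wanted rather than a datetime, one possible sketch is to count calendar months from the oldest month in the data. This is an assumption about what the score should mean: unlike a dense rank, a missing month leaves a gap in the scores. The data and the "Month score" column name are hypothetical.

```python
import pandas as pd

# Hypothetical data with a repeated month and a missing one (1987-09).
df = pd.DataFrame({"Month": ["1987-08", "1987-10", "1987-07", "1987-10"]})

# Parse the YYYY-MM strings; the explicit format avoids any guessing.
month = pd.to_datetime(df["Month"], format="%Y-%m")

# Put every month on one integer axis, then shift so the oldest month scores 1.
month_index = month.dt.year * 12 + month.dt.month
df["Month score"] = month_index - month_index.min() + 1

# Most recent months first.
df = df.sort_values("Month score", ascending=False)
print(df)
```

Here 1987-10 scores 4 rather than 3, because the score measures distance in calendar months from 1987-07.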
Answered By: newoptionz