Efficient way for ordering values within group in Pandas

Question:

I’m working with time series data, which is packaged in a time long dataframe, something like this:

ACCOUNT | VAR1 | VAR2 | DAY

I’m interested in creating a new column DAY_ORD, which would give the ordinal rank of the DAY variable to each row within a group over unique (ACCOUNT, VAR1, VAR2) triplets.

Here is a small example of what I want to achieve:

ACCOUNT VAR1 VAR2 DAY DAY-ORD
A X True 2022-02-03 0
A X True 2022-02-04 1
B X True 2021-05-18 0
A X True 2022-02-05 2
B X True 2022-05-20 1
A Y True 2022-02-05 0
A X True 2022-03-12 3

Here is my current implementation:

#initialize an empty 'DAY_ORD' column
df['DAY_ORD'] = [None for i in range(len(df))]

#iterate over all triplets that appear in the data
for (_, row) in fb_data[['ACCOUNT', 'VAR1', 'VAR2']].copy().drop_duplicates().iterrows()):
    acc, v1, v2 = row[0], row[1], row[2]

    #find the df slice that adheres to the considered triplet
    fdf = df.loc[(df.ACCOUNT== acc) & (fb_data.VAR1 == v1) & (fb_data.VAR2 == v2)].sort_values('DAY')

    #assign them an ordinal rank
    fdf['DAY_ORD'] = [i for i in range(len(fdf))]

    #set the DAY_ORD values in the original dataframe
    for i in fdf.index:
        df.loc[i, 'DAY_ORD'] = fdf['DAY_ORD'][i]
df['DAY_ORD']

It seems like it will do the job, but it runs very slowly, at around 8 it/s. What is a clean way to make this faster?

Asked By: Vid Stropnik

||

Answers:

Use GroupBy.rank with convert values to datetimes, subtract 1 and convert to integers:

df['DAY'] = pd.to_datetime(df['DAY'])

df['DAY-ORD'] = (df.groupby(['ACCOUNT', 'VAR1', 'VAR2'])['DAY']
                   .rank('dense').sub(1).astype(int))

print (df)
  ACCOUNT VAR1  VAR2        DAY  DAY-ORD
0       A    X  True 2022-02-03        0
1       A    X  True 2022-02-04        1
2       B    X  True 2021-05-18        0
3       A    X  True 2022-02-05        2
4       B    X  True 2022-05-20        1
5       A    Y  True 2022-02-05        0
6       A    X  True 2022-03-12        3
Answered By: jezrael
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.