Fill panel data with ranked timepoints in pandas

Question:

Given a DataFrame in which each row represents a call to a customer:

import pandas as pd
import numpy as np
df_1 = pd.DataFrame({"customer_id" : [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5]})

The rows are ordered by time, so each customer's rows form a time series, and the timestamps differ from customer to customer. I therefore need a column that ranks each customer's timepoints:

df_2 = pd.DataFrame({"customer_id" : [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5],
"call_nr" : [0,1,2,0,1,0,1,2,3,0,0,1]})

After trying different approaches I came up with this to create call_nr:

np.concatenate([np.arange(df_1["customer_id"].value_counts().loc[i]) for i in df_1["customer_id"].unique()])

It works, but I doubt this is best practice. Is there a better solution?

Asked By: panel_fill


Answers:

A simpler solution would be to group by 'customer_id' and use cumcount:

>>> df_1.groupby('customer_id').cumcount()

0     0
1     1
2     2
3     0
4     1
5     0
6     1
7     2
8     3
9     0
10    0
11    1
dtype: int64

which you can assign back to your DataFrame as a new column, e.g. call_nr.
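As a minimal sketch of that assignment, using the frames from the question (df_1 as the input and df_2 as the expected result), something like the following should work. It assumes the rows are already in time order within each customer, as the question states, because cumcount numbers rows in the order they appear:

```python
import pandas as pd

df_1 = pd.DataFrame({"customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5]})

# cumcount numbers the rows of each group 0, 1, 2, ... in their existing order,
# and the result is aligned on the original index, so it can be assigned directly.
df_1["call_nr"] = df_1.groupby("customer_id").cumcount()

# Expected frame from the question, used here only for comparison.
df_2 = pd.DataFrame({"customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5],
                     "call_nr": [0, 1, 2, 0, 1, 0, 1, 2, 3, 0, 0, 1]})

# Compare the computed column with the expected one.
print(df_1["call_nr"].tolist() == df_2["call_nr"].tolist())
```

Unlike the np.concatenate approach, this does not require each customer's rows to be contiguous, since the counter is tracked per group.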

Answered By: sophocles