Fill missing cohort indices

Question:

I have a data frame like this:

week Revenue Cohort_index
19/09/2021 120 0
19/09/2021 150 1
19/09/2021 223 2
19/09/2021 256 4
20/09/2021 340 0
20/09/2021 126 1
20/09/2021 234 2

Now I’d like to check whether a Cohort_index is missing (3 in this case, both for 19/09/2021 and for 20/09/2021) and, if so, insert a new row with the missing index.

The maximum cohort index decreases from one week to the next. For example, if the maximum Cohort_index for 19/09/2021 is 4, then the maximum for the next date, 20/09/2021, is 3. I need to fill in all missing indices from the minimum up to that maximum.
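The decreasing-maximum rule can be checked on its own before any rows are inserted. A minimal sketch, assuming the earliest week's maximum is a known constant `max_cohort` (a name introduced here) and that `week` holds dd/mm/yyyy strings; it only reports which indices are missing for each week:

```python
import pandas as pd

df = pd.DataFrame({
    "week": ["19/09/2021"] * 4 + ["20/09/2021"] * 3,
    "Revenue": [120, 150, 223, 256, 340, 126, 234],
    "Cohort_index": [0, 1, 2, 4, 0, 1, 2],
})
max_cohort = 4  # assumption: highest Cohort_index of the earliest week

# Sort weeks chronologically (dd/mm/yyyy strings do not sort correctly as text)
weeks = sorted(df["week"].unique(), key=lambda s: pd.to_datetime(s, format="%d/%m/%Y"))

# Week n (0-based) is expected to hold indices 0 .. max_cohort - n
expected = {w: set(range(max_cohort + 1 - n)) for n, w in enumerate(weeks)}

# Indices each week actually has
present = {w: set(g) for w, g in df.groupby("week")["Cohort_index"]}

missing = {w: sorted(expected[w] - present[w]) for w in weeks}
print(missing)  # {'19/09/2021': [3], '20/09/2021': [3]}
```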

The remaining column values are copied from the previous row, except Revenue, which is filled with 0; the data frame index should be updated accordingly.

The real data is more granular than what I have posted: for every date and every Cohort_index there are rows for different countries and different device types.
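With several rows per (week, Cohort_index), a plain reindex on Cohort_index no longer works, because the index then has duplicate labels. One option, sketched here and not taken from the answer below, is to build the complete grid of expected rows and left-join the real data onto it. The `country` and `device` column names and the sample rows are hypothetical, and `max_cohort` is assumed to be known:

```python
import pandas as pd

# Hypothetical granular sample: "country" and "device" are assumed column names
df = pd.DataFrame({
    "week": ["19/09/2021"] * 3 + ["20/09/2021"] * 2,
    "country": ["EG", "EG", "US", "EG", "US"],
    "device": ["ios"] * 5,
    "Revenue": [120, 223, 50, 340, 70],
    "Cohort_index": [0, 2, 0, 0, 1],
})
max_cohort = 2  # assumption: highest Cohort_index of the earliest week
keys = ["week", "country", "device"]

# Chronological week order; week n may hold indices 0 .. max_cohort - n
weeks = sorted(df["week"].unique(), key=lambda s: pd.to_datetime(s, format="%d/%m/%Y"))
week_top = pd.DataFrame({"week": weeks, "top": [max_cohort - n for n in range(len(weeks))]})

# One grid row per existing (week, country, device) and every expected index
combos = df[keys].drop_duplicates().merge(week_top, on="week")
combos["Cohort_index"] = combos["top"].apply(lambda t: list(range(t + 1)))
grid = combos.explode("Cohort_index").drop(columns="top")
grid["Cohort_index"] = grid["Cohort_index"].astype(int)

# Left-join the real rows; cohorts with no data get Revenue 0
out = grid.merge(df, on=keys + ["Cohort_index"], how="left")
out["Revenue"] = out["Revenue"].fillna(0)
print(out)
```

The key columns of the inserted rows come from the grid rather than from the previous row, so there is nothing to forward-fill; only `Revenue` needs `fillna(0)`.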

Desired output:

week Revenue Cohort_index
19/09/2021 120 0
19/09/2021 150 1
19/09/2021 223 2
19/09/2021 0 3
19/09/2021 256 4
20/09/2021 340 0
20/09/2021 126 1
20/09/2021 234 2
20/09/2021 0 3

I think a for loop is needed, but I can’t get my head around it.

Asked By: Maikel Bastawrous


Answers:

A bit trickier than I’d like, but it should work. It could be simplified a little if there were a way to get the current group number inside groupby().apply(). Is there?

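import pandas as pd

df = pd.DataFrame({
    "week": ["19/09/2021"] * 4 + ["20/09/2021"] * 3,
    "Revenue": [120, 150, 223, 256, 340, 126, 234],
    "Cohort_index": [0, 1, 2, 4, 0, 1, 2],
})

columns = df.columns
max_cohort = 4  # highest Cohort_index, reached by the first week
# Number the weeks 0, 1, 2, ...: week n keeps indices 0 .. max_cohort - n
df["ngroup"] = df.groupby("week").ngroup()

out = (
    df.groupby("week", as_index=False)
      .apply(lambda g: g.set_index("Cohort_index")
                        .reindex(range(max_cohort + 1 - g["ngroup"].max())))
      .droplevel(0)  # drop the group-number level that apply adds
      .reset_index()[columns]
)

# Rows created by reindex are all NaN: carry the week forward, zero the revenue
out["week"] = out["week"].ffill()
out["Revenue"] = out["Revenue"].fillna(0)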

The result is:

          week  Revenue  Cohort_index
0  19/09/2021     120.0             0
1  19/09/2021     150.0             1
2  19/09/2021     223.0             2
3  19/09/2021       0.0             3
4  19/09/2021     256.0             4
5  20/09/2021     340.0             0
6  20/09/2021     126.0             1
7  20/09/2021     234.0             2
8  20/09/2021       0.0             3
Answered By: filippo
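On the closing question about the current group number: when iterating over `df.groupby("week")` instead of calling `apply`, `enumerate` supplies the group number directly, and because `groupby` sorts keys by default it matches `ngroup()`. A sketch of that variant, using the same sample data and the same assumed `max_cohort = 4`:

```python
import pandas as pd

df = pd.DataFrame({
    "week": ["19/09/2021"] * 4 + ["20/09/2021"] * 3,
    "Revenue": [120, 150, 223, 256, 340, 126, 234],
    "Cohort_index": [0, 1, 2, 4, 0, 1, 2],
})
max_cohort = 4  # assumption: highest Cohort_index of the first week

pieces = []
# enumerate() yields the group number i; groupby sorts the keys, so i matches
# ngroup() (assumes the week strings sort chronologically, true for this data)
for i, (week, g) in enumerate(df.groupby("week")):
    filled = (
        g.set_index("Cohort_index")[["Revenue"]]
         # Missing indices get Revenue 0 directly
         .reindex(pd.RangeIndex(max_cohort + 1 - i, name="Cohort_index"), fill_value=0)
         .reset_index()
    )
    filled.insert(0, "week", week)  # the week comes from the group key
    pieces.append(filled)

out = pd.concat(pieces, ignore_index=True)[["week", "Revenue", "Cohort_index"]]
print(out)
```

Passing `fill_value=0` to `reindex` fills `Revenue` in one step and keeps it an integer column, and the week is taken from the group key, so neither the helper `ngroup` column nor a forward fill is needed. Like the answer above, this assumes one row per (week, Cohort_index).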