Fill new rows based on missing column values

Question:

I have a data frame like this:

week co_week Revenue Cohort_index
19/09/2021 01/10/2021 120 0
19/09/2021 03/10/2021 150 1
19/09/2021 06/10/2021 223 2
19/09/2021 07/10/2021 256 4
19/09/2021 08/10/2021 340 5
20/09/2021 06/10/2021 126 0
20/09/2021 07/10/2021 234 1

Now I'd like to check whether a Cohort_index value is missing (3 in this case) and, if so, insert a new row with the missing index. The remaining column values should be copied from the previous row, and the data frame index should be updated accordingly.

Desired output:

week co_week Revenue Cohort_index
19/09/2021 01/10/2021 120 0
19/09/2021 03/10/2021 150 1
19/09/2021 06/10/2021 223 2
19/09/2021 06/10/2021 223 3
19/09/2021 07/10/2021 256 4
19/09/2021 08/10/2021 340 5
20/09/2021 06/10/2021 126 0
20/09/2021 07/10/2021 234 1

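I can't hard-code the new row, since the data is huge!

import pandas as pd
# Hard-coding a single row like this works, but does not scale (dates kept as strings)
new_row = pd.DataFrame([{"week": "19/09/2021", "co_week": "06/10/2021", "Revenue": 223, "Cohort_index": 3}])
df = pd.concat([df, new_row], ignore_index=True)  # DataFrame.append was removed in pandas 2.0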
Asked By: Maikel Bastawrous


Answers:

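Convert Cohort_index to the index with DataFrame.set_index, then use DataFrame.reindex with the method='ffill' parameter to add the missing labels and fill them from the previous row. Finally, convert the index back to a column with reset_index and restore the original column order with DataFrame.reindex: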

df = (df.set_index('Cohort_index')
        .reindex(range(df['Cohort_index'].max() + 1), method='ffill')
        .reset_index()
        .reindex(df.columns, axis=1))
print (df)
         week     co_week  Revenue  Cohort_index
0  19/09/2021  01/10/2021      120             0
1  19/09/2021  03/10/2021      150             1
2  19/09/2021  06/10/2021      223             2
3  19/09/2021  06/10/2021      223             3
4  19/09/2021  07/10/2021      256             4
5  19/09/2021  08/10/2021      340             5
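
For anyone who wants to try this on a single cohort first, here is a minimal self-contained sketch. The frame `single` and the helper names `full_range`, `present`, and `missing` are made up for illustration and are not part of the original answer; the data is the first cohort from the question. It also lists which Cohort_index values are absent before filling them.

```python
import pandas as pd

# Illustrative data: only the first cohort from the question (index 3 is absent)
single = pd.DataFrame({
    "week": ["19/09/2021"] * 5,
    "co_week": ["01/10/2021", "03/10/2021", "06/10/2021",
                "07/10/2021", "08/10/2021"],
    "Revenue": [120, 150, 223, 256, 340],
    "Cohort_index": [0, 1, 2, 4, 5],
})

# Every index value we expect to see, from 0 up to the largest one present
full_range = range(single["Cohort_index"].max() + 1)

# Values from the expected range that do not occur in the data
present = set(single["Cohort_index"].tolist())
missing = [i for i in full_range if i not in present]

# Reindex on Cohort_index; new labels take their values from the previous row
filled = (single.set_index("Cohort_index")
                .reindex(full_range, method="ffill")
                .reset_index()
                .reindex(single.columns, axis=1))

print(missing)
print(filled)
```

Note that this form assumes each Cohort_index value occurs only once. With several cohorts in one frame the index labels repeat and the reindex step is expected to fail, which is why a per-group variant is needed.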

EDIT: To handle several cohorts, start a new group each time Cohort_index decreases (its difference from the previous row is less than 0), then reindex within each group:

g = df['Cohort_index'].diff().lt(0).cumsum()

df = (df.set_index(['Cohort_index'])
        .groupby(g.tolist())
        .apply(lambda x: x.reindex(range(x.index.max() + 1), method='ffill'))
        .droplevel(0)
        .reset_index()
        .reindex(df.columns, axis=1)
        )
print (df)
         week     co_week  Revenue  Cohort_index
0  19/09/2021  01/10/2021      120             0
1  19/09/2021  03/10/2021      150             1
2  19/09/2021  06/10/2021      223             2
3  19/09/2021  06/10/2021      223             3
4  19/09/2021  07/10/2021      256             4
5  19/09/2021  08/10/2021      340             5
6  20/09/2021  06/10/2021      126             0
7  20/09/2021  07/10/2021      234             1
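
To see how the grouping key is built, the chain `diff().lt(0).cumsum()` can be split into its steps. This is only an illustrative sketch on the Cohort_index values from the question; the names `cohort_index`, `step`, `is_reset`, and `group_id` are hypothetical.

```python
import pandas as pd

# Cohort_index values from the question: two cohorts back to back
cohort_index = pd.Series([0, 1, 2, 4, 5, 0, 1], name="Cohort_index")

# Difference from the previous row; the first row has no predecessor, so it is NaN
step = cohort_index.diff()

# True only where the counter drops, i.e. where a new cohort starts
is_reset = step.lt(0)

# Running count of resets gives one integer label per cohort
group_id = is_reset.cumsum()

print(group_id.tolist())  # [0, 0, 0, 0, 0, 1, 1]
```

This relies on the rows of each cohort being contiguous and sorted by Cohort_index, so that a decrease can only mean the start of the next cohort.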
Answered By: jezrael

You can set_index temporarily, reindex with ffill, then reset_index and restore columns in order:

out = (df
  .set_index('Cohort_index')
  .reindex(range(df['Cohort_index'].max()+1), method='ffill')
  .reset_index()[df.columns]
)

output:

         week     co_week  Revenue  Cohort_index
0  19/09/2021  01/10/2021    120.0             0
1  19/09/2021  03/10/2021    150.0             1
2  19/09/2021  06/10/2021    223.0             2
3  19/09/2021  06/10/2021    223.0             3
4  19/09/2021  07/10/2021    256.0             4
5  19/09/2021  08/10/2021    340.0             5

Per group

You can do the same within a groupby:

out = (df
   .groupby('week', as_index=False)
   .apply(lambda g: g.set_index('Cohort_index')
                     .reindex(range(g['Cohort_index'].max()+1), method='ffill')
                     .reset_index()
          )
   .droplevel(0)[df.columns]
)

output:

         week     co_week  Revenue  Cohort_index
0  19/09/2021  01/10/2021      120             0
1  19/09/2021  03/10/2021      150             1
2  19/09/2021  06/10/2021      223             2
3  19/09/2021  06/10/2021      223             3
4  19/09/2021  07/10/2021      256             4
5  19/09/2021  08/10/2021      340             5
0  20/09/2021  06/10/2021      126             0
1  20/09/2021  07/10/2021      234             1
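
If a continuous 0..n-1 index is wanted in the result (the question asks for the data frame index to be updated), one possible variant is to iterate over the groups and concatenate the pieces with `ignore_index=True`. This is a sketch rather than part of the answer above: the helper `fill_missing` is a hypothetical name, and looping over the groups is a swapped-in alternative to `groupby.apply`, whose treatment of the grouping column appears to differ between pandas versions.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "week": ["19/09/2021"] * 5 + ["20/09/2021"] * 2,
    "co_week": ["01/10/2021", "03/10/2021", "06/10/2021", "07/10/2021",
                "08/10/2021", "06/10/2021", "07/10/2021"],
    "Revenue": [120, 150, 223, 256, 340, 126, 234],
    "Cohort_index": [0, 1, 2, 4, 5, 0, 1],
})


def fill_missing(group):
    """Add rows for absent Cohort_index values, copying the previous row."""
    full_range = range(group["Cohort_index"].max() + 1)
    return (group.set_index("Cohort_index")
                 .reindex(full_range, method="ffill")
                 .reset_index())


# sort=False keeps the weeks in their order of first appearance
parts = [fill_missing(g) for _, g in df.groupby("week", sort=False)]

# ignore_index=True renumbers the rows 0..n-1; [df.columns] restores column order
out = pd.concat(parts, ignore_index=True)[df.columns]
print(out)
```

Here each week is treated as one cohort, as in the groupby above, so every week must contain each Cohort_index at most once.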
Answered By: mozway