Backward-fill a column's values until another column's value is 0
Question:
I have a dataset with user_id, col1, and col2. col1 only contains NaN or 0. For each user_id, I want to backfill col2 values until a row with col1 == 0 is reached, with a limit of 10: backfill until col1 == 0 only if the distance is less than or equal to 10 rows; otherwise, don't do anything.
Input:
user_id col1 col2
3 NaN NaN
3 0 NaN
3 NaN NaN
3 NaN 5
5 0 NaN
5 NaN 9
...
Desired output:
user_id col1 col2
3 NaN NaN
3 0 5
3 NaN 5
3 NaN 5
5 0 9
5 NaN 9
...
Answers:
Create groups from user_id and the running count of col1 == 0 rows, then backfill within each group:
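import pandas as pd
# Each col1 == 0 row starts a new block; bfill never crosses a (user_id, block) boundary
df['col2'] = df.groupby(['user_id', df['col1'].eq(0).cumsum()])['col2'].bfill()
print(df)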
# Output
user_id col1 col2
0 3 NaN NaN
1 3 0.0 5.0
2 3 NaN 5.0
3 3 NaN 5.0
4 5 0.0 9.0
5 5 NaN 9.0
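For reference, here is a self-contained version that rebuilds the sample input from the question (the "..." rows are dropped) and runs the same groupby. The name block is introduced only for readability; it holds the same running count of col1 == 0 rows used above.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the "..." rows are omitted)
df = pd.DataFrame({
    'user_id': [3, 3, 3, 3, 5, 5],
    'col1': [np.nan, 0, np.nan, np.nan, 0, np.nan],
    'col2': [np.nan, np.nan, np.nan, 5, np.nan, 9],
})

# Each col1 == 0 row starts a new block; the running count numbers the blocks
block = df['col1'].eq(0).cumsum()

# Backfill col2 only inside each (user_id, block) group
df['col2'] = df.groupby(['user_id', block])['col2'].bfill()
print(df)
```

Row 0 sits before user 3's first col1 == 0 row, so it forms its own group and stays NaN.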
Details about the groups:
>>> pd.concat([df['user_id'], df['col1'].eq(0).cumsum()], axis=1)
user_id col1
0 3 0 # first group (nothing to backfill)
1 3 1 # second group (backfill 5)
2 3 1
3 3 1
4 5 2 # third group (backfill 9)
5 5 2
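The running count alone does not restart at a new user, which is why user_id is part of the key. A small sketch with hypothetical data where user 5 has no col1 == 0 row of its own shows the difference (by_block and by_user_block are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical data: user 5's rows have no col1 == 0 marker of their own
df = pd.DataFrame({
    'user_id': [3, 3, 5, 5],
    'col1': [0, np.nan, np.nan, np.nan],
    'col2': [np.nan, np.nan, np.nan, 9],
})
block = df['col1'].eq(0).cumsum()  # one block number shared by both users

# Grouping by the block alone lets user 5's value leak into user 3's rows
by_block = df.groupby(block)['col2'].bfill()

# Adding user_id to the key keeps each user's fill separate
by_user_block = df.groupby(['user_id', block])['col2'].bfill()
print(pd.concat([by_block, by_user_block], axis=1, keys=['block_only', 'user_and_block']))
```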
Update:
You asked to backfill until col1 == 0 only if the distance is less than or equal to 10 rows, and otherwise do nothing. Backfill whole groups that are short enough and leave the others unchanged:
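# Fill a group only if it spans at most 10 rows (counting the col1 == 0 row); otherwise return it unchanged
bfill = lambda x: x.bfill() if len(x) <= 10 else x
df['col2'] = df.groupby(['user_id', df['col1'].eq(0).cumsum()])['col2'].transform(bfill)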
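A self-contained sketch of this all-or-nothing variant, assuming "distance" means the number of rows in the group including the col1 == 0 row. MAX_ROWS and bfill_if_short are hypothetical names, and the data is made up so that one group is short and one is longer than 10 rows:

```python
import numpy as np
import pandas as pd

MAX_ROWS = 10  # assumed threshold: groups longer than this are left untouched

def bfill_if_short(s):
    # Backfill the whole group, or return it unchanged if it spans more than MAX_ROWS rows
    return s.bfill() if len(s) <= MAX_ROWS else s

# Hypothetical data: user 3 has a 3-row block, user 7 has a 12-row block
df = pd.DataFrame({
    'user_id': [3] * 3 + [7] * 12,
    'col1': [0, np.nan, np.nan] + [0] + [np.nan] * 11,
    'col2': [np.nan, np.nan, 4] + [np.nan] * 11 + [8],
})

block = df['col1'].eq(0).cumsum()
df['col2'] = df.groupby(['user_id', block])['col2'].transform(bfill_if_short)
print(df)
```

transform keeps the original index, so the result can be assigned straight back to df['col2'].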
Important note: GroupBy.bfill has a limit parameter that caps how many values are filled. You could instead fill up to 10 values and then stop filling, rather than skipping long groups entirely.
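A minimal sketch of the limit behaviour on hypothetical data. It uses limit=2 only to keep the example short; in this problem you would pass limit=10. Only the NaNs nearest to each value are filled, and the rest of the group stays NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one user, one block with four NaNs before the value 6
df = pd.DataFrame({
    'user_id': [1] * 5,
    'col1': [0, np.nan, np.nan, np.nan, np.nan],
    'col2': [np.nan, np.nan, np.nan, np.nan, 6],
})
block = df['col1'].eq(0).cumsum()

# limit=2 fills at most 2 consecutive NaNs just above each value, then stops
df['col2_limited'] = df.groupby(['user_id', block])['col2'].bfill(limit=2)
print(df)
```

Unlike the transform approach, this partially fills long gaps instead of leaving them untouched, so pick whichever matches the intended rule.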