Backward-fill a column's values until another column's value is 0

Question:

I have a dataset with user_id, col1 and col2. col1 only contains NaN or 0. For each user_id, I want to backfill col2 values until a row with col1 == 0 is reached, with a limit of 10: backfill until col1 == 0 only if the distance is less than or equal to 10 rows. Otherwise, don't do anything.

Input:

user_id  col1  col2
3        NaN   NaN
3        0     NaN
3        NaN   NaN
3        NaN   5
5        0     NaN
5        NaN   9
...

Desired output:

user_id  col1  col2
3        NaN   NaN
3        0     5
3        NaN   5
3        NaN   5
5        0     9
5        NaN   9
...

    
Asked By: prof32


Answers:

Create groups from user_id and a running count of the zeros in col1, then backfill col2 within each group:

# Each col1 == 0 row starts a new group; backfill col2 within each (user_id, group) pair
df['col2'] = df.groupby(['user_id', df['col1'].eq(0).cumsum()])['col2'].bfill()
print(df)

# Output
   user_id  col1  col2
0        3   NaN   NaN
1        3   0.0   5.0
2        3   NaN   5.0
3        3   NaN   5.0
4        5   0.0   9.0
5        5   NaN   9.0
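A self-contained version of the same approach might look like this. It is a sketch that rebuilds the sample data from the question by hand (the `...` rows are omitted); the intermediate name `segment` is introduced here only for readability.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the "..." rows are left out)
df = pd.DataFrame({
    "user_id": [3, 3, 3, 3, 5, 5],
    "col1": [np.nan, 0, np.nan, np.nan, 0, np.nan],
    "col2": [np.nan, np.nan, np.nan, 5, np.nan, 9],
})

# Every row where col1 == 0 starts a new segment; cumsum numbers the segments
segment = df["col1"].eq(0).cumsum()

# Backfill col2 within each (user_id, segment) group, so a value never
# propagates above the col1 == 0 row that opens its segment
df["col2"] = df.groupby(["user_id", segment])["col2"].bfill()
print(df)
```

The first row stays NaN because it sits in a segment before any zero and has no later value in that segment to copy.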

How the groups are formed:

>>> pd.concat([df['user_id'], df['col1'].eq(0).cumsum()], axis=1)
   user_id  col1
0        3     0  # first group (nothing to backfill)
1        3     1  # second group (backfill 5)
2        3     1
3        3     1
4        5     2  # third group (backfill 9)
5        5     2
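The two steps behind the group key can be inspected separately. In this sketch, `is_zero` and `segment` are illustrative names, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": [3, 3, 3, 3, 5, 5],
    "col1": [np.nan, 0, np.nan, np.nan, 0, np.nan],
})

is_zero = df["col1"].eq(0)   # True where col1 == 0; NaN compares as False
segment = is_zero.cumsum()   # running count of zeros seen so far

keys = pd.DataFrame({"user_id": df["user_id"], "is_zero": is_zero, "segment": segment})
print(keys)
```

Because `segment` is counted over the whole frame rather than per user, it is the pairing with `user_id` in the `groupby` that keeps one user's rows from being filled with another user's values.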

Update:

Backfill until col1==0 only if the distance is less than or equal to 10 rows. Otherwise, don't do anything.

# Backfill a group only if it has fewer than 10 rows; leave longer groups untouched
bfill = lambda x: x.bfill() if len(x) < 10 else x
df['col2'] = df.groupby(['user_id', df['col1'].eq(0).cumsum()])['col2'].transform(bfill)
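To see the all-or-nothing behaviour of the update, here is a sketch on hypothetical data: user 3 has a 3-row segment and user 7 a 12-row segment. The helper `bfill_short_groups` and its `max_len` parameter are illustrative names; the helper keeps the answer's `len(x) < 10` threshold, and whether the cut-off should be `<` or `<=` depends on how "distance" is counted.

```python
import numpy as np
import pandas as pd

def bfill_short_groups(s: pd.Series, max_len: int = 10) -> pd.Series:
    # Backfill only groups shorter than max_len rows; return longer groups unchanged
    return s.bfill() if len(s) < max_len else s

# Hypothetical data: a short segment for user 3, a long one for user 7
df = pd.DataFrame({
    "user_id": [3] * 3 + [7] * 12,
    "col1": [0, np.nan, np.nan] + [0] + [np.nan] * 11,
    "col2": [np.nan, np.nan, 4] + [np.nan] * 11 + [8],
})

segment = df["col1"].eq(0).cumsum()
df["col2"] = df.groupby(["user_id", segment])["col2"].transform(bfill_short_groups)
print(df)
```

User 3's segment is filled with 4, while user 7's 12-row segment keeps its NaN values and only the final 8 remains.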

Important note: GroupBy.bfill has a limit parameter that caps how many consecutive NaN values are filled. With bfill(limit=10), pandas fills up to 10 values and then stops filling, instead of skipping the whole group.
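A sketch of the `limit` variant on hypothetical data (one segment with a value preceded by 12 missing rows). Unlike the update above, this fills part of a long segment rather than leaving it untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical single segment: col2 has one value preceded by 12 NaN rows
df = pd.DataFrame({
    "user_id": [1] * 13,
    "col1": [0] + [np.nan] * 12,
    "col2": [np.nan] * 12 + [7],
})

segment = df["col1"].eq(0).cumsum()
# limit=10: fill at most 10 consecutive NaN rows above each value, then stop
df["col2"] = df.groupby(["user_id", segment])["col2"].bfill(limit=10)
print(df["col2"].tolist())
```

The 10 rows directly above the 7 are filled, while the first two rows (including the col1 == 0 row) stay NaN.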

Answered By: Corralien