Pandas groupby: Replace values in one column before the first occurrence in another column

Question:

Below is a toy Pandas dataframe that has three columns: ‘id’ (group id), ‘b’ (for condition), and ‘c’ (target):

import numpy as np
import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,1,1,1,1,2,2,2,2,2,2,2],
                   'b'  : [3,4,5,'A',3,4,'A',1,'A',1,3,'A',2,3],
                   'c'  : [1,0,1,10,1,1,20,1,10,0,1,20,1,1]})
print(df)
    id  b   c
0    1  3   1
1    1  4   0
2    1  5   1
3    1  A  10
4    1  3   1
5    1  4   1
6    1  A  20
7    2  1   1
8    2  A  10
9    2  1   0
10   2  3   1
11   2  A  20
12   2  2   1
13   2  3   1

For each group, I want to replace the values in column ‘c’ that come before the first occurrence of ‘A’ in column ‘b’ with NaN (np.nan).

The desired output is the following:

desired_output_df = pd.DataFrame({'id' : [1,1,1,1,1,1,1,2,2,2,2,2,2,2], 
                                  'b'  : [3,4,5,'A',3,4,'A',1,'A',1,3,'A',2,3], 
                                  'c'  : [np.nan,np.nan,np.nan,10,1,1,20,np.nan,10,0,1,20,1,1]})
print(desired_output_df)
    id  b     c
0    1  3   NaN
1    1  4   NaN
2    1  5   NaN
3    1  A  10.0
4    1  3   1.0
5    1  4   1.0
6    1  A  20.0
7    2  1   NaN
8    2  A  10.0
9    2  1   0.0
10   2  3   1.0
11   2  A  20.0
12   2  2   1.0
13   2  3   1.0

I can get the index of the rows whose values in column ‘c’ I want to change with the following command: df.groupby('id').apply(lambda x: x.loc[:(x.b == 'A').idxmax()-1]).index. However, the result is a MultiIndex, and I can’t work out how to use it to replace the values.

MultiIndex([(1, 0),
            (1, 1),
            (1, 2),
            (2, 7)],
           names=['id', None])

Thanks in advance.

Asked By: In Joon Noh


Answers:

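Try a per-group cumulative sum of b == 'A'. It stays at 0 until the first 'A' in each group, so the rows where it is still 0 are the ones to replace with NaN: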

df['c'] = np.where(df.groupby('id').apply(lambda x: x['b'].eq('A').cumsum()) > 0, df['c'], np.nan)
print(df)

Prints:

    id  b     c
0    1  3   NaN
1    1  4   NaN
2    1  5   NaN
3    1  A  10.0
4    1  3   1.0
5    1  4   1.0
6    1  A  20.0
7    2  1   NaN
8    2  A  10.0
9    2  1   0.0
10   2  3   1.0
11   2  A  20.0
12   2  2   1.0
13   2  3   1.0
Answered By: Andrej Kesely
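
A possible variation on the same cumulative-sum idea, offered as a sketch rather than as part of the answer above: grouping the boolean Series by df['id'] (instead of calling apply on the grouped frame) keeps the original row index, so no MultiIndex appears and the mask lines up with df by label rather than by position. The names is_a, seen_a and result are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
    'b': [3, 4, 5, 'A', 3, 4, 'A', 1, 'A', 1, 3, 'A', 2, 3],
    'c': [1, 0, 1, 10, 1, 1, 20, 1, 10, 0, 1, 20, 1, 1],
})

# 1 on rows where b is 'A', 0 elsewhere.
is_a = df['b'].eq('A').astype(int)

# Running count of 'A' rows within each id. It stays 0 until the first 'A',
# so "> 0" marks the first 'A' row and everything after it in that group.
# The result keeps df's own index, so it aligns with df row by row.
seen_a = is_a.groupby(df['id']).cumsum().gt(0)

# Keep c where an 'A' has already been seen; everything else becomes NaN.
# Series.where upcasts the integer column to float to hold NaN.
result = df.assign(c=df['c'].where(seen_a))
print(result)
```

Because the mask is aligned by index, this version should not depend on the rows already being ordered by id, whereas np.where compares the two arrays position by position.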