Replacing the first occurrence of a text in each row of a pandas DataFrame based on an ID

Question:

I am trying to replace the first occurrence of a status for each ID. My dataset looks like this:

df=

Index  ID       Status
0      1895001  review
1      1895001  review
2      1895001  review
3      2104264  review
4      2102404  review
5      2102404  review
6      1809905  review
7      1809905  review
8      1809905  review
9      1811700  review

I tried df.values[df.index, np.argmax(df.values=="review",1)] = "first review", but it replaces all of them 🙁

This is what I am expecting:

df=

Index  ID       Status
0      1895001  first review
1      1895001  review
2      1895001  review
3      2104264  first review
4      2102404  first review
5      2102404  review
6      1809905  first review
7      1809905  review
8      1809905  review
9      1811700  first review
Asked By: dagi_paulos


Answers:

Use boolean indexing with the boolean inverse (~) of duplicated:

df.loc[~df['ID'].duplicated(), 'Status'] = 'first review'

Output:

   Index       ID        Status
0      0  1895001  first review
1      1  1895001        review
2      2  1895001        review
3      3  2104264  first review
4      4  2102404  first review
5      5  2102404        review
6      6  1809905  first review
7      7  1809905        review
8      8  1809905        review
9      9  1811700  first review
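
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the data in the question (without the separate Index column) and names the mask `is_first`, which is just an illustrative variable name, not something from the answer above.

```python
import pandas as pd

# Data copied from the question; every row starts out as "review".
df = pd.DataFrame({
    "ID": [1895001, 1895001, 1895001, 2104264, 2102404,
           2102404, 1809905, 1809905, 1809905, 1811700],
    "Status": ["review"] * 10,
})

# duplicated() is False for the first time an ID appears and True afterwards,
# so inverting it with ~ marks exactly the first row of each ID.
is_first = ~df["ID"].duplicated()

# Overwrite Status only on those rows.
df.loc[is_first, "Status"] = "first review"

print(df)
```

Because duplicated() looks at the whole column, this also works when the rows for one ID are not next to each other.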
Answered By: mozway

You can use the groupby method of pandas to find the first row for each ID and then update the ‘Status’ column at those index labels with loc. Here’s the code:

df.loc[df.groupby('ID').apply(lambda x: x['Status'].index[0]), 'Status'] = 'first review'

This code groups the rows by the ‘ID’ column and takes the index label of the first row in each group. Note that it does not check whether that row’s ‘Status’ is actually ‘review’; it simply picks the first row per ID. Then, it updates the ‘Status’ column of these rows with ‘first review’.
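
If some IDs can start with a status other than ‘review’, one possible variant is to filter to the ‘review’ rows first and only then take the first row per ID. The sketch below is an assumption about that case, not part of the original answer: the small frame and the names `is_review` and `first_idx` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: ID 1 starts with a non-"review" row.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Status": ["draft", "review", "review", "review", "review"],
})

# Keep only the rows whose Status is "review".
is_review = df["Status"].eq("review")

# head(1) returns the first row of each group and keeps the original index
# labels, so .index gives the labels of the first "review" row per ID.
first_idx = df[is_review].groupby("ID").head(1).index

# Update just those rows in the original frame.
df.loc[first_idx, "Status"] = "first review"

print(df)
```

With the question's data, where every row is ‘review’, this gives the same result as the one-liner above.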

The resulting dataframe should look like this:

   Index       ID        Status
0      0  1895001  first review
1      1  1895001        review
2      2  1895001        review
3      3  2104264  first review
4      4  2102404  first review
5      5  2102404        review
6      6  1809905  first review
7      7  1809905        review
8      8  1809905        review
9      9  1811700  first review
Answered By: OLEG KUSTAROV