Replacing the first occurrence of a text in each row of a pandas DataFrame based on an ID
Question:
I am trying to replace the first occurrence of a value for each ID. My dataset looks like this:
df=
Index ID Status
0 1895001 review
1 1895001 review
2 1895001 review
3 2104264 review
4 2102404 review
5 2102404 review
6 1809905 review
7 1809905 review
8 1809905 review
9 1811700 review
I tried this:
df.values[df.index, np.argmax(df.values=="review",1)] = "first review"
but it replaces all of them 🙁
This is what I am expecting:
df=
Index ID Status
0 1895001 first review
1 1895001 review
2 1895001 review
3 2104264 first review
4 2102404 first review
5 2102404 review
6 1809905 first review
7 1809905 review
8 1809905 review
9 1811700 first review
Answers:
Use boolean indexing with the boolean inverse (~) of duplicated:
df.loc[~df['ID'].duplicated(), 'Status'] = 'first review'
Output:
Index ID Status
0 0 1895001 first review
1 1 1895001 review
2 2 1895001 review
3 3 2104264 first review
4 4 2102404 first review
5 5 2102404 review
6 6 1809905 first review
7 7 1809905 review
8 8 1809905 review
9 9 1811700 first review
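For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same mask. It assumes the frame has only the ID and Status columns (the separate Index column is left out for brevity), and the name first_rows is just an illustrative variable.

```python
import pandas as pd

# Sample data from the question (Index column omitted as an assumption).
df = pd.DataFrame({
    "ID": [1895001, 1895001, 1895001, 2104264, 2102404,
           2102404, 1809905, 1809905, 1809905, 1811700],
    "Status": ["review"] * 10,
})

# duplicated() is False for the first row of each ID and True for later
# repeats, so its inverse selects exactly the first row per ID.
first_rows = ~df["ID"].duplicated()

# Assign through .loc so the change is written to df itself.
df.loc[first_rows, "Status"] = "first review"
print(df)
```

Note that the mask depends only on the ID column, so the first row of each ID is relabelled whatever its current Status is.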
You can also use the groupby method of pandas to find the index label of the first row for each ID and then update the ‘Status’ column for those labels with .loc. Here’s the code:
df.loc[df.groupby('ID').apply(lambda x: x['Status'].index[0]), 'Status'] = 'first review'
This code groups the rows by the ‘ID’ column and takes the index label of the first row in each group, whatever its ‘Status’ value is. It then sets the ‘Status’ column of those rows to ‘first review’.
The resulting dataframe should look like this:
Index ID Status
0 0 1895001 first review
1 1 1895001 review
2 2 1895001 review
3 3 2104264 first review
4 4 2102404 first review
5 5 2102404 review
6 6 1809905 first review
7 7 1809905 review
8 8 1809905 review
9 9 1811700 first review
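Both approaches relabel the first row of each ID without checking what its Status is. If the column can hold other values and only the first row that actually says "review" should change, one possible variation is to filter first and then pick the first remaining row per ID. This is a sketch under that assumption; the data below is hypothetical and not from the question.

```python
import pandas as pd

# Hypothetical data: ID 1 starts with a row that is not 'review'.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Status": ["done", "review", "review", "review", "review"],
})

# Keep only the 'review' rows, then keep the first such row per ID.
review_rows = df[df["Status"] == "review"]
first_idx = review_rows.drop_duplicates(subset="ID").index

# The index labels still refer to rows of the original frame.
df.loc[first_idx, "Status"] = "first review"
print(df)
```

Here the "done" row for ID 1 is left alone and the second row of that ID becomes "first review".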