How to find if any values in a pandas DataFrame are duplicated and rerun a piece of code after user confirmation


I am pulling a Google Sheet into a DataFrame. I want to first check whether any values in a specific column are duplicated, then ask the user to fix the issue in the Google Sheet and rerun that part of the code. Where I'm stuck is how to trigger the rerun if any of the values are True. My approach so far was to check with duplicated() and add the result as a column to the DataFrame, so that I can filter on it and show the user exactly which rows have issues.

id | record_id | 
0  | abc1      |
1  | abc2      |
2  | abc3      |
3  | abc1      |

This is the code I tried so far:

df['record_id_duplicate'] = df.duplicated(subset='record_id', keep=False)

record_id_validation = None
if 'True' in df['record_id_duplicate']:
    record_id_validation = True

The column is added correctly, but I'm not sure where to go from here.
This is how df looks after adding the duplicated column. Any help would be appreciated.

id | record_id | record_id_duplicate
0  | abc1      | True
1  | abc2      | False
2  | abc3      | False
3  | abc1      | True
Asked By: Mike P



You can call any() on a boolean column. It returns True if at least one value in the column is True, and False otherwise:

>>> df['record_id_duplicate'].any()
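
For context, the `'True' in df['record_id_duplicate']` check in the question does not look at the column's values: `in` on a Series tests the index labels. The sketch below rebuilds the sample frame from the question to contrast the two checks and to pull out the offending rows; the variable names `label_check`, `has_duplicates`, and `problem_rows` are only illustrative.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"record_id": ["abc1", "abc2", "abc3", "abc1"]})
df["record_id_duplicate"] = df.duplicated(subset="record_id", keep=False)

# `in` on a Series tests the index labels (0, 1, 2, 3), not the values,
# so this stays False even though the column contains True.
label_check = "True" in df["record_id_duplicate"]

# .any() inspects the values themselves.
has_duplicates = bool(df["record_id_duplicate"].any())

# Boolean indexing keeps only the rows the user needs to fix.
problem_rows = df[df["record_id_duplicate"]]
print(problem_rows)
```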
Answered By: ThePyGuy
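
For the "rerun after the user confirms" part of the question, one possible approach is a loop that reloads the sheet until the column has no duplicates. This is only a sketch: `load_until_unique` and `load_sheet` are hypothetical names, and `load_sheet` stands in for whatever code currently reads the Google Sheet into a DataFrame.

```python
import pandas as pd


def load_until_unique(load_sheet, column="record_id", confirm=input):
    """Reload the sheet until `column` contains no duplicated values.

    `load_sheet` is a hypothetical zero-argument callable returning a DataFrame.
    `confirm` defaults to input(), which blocks until the user presses Enter.
    """
    while True:
        df = load_sheet()  # re-read the sheet on every pass
        duplicated = df.duplicated(subset=column, keep=False)
        if not duplicated.any():
            return df
        # Show only the offending rows, then wait for the user to fix them.
        print(df[duplicated])
        confirm("Fix the duplicates in the sheet, then press Enter to recheck: ")
```

Calling `df = load_until_unique(my_loader)` would then return only once the check passes. Passing the loader and the prompt in as arguments also makes the loop easy to try out without a real sheet.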