I am pulling a Google Sheet into a dataframe. I want to first check whether any of the values in a specific column are duplicates, then ask the user to fix the issue in the Google Sheet and rerun that part of the code. Where I’m stuck is how to trigger the rerun if any of the values are True. This is what I have so far: my approach was to check with duplicated() and add the result as a column to the dataframe, so that I can filter on it and show the user specifically which rows have issues.
id | record_id
0  | abc1
1  | abc2
2  | abc3
3  | abc1
This is the code I tried so far:
df['record_id_duplicate'] = df.duplicated(subset='record_id', keep=False)
record_id_validation = None
if 'True' in df['record_id_duplicate']:
    record_id_validation = True
else:
    False
The column is added correctly, but I’m not really sure where to go from here.
This is how df looks after I added the duplicated column:
id | record_id | record_id_duplicate
0  | abc1      | True
1  | abc2      | False
2  | abc3      | False
3  | abc1      | True
Any help would be appreciated.
You can call any() on a boolean column. It returns True if any of the values in the column is True, and False if none of the values is True.
>>> df['record_id_duplicate'].any()
True
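For the "rerun" part of the question, one possible approach is to wrap the load and the any() check in a loop that reloads the sheet after the user confirms the fix. The following is only a sketch under assumptions: load_sheet is a hypothetical zero-argument function standing in for however you read the Google Sheet into a DataFrame, and load_until_unique, wait_for_user, and max_attempts are names made up for illustration. As a side note, the check 'True' in df['record_id_duplicate'] from the question does not work because the in operator on a Series tests the index labels, not the values.

```python
import pandas as pd


def load_until_unique(load_sheet, column="record_id",
                      wait_for_user=input, max_attempts=5):
    """Reload the sheet until `column` contains no duplicated values.

    load_sheet    -- hypothetical zero-argument callable returning a fresh DataFrame
    wait_for_user -- callable that blocks until the user says the sheet is fixed
    max_attempts  -- illustrative safety limit so the loop cannot run forever
    """
    for _ in range(max_attempts):
        df = load_sheet()
        # keep=False flags every row in a duplicate group, not only the repeats
        dup_mask = df.duplicated(subset=column, keep=False)
        if not dup_mask.any():
            return df
        # Show only the offending rows so the user knows what to fix
        print(df[dup_mask])
        wait_for_user("Fix the duplicates in the sheet, then press Enter to recheck: ")
    raise ValueError(f"{column!r} still has duplicates after {max_attempts} attempts")
```

You would call it as df = load_until_unique(my_loader), where my_loader is whatever function you already use to fetch the sheet. Passing the loader in as an argument keeps the duplicate check separate from the Google Sheets code, so the loop can be tried out with plain DataFrames first.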