Adding a Counter to Duplicate Keys for Merging

Question:

I created a key for merging. Unfortunately, some of the keys are duplicated, but I need to keep those rows. I am thinking that, for each set of duplicate keys, I can append a sequence number (1, 2, 3, and so on) to each duplicate key to make it unique.

Can you recommend a command or method to do this?

This is the code leading up to the part where I am stuck on how to proceed:

# Create a key variable for merging by concatenating three columns
df['dfkey'] = df['ColA'].astype(str) + ' ' + df['ColB'].astype(str) + ' ' + df['ColC'].astype(str)

# Count the frequency of each dfkey, to see which keys are unique
df['dfkeycount'] = df.groupby('dfkey')['dfkey'].transform('count')

# Count the frequency of each dfkey per Category. Note: later I will split the
# dataset by Category and merge the pieces side by side (one variable will be
# renamed based on the category name).
df['dfkeycountcat'] = df.groupby(['dfkey', 'Category'])['dfkey'].transform('count')

# Subset with clean (unique) keys; merging already works within this subset
dataunique = df.loc[df['dfkeycountcat'] == 1]

# Subset of duplicates, where I want to append a sequence number to the key
dataduplicate = df.loc[df['dfkeycountcat'] > 1]
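
For illustration, here is a minimal, self-contained sketch of the setup above on made-up data (the column names ColA/ColB/ColC/Category match the question; the values themselves are hypothetical):

import pandas as pd

# Hypothetical data reproducing the situation: the first two rows share a key
df = pd.DataFrame({
    'ColA': ['x', 'x', 'y'],
    'ColB': [1, 1, 2],
    'ColC': ['a', 'a', 'b'],
    'Category': ['P', 'P', 'Q'],
})

df['dfkey'] = df['ColA'].astype(str) + ' ' + df['ColB'].astype(str) + ' ' + df['ColC'].astype(str)
df['dfkeycountcat'] = df.groupby(['dfkey', 'Category'])['dfkey'].transform('count')

print(df)
# The first two rows get dfkeycountcat == 2, so they land in dataduplicate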
Asked By: ambmil


Answers:

Thank you very much to the one who responded. I was able to use cumcount:

# Append each row's within-group sequence number (0, 1, 2, ...) to its key
df['dfkeynew'] = df['dfkey'].astype(str) + df.groupby('dfkey').cumcount().astype(str)

# Re-count to confirm every new key occurs exactly once
df['dfkeycountnew'] = df.groupby('dfkeynew')['dfkeynew'].transform('count')

# value_counts() on the counts shows how many keys occur once, twice, etc.;
# if every value is 1, all new keys are unique
df['dfkeycountnew'].value_counts()

They are all unique now.
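
One note on this approach: cumcount() numbers rows within each group starting at 0, and concatenating the number directly onto the key can in principle collide (e.g. key 'A1' plus counter '1' equals key 'A' plus counter '11' only by accident of spelling, but 'A1' + '0' and 'A' + '10' do collide). A minimal sketch of a safer variant, assuming a separator character such as '_' that never appears in the keys:

import pandas as pd

df = pd.DataFrame({'dfkey': ['A', 'A', 'A1', 'B']})

# Appending the within-group counter with a separator avoids accidental
# collisions between distinct keys; '_' is assumed not to occur in dfkey.
df['dfkeynew'] = df['dfkey'] + '_' + df.groupby('dfkey').cumcount().astype(str)

# is_unique is a quick check that every new key occurs exactly once
assert df['dfkeynew'].is_unique
print(df['dfkeynew'].tolist())   # ['A_0', 'A_1', 'A1_0', 'B_0']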

Answered By: ambmil