How to delete duplicate rows in pandas

Question:

I need to check whether one column of a DataFrame contains duplicate values using pandas and, if it does, delete the entire row for each duplicate.
Only the first column needs to be checked.

Example:

object    type

apple     fruit
ball      toy
banana    fruit
xbox      videogame
banana    fruit
apple     fruit

What I need is:

object    type

apple     fruit
ball      toy
banana    fruit
xbox      videogame

I can delete the duplicates from the ‘object’ column with the following code, but I can’t delete the entire row that contains each duplicate, because the second column is left untouched.


df = pd.read_csv(directory, header=None)

objects = df[0]

for object in df[0]:
   
Asked By: Fabix


Answers:

Build a boolean mask with duplicated() and negate it, so that only the first occurrence of each value in the column is kept:

df = df[~df["object"].duplicated()]

Which gives:

   object       type
0   apple      fruit
1    ball        toy
2  banana      fruit
3    xbox  videogame
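
For reference, here is a self-contained sketch of the mask approach, rebuilding the sample data from the question. By default duplicated() marks every repeat after the first occurrence as True; the keep argument changes which occurrence survives. The variable names mask, first_only and last_only are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "object": ["apple", "ball", "banana", "xbox", "banana", "apple"],
    "type": ["fruit", "toy", "fruit", "videogame", "fruit", "fruit"],
})

# True for every row whose "object" value already appeared in an earlier row
mask = df["object"].duplicated()

# Keep the rows that are not repeats (first occurrence of each value)
first_only = df[~mask]

# keep="last" keeps the last occurrence of each value instead
last_only = df[~df["object"].duplicated(keep="last")]
```

Note that the surviving rows keep their original index labels; call reset_index(drop=True) if you want them renumbered.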
Answered By: crayxt

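Use the drop_duplicates method. Called without arguments, it drops rows that are duplicated across all columns, which is enough for the sample data because the repeated rows are identical: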

import pandas as pd

d = pd.DataFrame(
    {'object': ['apple', 'ball', 'banana', 'xbox', 'banana', 'apple'],
     'type': ['fruit', 'toy', 'fruit', 'videogame', 'fruit', 'fruit']}
)
# Returns a new DataFrame; d itself is left unchanged
d.drop_duplicates()

Several keyword arguments may come in handy, such as inplace=True if you want the DataFrame d itself to be updated.
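
As a rough illustration of those keyword arguments, the sketch below tries keep, ignore_index and inplace on the same sample data. The result names (only_unique, renumbered, result) are made up for this example.

```python
import pandas as pd

d = pd.DataFrame(
    {'object': ['apple', 'ball', 'banana', 'xbox', 'banana', 'apple'],
     'type': ['fruit', 'toy', 'fruit', 'videogame', 'fruit', 'fruit']}
)

# keep=False drops every row that has a duplicate, leaving only rows that occur once
only_unique = d.drop_duplicates(keep=False)

# keep="last" keeps the last copy of each row; ignore_index=True renumbers the result
renumbered = d.drop_duplicates(keep="last", ignore_index=True)

# inplace=True modifies d itself and returns None
result = d.drop_duplicates(inplace=True)
```

Because inplace=True returns None, avoid writing d = d.drop_duplicates(inplace=True), which would replace the DataFrame with None.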

Answered By: Jordi Pastor

You can use .drop_duplicates() with parameter subset='object' to select the column you want to check, as follows:

df_out = df.drop_duplicates(subset='object')

Result:

print(df_out)

   object       type
0   apple      fruit
1    ball        toy
2  banana      fruit
3    xbox  videogame
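
The question reads the file with header=None, in which case the columns are labeled 0 and 1 rather than 'object' and 'type'. A minimal sketch of that case follows; it assumes the CSV has no header row, and csv_text is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file from the question (assumed headerless)
csv_text = (
    "apple,fruit\n"
    "ball,toy\n"
    "banana,fruit\n"
    "xbox,videogame\n"
    "banana,fruit\n"
    "apple,fruit\n"
)

df = pd.read_csv(io.StringIO(csv_text), header=None)

# With header=None the columns are labeled 0 and 1,
# so the first column is referred to by the integer label 0
df_out = df.drop_duplicates(subset=[0])
```

Only column 0 is compared, but the whole row is dropped, so the matching value in column 1 goes with it.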
Answered By: SeaBean

To get the number of rows removed by dropping duplicates (store it in a new variable so that df is not overwritten):

n_dropped = len(df) - len(df.drop_duplicates())
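
If only the 'object' column should be considered, a similar count can be sketched with duplicated().sum(). The names n_duplicates and n_remaining are illustrative, and the data is the sample from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "object": ["apple", "ball", "banana", "xbox", "banana", "apple"],
    "type": ["fruit", "toy", "fruit", "videogame", "fruit", "fruit"],
})

# Number of rows whose "object" value repeats an earlier row
n_duplicates = int(df["object"].duplicated().sum())

# Number of rows left after dropping those repeats
n_remaining = len(df.drop_duplicates(subset="object"))
```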
Answered By: Derrick Kuria