Remove duplicate words in the same cell within a column in Python

Question

I need some help. I have a column of words, and I want to remove the duplicated words inside each cell.

What I want to get is something like this:

words	expected
car apple car good	car apple good
good bad well good	good bad well
car apple bus food	car apple bus food

I’ve tried this, but it is not working:

from collections import OrderedDict


df['expected'] = (df['words'].str.split().apply(lambda x: OrderedDict.fromkeys(x).keys()).str.join(' '))

I’ll be very grateful if somebody can help me.

Asked By: Sebastian R

Answer 1

If each cell is a string of space-separated words such as "word1 word2" (note that a set does not preserve the original word order):

df['expected'] = [" ".join(set(wrds.strip().split())) for wrds in df.words]

Answered By: dermen

Answer 2

If you don’t need to retain the original order of the words, you can create an intermediate set which will remove duplicates.

df["expected"] = df["words"].str.split().apply(set).str.join(" ")
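As a rough illustration of what "not retaining order" means for a single cell, here is a small sketch in plain Python, without pandas. The variable names are made up for this example, and the sorted() call is an addition of mine rather than part of the answer: it is only there to make the printed result reproducible.

```python
# One cell from the question's sample data.
cell = "good bad well good"

# A set keeps one copy of each word, but its iteration order is arbitrary.
unique = set(cell.split())

# Sorting is one way to get a reproducible (alphabetical) result;
# it does NOT restore the original word order.
print(" ".join(sorted(unique)))  # bad good well
```

If the original order matters, a set-based approach is not enough on its own.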

Answered By: tdelaney

Answer 3

If order is important, use dict.fromkeys in a list comprehension:

df['expected'] = [' '.join(dict.fromkeys(w.split())) for w in df['words']]

output:

                words            expected
0  car apple car good      car apple good
1  good bad well good       good bad well
2  car apple bus food  car apple bus food
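For anyone who wants to try the idea without a DataFrame, here is a minimal self-contained sketch. It assumes a plain list stands in for the column, and the helper name dedupe_words is hypothetical (it does not appear in the answer). It relies on dicts preserving insertion order, which the language guarantees from Python 3.7 onward.

```python
def dedupe_words(cell):
    """Drop repeated words from a space-separated string, keeping first occurrences in order."""
    # dict.fromkeys keeps one key per distinct word, in insertion order (Python 3.7+).
    return " ".join(dict.fromkeys(cell.split()))


# Sample data taken from the question; a list stands in for df['words'].
words = ["car apple car good", "good bad well good", "car apple bus food"]
expected = [dedupe_words(cell) for cell in words]
print(expected)
# ['car apple good', 'good bad well', 'car apple bus food']

# With a DataFrame, the same helper could presumably be applied as:
#     df["expected"] = df["words"].map(dedupe_words)
```

Note that this sketch assumes every cell is a string; missing values (NaN) would need to be handled separately.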

Answered By: mozway
