Remove duplicate words in the same cell within a column in Python
Question:
I need some help. I have a column of words, and I want to remove the duplicated words inside each cell.
What I want to get is something like this:
words | expected |
---|---|
car apple car good | car apple good |
good bad well good | good bad well |
car apple bus food | car apple bus food |
I've tried this, but it is not working:
from collections import OrderedDict
df['expected'] = (df['words'].str.split().apply(lambda x: OrderedDict.fromkeys(x).keys()).str.join(' '))
I'd be very grateful if somebody could help me.
Answers:
If each cell is a string like "word1 word2" (note that a set does not preserve the original word order):
df['expected'] = [" ".join(set(wrds.strip().split())) for wrds in df.words]
If you don't need to retain the original order of the words, you can create an intermediate set, which removes the duplicates:
df["expected"] = df["words"].str.split().apply(set).str.join(" ")
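To illustrate why the set-based version may not match the expected column, here is a minimal sketch on a single cell value taken from the question. The variable names are made up for illustration; the `sorted` variant is an extra option not mentioned in the answers, useful only if a reproducible (alphabetical) order is acceptable.

```python
# The first sample row from the question.
cell = "car apple car good"

# A set drops duplicates but has no defined order, so the joined
# words can come out in a different order from one run to the next.
unordered = " ".join(set(cell.split()))

# Sorting the set gives a reproducible (alphabetical) result,
# though still not the original word order.
alphabetical = " ".join(sorted(set(cell.split())))

print(unordered)
print(alphabetical)  # apple car good
```

Neither result is guaranteed to equal "car apple good", which is why the set approach only fits when word order does not matter.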
If order is important, use dict.fromkeys in a list comprehension (dict keys are unique and, in Python 3.7+, keep insertion order):
df['expected'] = [' '.join(dict.fromkeys(w.split())) for w in df['words']]
Output:
                words            expected
0  car apple car good      car apple good
1  good bad well good       good bad well
2  car apple bus food  car apple bus food
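The order-preserving approach can also be wrapped in a small helper, sketched below in plain Python. The name `dedupe_words` and the choice to pass non-string values through unchanged are assumptions for illustration, not part of the answers above. The question does not mention missing cells, but the list comprehension would raise an error on them because it calls `.split()` on every value.

```python
def dedupe_words(cell):
    """Remove repeated words from a cell, keeping each first occurrence.

    Non-string values (for example None or NaN for a missing cell)
    are returned unchanged instead of raising an error.
    """
    if not isinstance(cell, str):
        return cell
    # dict keys are unique and keep insertion order (Python 3.7+).
    # The comparison is case-sensitive: "Car" and "car" are both kept.
    return " ".join(dict.fromkeys(cell.split()))


rows = ["car apple car good", "good bad well good", "car apple bus food", None]
for row in rows:
    print(repr(dedupe_words(row)))
```

With pandas, something like `df['expected'] = df['words'].map(dedupe_words)` should apply the helper to the whole column.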