How to remove repeated sentences from a string
Question:
I have an issue that I do not know how to tackle.
For example: I have a function that returns a string containing multiple sentences separated by commas, and some of them are repeated, like:
"lorem ipsum dolor, lorem ipsum dolor, lorem ipsum dolor"
I need to remove the repeated sentences, comparing sentence by sentence (split on ",") rather than word by word, since other sentences may contain repeated words that should not be removed.
Input example:
"lorem ipsum dolor, lorem ipsum dolor, lorem mark dol"
Output desired:
"lorem ipsum dolor, lorem mark dol"
Answers:
This solution is based on Tim Roberts's comment. The only difference is the use of OrderedDict
to preserve the order of the sentences:
from collections import OrderedDict
string = 'lorem ipsum dolor, lorem ipsum dolor, lorem mark dol'
string = ', '.join(OrderedDict.fromkeys(string.split(', ')))
print(string)
Output:
lorem ipsum dolor, lorem mark dol
This solution is not based on Tim Roberts's comment but uses the same tools:
text = "lorem ipsum dolor, lorem ipsum dolor, lorem mark dol"
# Note: a set does not preserve the original order of the sentences.
text = ', '.join(set(s.strip() for s in text.split(",")))
The difference from Alderven’s answer is that it needs no imports.
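Because a set is unordered, joining it directly can return the sentences in a different order than the input. A minimal sketch that still uses a set and no imports, but only to track what has already been seen, might look like this (the function name unique_in_order is just an illustration, not part of any answer above):

```python
def unique_in_order(text):
    """Drop repeated comma-separated sentences, keeping first occurrences in order."""
    seen = set()      # sentences already emitted
    result = []       # sentences in their original order
    for part in text.split(","):
        sentence = part.strip()
        if sentence not in seen:
            seen.add(sentence)
            result.append(sentence)
    return ", ".join(result)


print(unique_in_order("lorem ipsum dolor, lorem ipsum dolor, lorem mark dol"))
# lorem ipsum dolor, lorem mark dol
```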
Since Python 3.7, the built-in dict is guaranteed to preserve insertion order (in CPython 3.6 this was already true as an implementation detail), so a regular dict works too and no additional module is required.
The code splits on "," and strips leading and trailing whitespace from each sentence.
txt = "lorem ipsum dolor, lorem ipsum dolor , lorem mark dol"
my_dict = dict.fromkeys(map(str.strip, txt.split(',')))
print(*my_dict, sep=', ')
Result:
lorem ipsum dolor, lorem mark dol
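If this is needed in more than one place, the same dict.fromkeys idea can be wrapped in a small helper. The sketch below is only one possible shape: the name dedupe_sentences and the sep parameter are hypothetical, and skipping empty pieces (for example after a trailing comma) is an added assumption, not something the question asked for.

```python
def dedupe_sentences(text, sep=","):
    """Remove repeated sentences separated by `sep`, keeping the first of each."""
    # Strip whitespace around every piece so "a b" and " a b " compare equal.
    parts = (piece.strip() for piece in text.split(sep))
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops duplicates.
    # Assumption: empty pieces, e.g. from a trailing separator, are skipped.
    unique = dict.fromkeys(piece for piece in parts if piece)
    return (sep + " ").join(unique)


txt = "lorem ipsum dolor, lorem ipsum dolor , lorem mark dol"
print(dedupe_sentences(txt))
# lorem ipsum dolor, lorem mark dol
```

Comparison here is exact and case-sensitive, so "Lorem ipsum" and "lorem ipsum" would both be kept.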