Remove first occurrence of word in string
Question:
test = 'User Key Account Department Account Start Date'
I want to remove duplicate words from strings. The solution from this question functions well…
def unique_list(l):
    ulist = []
    [ulist.append(x) for x in l if x not in ulist]
    return ulist
test = ' '.join(unique_list(test.split()))
But it keeps the first occurrence of each word and removes the subsequent duplicates. I want to remove the first occurrence instead, so that the test string reads "User Key Department Account Start Date".
Answers:
Put all the elements into a set.
Tokenize your sentence into strings and insert them into a set.
std::set<std::string> s;
s.insert("aa");
s.insert("bb");
s.insert("cc");
s.insert("cc");  // duplicate; the set ignores it
s.insert("dd");
This should do the job:
test = 'User Key Account Department Account Start Date'
words = test.split()
# if word doesn't exist in the rest of the word list, add it
test = ' '.join([word for i, word in enumerate(words) if word not in words[i+1:]])
print(test) # User Key Department Account Start Date
If you want to keep just the last occurrence of each word then just start from the back and work your way forward.
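tokens = test.split()
final = []
# walk the words from last to first, keeping each word the first time it is seen
for word in tokens[::-1]:
    if word not in final:
        final.append(word)
# final is in reverse order, so flip it back before joining
print(" ".join(final[::-1]))
# User Key Department Account Start Date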
Here is one way to do it:
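l = test.split()
# words that occur more than once (count whole words in the list, not substrings of test)
m = set(i for i in l if l.count(i) > 1)
for i in m:
    l.remove(i)  # list.remove deletes only the first matching element
res = ' '.join(l)
print(res)
# User Key Department Account Start Date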
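Note that "remove only the first occurrence" and "keep only the last occurrence" agree for the example string but differ once a word appears three or more times. A minimal sketch of the difference, assuming whitespace-separated words; the helper names `remove_first_occurrence` and `keep_last_occurrence` and the sample input are hypothetical and not part of the answers above:

```python
from collections import Counter


def remove_first_occurrence(text):
    """Drop only the first occurrence of every word that appears more than once."""
    words = text.split()
    counts = Counter(words)
    dropped = set()  # repeated words whose first occurrence was already skipped
    kept = []
    for word in words:
        if counts[word] > 1 and word not in dropped:
            dropped.add(word)
            continue
        kept.append(word)
    return ' '.join(kept)


def keep_last_occurrence(text):
    """Keep only the last occurrence of every word."""
    words = text.split()
    return ' '.join(w for i, w in enumerate(words) if w not in words[i + 1:])


sample = 'a b a c a'  # made-up input where 'a' appears three times
print(remove_first_occurrence(sample))  # b a c a
print(keep_last_occurrence(sample))     # b c a
```

Which of the two behaviours is wanted depends on how the question is read; for the test string in the question both give "User Key Department Account Start Date".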
You can split the source string into a list, reverse the list, apply the unique_list function, and then reverse the result again before joining it back into a string.
def unique_list(l):
    ulist = []
    [ulist.append(x) for x in l if x not in ulist]
    return ulist

orig = "User Key Account Department Account Start Date"
orig_list = orig.split()
orig_list.reverse()
uniq_rev = unique_list(orig_list)
uniq_rev.reverse()
print(orig)
print(' '.join(uniq_rev))
Example:
$ python rev.py
User Key Account Department Account Start Date
User Key Department Account Start Date
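The same reverse, deduplicate, reverse idea can probably be written more compactly with `dict.fromkeys`, which keeps the first appearance of each key in insertion order (guaranteed for regular dicts since Python 3.7). This is only a sketch; the function name `dedupe_keep_last` is made up for illustration:

```python
def dedupe_keep_last(text):
    """Keep only the last occurrence of each whitespace-separated word."""
    words = text.split()
    # Feeding the reversed word list to dict.fromkeys keeps the first time each
    # word is seen in reverse order, i.e. its last occurrence in the original.
    unique_reversed = list(dict.fromkeys(reversed(words)))
    # Flip the result back into the original reading order.
    return ' '.join(reversed(unique_reversed))


print(dedupe_keep_last('User Key Account Department Account Start Date'))
# User Key Department Account Start Date
```

Unlike the `x not in ulist` check, which scans the list for every word, the dictionary lookup is constant time on average, which may matter for long inputs.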
If you like it functional:
from functools import reduce
from collections import Counter
import re
if __name__ == '__main__':
    sentence = 'User Key Account Department Account Start Date'
    result = reduce(
        # remove the first whole-word match of each repeated word, plus trailing whitespace
        lambda sentence, word: re.sub(rf'\b{re.escape(word)}\b\s*', '', sentence, count=1),
        # words that occur more than once
        map(
            lambda item: item[0],
            filter(
                lambda item: item[1] > 1,
                Counter(sentence.split()).items()
            )
        ),
        sentence
    )
    print(result)
    # User Key Department Account Start Date