Remove items out of a Python list of tuples
Question:
I have a list of tuples with word frequencies and a list of words to eliminate. How can I delete the matching tuples from the list without writing explicit loops?
data = [('the',23),('for',15),('so',10),('micro',10),('if',10),('macro',10)]
words = ['so','is','for','if'] # unique
indice =[]
# %%
for ii in range(len(data)):
    for jj in range(len(words)):
        if words[jj]==data[ii][0]:
            print(words[jj]+ ': found')
            indice.append(ii)
# del data[indice] # doesn't work
# data.remove(indice) # doesn't work
Answers:
I would transform the word list to a set for faster lookups, and then use a list comprehension:
wordset = set(words)
[item for item in data if item[0] not in wordset]
This outputs:
[('the', 23), ('micro', 10), ('macro', 10)]
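The set-plus-comprehension approach above can be sketched as a complete script. The helper name remove_words and the variable filtered are assumptions made for illustration and do not appear in the original answer; the final slice assignment is one possible way to update the existing list object rather than bind a new one.

```python
# Sample data from the question.
data = [('the', 23), ('for', 15), ('so', 10), ('micro', 10), ('if', 10), ('macro', 10)]
words = ['so', 'is', 'for', 'if']


def remove_words(pairs, stop_words):
    """Return a new list without the pairs whose word is in stop_words.

    Hypothetical helper, named here only for illustration.
    """
    stop_set = set(stop_words)  # built once; membership tests are O(1) on average
    return [pair for pair in pairs if pair[0] not in stop_set]


filtered = remove_words(data, words)
print(filtered)

# Slice assignment replaces the contents of the existing list in place,
# so other references to `data` see the filtered result too.
data[:] = filtered
```

Building a new list and assigning it back avoids deleting elements while iterating, which is where index-based deletion usually goes wrong.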
This is precisely what the built-in filter() function is for. No explicit loop is needed, e.g.,
data = [('the',23),('for',15),('so',10),('micro',10),('if',10),('macro',10)]
words = ['so','is','for','if']
indice = list(filter(lambda x: x[0] not in words, data))
print(indice)
Output:
[('the', 23), ('micro', 10), ('macro', 10)]
As was pointed out in an earlier comment, words should be converted to a set so that each membership test is fast.
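A minimal sketch of the filter() version with the set conversion applied might look like the following. The names wordset, kept and indices are assumptions chosen for illustration. The second half shows one way to make the index-based deletion attempted in the question work: del data[indice] fails because a list cannot be indexed by another list, so the collected positions are deleted one at a time, from the end, so that earlier positions do not shift.

```python
# Sample data from the question.
data = [('the', 23), ('for', 15), ('so', 10), ('micro', 10), ('if', 10), ('macro', 10)]
words = ['so', 'is', 'for', 'if']

# Convert once so each lookup inside the lambda is a fast set membership test.
wordset = set(words)

# filter() keeps the pairs for which the predicate returns True.
kept = list(filter(lambda pair: pair[0] not in wordset, data))
print(kept)

# Index-based alternative: collect the positions to drop ...
indices = [i for i, (word, _count) in enumerate(data) if word in wordset]

# ... then delete from the highest index down, so remaining indices stay valid.
for i in reversed(indices):
    del data[i]
print(data)
```

Both approaches leave the same three tuples; the filter() form builds a new list, while the reversed deletion modifies data in place.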