How to check if a line has one of the strings in a list?
Question:
Possible Duplicate:
Check if multiple strings exist in another string
I am trying to find out if there is a nice and clean way to test for 3 different strings.
Basically, I am looping through a file using a for loop; then I have to check whether each line contains one of the 3 strings that I have set in a list.
So far I have found the multiple if condition check, but it does not feel very elegant or efficient:
for line in file:
    if "string1" in line or "string2" in line or "string3" in line:
        print("found the string")
I was thinking of creating a list that contains string1, string2 and string3, and checking whether any of these is contained in the line, but it doesn't seem that I can just compare against the list without explicitly looping through it, and in that case I am basically in the same situation as with the multiple if statement above.
Is there any smart way to check against multiple strings without writing long if statements or looping through the elements of a list?
Answers:
strings = ("string1", "string2", "string3")
for line in file:
    if any(s in line for s in strings):
        print("yay!")
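A self-contained sketch of the same idea, assuming the file is read in text mode. Here io.StringIO stands in for a real file so the example runs without one, and the helper name lines_with_any is hypothetical, not part of the original answer.

```python
import io

def lines_with_any(lines, strings):
    """Return the lines (without trailing newline) that contain at least one of the substrings."""
    matches = []
    for line in lines:
        # any() stops at the first search string found in this line
        if any(s in line for s in strings):
            matches.append(line.rstrip("\n"))
    return matches

strings = ("string1", "string2", "string3")
# io.StringIO behaves like an open text file: iterating over it yields lines
fake_file = io.StringIO("no match here\nthis has string2\nstring3 and string1\n")
print(lines_with_any(fake_file, strings))  # ['this has string2', 'string3 and string1']
```

With a real file, pass the object returned by open(path) instead of fake_file.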
This still loops through the Cartesian product of the two lists (lines and search strings), but it does so in one line:
>>> lines1 = ['soup', 'butter', 'venison']
>>> lines2 = ['prune', 'rye', 'turkey']
>>> search_strings = ['a', 'b', 'c']
>>> any(s in l for l in lines1 for s in search_strings)
True
>>> any(s in l for l in lines2 for s in search_strings)
False
This also has the advantage that any short-circuits, so the looping stops as soon as a match is found. However, this means it only finds the first occurrence of a string from search_strings in linesX. If you want to find all occurrences, you could do something like this:
>>> lines3 = ['corn', 'butter', 'apples']
>>> [(s, l) for l in lines3 for s in search_strings if s in l]
[('c', 'corn'), ('b', 'butter'), ('a', 'apples')]
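If you also want to know where each match was found, a sketch along these lines collects, for every line, all the search strings it contains. The function name find_all_matches is hypothetical; enumerate supplies 1-based line numbers, and lines without any match are skipped.

```python
def find_all_matches(lines, search_strings):
    """Map each 1-based line number to the list of search strings found in that line."""
    result = {}
    for lineno, line in enumerate(lines, start=1):
        found = [s for s in search_strings if s in line]
        if found:  # skip lines with no matches
            result[lineno] = found
    return result

search_strings = ['a', 'b', 'c']
print(find_all_matches(['corn', 'butter', 'apples', 'prune', 'cab'], search_strings))
# {1: ['c'], 2: ['b'], 3: ['a'], 5: ['a', 'b', 'c']}
```

Unlike the any() version, this never short-circuits: every search string is tested against every line.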
If you feel like coding something more complex, the Aho-Corasick algorithm can test for the presence of multiple substrings in a given input string in a single pass over that string. (Thanks to Niklas B. for pointing that out.) You would still have to run it once per line, but it avoids scanning each line separately for every search string, so it should beat the approach above, which performs one substring search per (line, search string) pair.
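The answer does not show an implementation. As a rough illustration only, here is a minimal pure-Python sketch of the standard construction: a trie of the search strings plus failure links built breadth-first. The class and method names are hypothetical, it assumes non-empty search strings, and it only answers yes/no per line; for real workloads a well-tested third-party implementation is probably a better choice than this sketch.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick matcher answering 'does this text contain any pattern?'.

    Hypothetical helper for illustration; assumes all patterns are non-empty strings.
    """

    def __init__(self, patterns):
        self.goto = [{}]         # goto[node][char] -> child node; node 0 is the root
        self.fail = [0]          # failure link: longest proper suffix that is also a trie path
        self.terminal = [False]  # True if a pattern ends here or is a suffix of this path
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        # Insert the pattern into the trie, creating nodes as needed
        node = 0
        for ch in pattern:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.terminal.append(False)
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.terminal[node] = True

    def _build_failure_links(self):
        # Breadth-first order guarantees a node's failure link is set before its children's
        queue = deque(self.goto[0].values())  # depth-1 nodes keep fail = 0 (the root)
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # A pattern ending at the failure target is a suffix of this path, so it matches too
                if self.terminal[self.fail[child]]:
                    self.terminal[child] = True

    def contains_any(self, text):
        node = 0
        for ch in text:
            # Follow failure links until ch can be consumed or we are back at the root
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            if self.terminal[node]:
                return True  # stop at the first match, like any()
        return False


matcher = AhoCorasick(("string1", "string2", "string3"))
for line in ["nothing here", "this has string3"]:
    if matcher.contains_any(line):
        print("found:", line)
```

The automaton is built once from the search strings; after that each line is scanned character by character, and the amount of work per line no longer grows with the number of search strings.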
One approach is to combine the search strings into a regex pattern as in this answer.
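A minimal sketch of the regex approach, assuming the search strings are literal text rather than patterns: re.escape keeps characters such as . or * from being interpreted as regex syntax, and | joins the strings into alternatives. io.StringIO again stands in for a real file.

```python
import io
import re

strings = ("string1", "string2", "string3")
# Escape each literal string and join them as alternatives: string1|string2|string3
pattern = re.compile("|".join(map(re.escape, strings)))

fake_file = io.StringIO("no match\nfound string2 here\n")  # stands in for a real file
for line in fake_file:
    if pattern.search(line):  # search() looks anywhere in the line, unlike match()
        print("found:", line.rstrip("\n"))  # prints: found: found string2 here
```

Compiling the pattern once outside the loop avoids rebuilding it for every line.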