How to check if a line has one of the strings in a list?


Possible Duplicate:
Check if multiple strings exist in another string

I am trying to find out if there is a nice and clean way to test for 3 different strings.

Basically, I am looping through a file with a for loop, and for each line I have to check whether it contains one of the 3 strings that I have set in a list.

So far I have found the multiple-condition if check, but it does not feel really elegant or efficient:

for line in file:
    if "string1" in line or "string2" in line or "string3" in line:
        print("found the string")

I was thinking of creating a list that contains string1, string2 and string3, and checking whether any of these is contained in the line. But it doesn't seem that I can compare against the list without explicitly looping through it, and in that case I am basically in the same situation as with the multiple-condition if statement that I wrote above.

Is there any smart way to check against multiple strings without writing long if statements or looping through the elements of a list?

Asked By: user1006198



strings = ("string1", "string2", "string3")
for line in file:
    # any() stops at the first search string found in the line
    if any(s in line for s in strings):
        print("yay!")
Answered By: Niklas B.

This still loops through the Cartesian product of the two lists, but it does it in one line:

>>> lines1 = ['soup', 'butter', 'venison']
>>> lines2 = ['prune', 'rye', 'turkey']
>>> search_strings = ['a', 'b', 'c']
>>> any(s in l for l in lines1 for s in search_strings)
True
>>> any(s in l for l in lines2 for s in search_strings)
False

This also has the advantage that any short-circuits, so the looping stops as soon as a match is found. Note that this only tells you that at least one string from search_strings occurs somewhere in linesX. If you want to find every occurrence, you could do something like this:

>>> lines3 = ['corn', 'butter', 'apples']
>>> [(s, l) for l in lines3 for s in search_strings if s in l]
[('c', 'corn'), ('b', 'butter'), ('a', 'apples')]
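
To make the short-circuiting of any visible, here is a small sketch that counts how many substring tests actually run. The helper name contains_with_count is made up for illustration and is not part of any library.

```python
def contains_with_count(line, search_strings):
    """Return (found, checks): whether any search string is in line,
    and how many substring tests were evaluated before any() stopped."""
    checks = [0]  # one-element list used as a mutable counter

    def test(s):
        checks[0] += 1
        return s in line

    # any() consumes the generator lazily and stops at the first True
    found = any(test(s) for s in search_strings)
    return found, checks[0]


search_strings = ['a', 'b', 'c']
result_hit = contains_with_count('apples', search_strings)
result_miss = contains_with_count('rye', search_strings)
print(result_hit)   # (True, 1): stopped after the first test
print(result_miss)  # (False, 3): every search string was tried
```

A line that matches early costs only one test, while a line with no match still costs one test per search string.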

If you feel like coding something more complex, it seems the Aho-Corasick algorithm can test for the presence of multiple substrings in a given input string. (Thanks to Niklas B. for pointing that out.) I still think it would result in quadratic performance for your use-case since you’ll still have to call it multiple times to search multiple lines. However, it would beat the above (cubic, on average) algorithm.
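
For the curious, here is a minimal pure-Python sketch of the Aho-Corasick idea: build a trie of the search strings with failure links once, then scan each line a single time. The names build_automaton and find_any are my own, the code assumes non-empty search strings, and it is an untuned illustration rather than a replacement for a dedicated library.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links over the given patterns."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    output = [[]]  # output[state] -> patterns that end at this state

    # Insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)

    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_any(line, automaton):
    """Return the first pattern found in line, or None if there is none."""
    goto, fail, output = automaton
    state = 0
    for ch in line:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if output[state]:
            return output[state][0]
    return None


automaton = build_automaton(["string1", "string2", "string3"])
hit = find_any("this line has string2 in it", automaton)
miss = find_any("nothing to see here", automaton)
print(hit)   # string2
print(miss)  # None
```

The automaton is built once and reused for every line, so the per-line work is a single left-to-right pass over the characters.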

Answered By: senderle

One approach is to combine the search strings into a regex pattern as in this answer.
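
As a rough sketch of what that could look like (not necessarily what the linked answer does), the search strings can be escaped and joined with | into a single compiled pattern. The names search_strings, matching_lines and sample are just placeholders for illustration.

```python
import re

search_strings = ["string1", "string2", "string3"]

# re.escape keeps characters such as "." or "*" literal;
# "|" joins the escaped strings into one alternation.
pattern = re.compile("|".join(re.escape(s) for s in search_strings))


def matching_lines(lines):
    """Return the lines that contain at least one of the search strings."""
    return [line for line in lines if pattern.search(line)]


sample = ["nothing here", "has string2 inside", "string3 at start"]
found = matching_lines(sample)
print(found)  # ['has string2 inside', 'string3 at start']
```

Compiling the pattern once outside the loop avoids rebuilding it for every line.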

Answered By: Janne Karila