Regex lookbehind with "or" operator causes an error in Python: "look-behind requires fixed-width pattern"

Question:

The following regex should match any [Ss]tring – except when it is preceded by complexstring[a-z]s or anothercomplexstring[a-z]s:

match = r'(?<!complexstring[a-z]s|anothercomplexstring[a-z]s)[Ss]tring'

Unfortunately, I get an error if I do:

re.compile(match).findall(text_file)

Without the |anothercomplexstring[a-z]s alternative the error goes away. My question: How do I use a lookbehind with an "or" operator in Python?

Asked By: trashjazz

Answers:

The error message explains what’s wrong: Python’s re engine can only handle look-behind patterns that have exactly one possible width. In your pattern, the two alternatives joined by | have different lengths (15 and 22 characters), so the look-behind isn’t supported.
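To see the restriction in isolation, here is a minimal sketch that compiles the pattern from the question and reports the error instead of raising it. The helper name compile_error is only an illustration, not something from the question.

```python
import re

# Pattern from the question: one look-behind whose two alternatives
# have different widths.
bad = r'(?<!complexstring[a-z]s|anothercomplexstring[a-z]s)[Ss]tring'

# The same pattern with only the first alternative (fixed width).
good = r'(?<!complexstring[a-z]s)[Ss]tring'


def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles.

    Hypothetical helper, used here only to show the failure without a traceback.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


print(compile_error(bad))
print(compile_error(good))
```

The first call should report the fixed-width complaint from the title, and the second should print None because a single alternative has only one possible width.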

There is an easy fix: give each alternative its own look-behind. Because these are negative look-behinds, De Morgan’s laws apply: not (a or b) is the same as (not a) and (not b). So you need both look-behinds to hold, which means placing them one after the other instead of joining them with |:

match = r'(?<!complexstring[a-z]s)(?<!anothercomplexstring[a-z]s)[Ss]tring'
Answered By: Blckknght
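The same idea can be sketched for any number of excluded prefixes. The helper build_pattern, the prefixes foo[a-z]s and barbaz[a-z]s, and the sample text are all made up for this illustration. It assumes each prefix is itself a fixed-width pattern.

```python
import re


def build_pattern(excluded_prefixes, target):
    """Chain one negative look-behind per prefix in front of target.

    Hypothetical helper: each prefix must be fixed-width on its own,
    but different prefixes may have different widths.
    """
    lookbehinds = ''.join(f'(?<!{prefix})' for prefix in excluded_prefixes)
    return re.compile(lookbehinds + target)


# Stand-in prefixes of different widths (5 and 8 characters).
pattern = build_pattern(['foo[a-z]s', 'barbaz[a-z]s'], '[Ss]tring')

# Made-up sample: only the first and last occurrence are free of an excluded prefix.
sample = "String fooxsString barbazysstring xString"
starts = [m.start() for m in pattern.finditer(sample)]
print(starts)
```

Every look-behind is checked at the same position, so a match survives only if none of the excluded prefixes ends right before it.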