Regex lookbehind with "or" operator causes an error in Python: "look-behind requires fixed-width pattern"
Question:
The following regex should match any [Ss]tring, except when it is preceded by complexstring[a-z]s or anothercomplexstring[a-z]s:
match = r'(?<!complexstring[a-z]s|anothercomplexstring[a-z]s)[Ss]tring'
Unfortunately, I get an error if I do:
re.compile(match).findall(text_file)
Without the |anothercomplexstring[a-z]s part, the error goes away. My question: How do I use a lookbehind with an "or" operator in Python?
Answers:
The error message explains what’s wrong: Python’s regex engine can only handle look-behind patterns that have exactly one possible width. In your pattern, the alternation (using |) allows two different widths (15 characters for the first alternative, 22 for the second), so it isn’t supported.
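To see the rule in action, here is a small sketch that checks which patterns the standard-library re module accepts. The helper name compiles and the short (?<!ab|cd) pattern are made up for illustration; the behavior shown is what I'd expect from re at the time of writing.

```python
import re

def compiles(pattern):
    """Return True if the standard-library re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Alternatives of different widths (15 vs. 22 characters): rejected.
variable_width = r'(?<!complexstring[a-z]s|anothercomplexstring[a-z]s)[Ss]tring'
print(compiles(variable_width))

# A single fixed-width look-behind: accepted.
single = r'(?<!complexstring[a-z]s)[Ss]tring'
print(compiles(single))

# Hypothetical pattern: alternatives that share one width (2 and 2): accepted.
same_width = r'(?<!ab|cd)[Ss]tring'
print(compiles(same_width))
```

So alternation inside a look-behind is not forbidden as such; it only fails when the alternatives can match different numbers of characters.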
There is an easy fix: repeat the look-behind once for each alternative. Because these are negative look-behinds, De Morgan’s laws apply: not (a or b) is (not a) and (not b). So you need both look-behinds to succeed, one after the other, and the | is no longer needed:
match = r'(?<!complexstring[a-z]s)(?<!anothercomplexstring[a-z]s)[Ss]tring'
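As a quick check, here is a minimal sketch that runs the two-look-behind pattern against a few sample strings. The helper match_starts and the sample inputs are hypothetical, not from the question. Note that the word "string" inside "complexstring" itself still matches, because nothing in the pattern excludes it; only the occurrence that follows the full prefix is skipped.

```python
import re

# Two fixed-width negative look-behinds in a row; both must succeed.
pattern = re.compile(
    r'(?<!complexstring[a-z]s)'
    r'(?<!anothercomplexstring[a-z]s)'
    r'[Ss]tring'
)

def match_starts(text):
    """Return the start offset of every match of the pattern in text."""
    return [m.start() for m in pattern.finditer(text)]

# Hypothetical sample inputs, not from the original question.
print(match_starts("a String"))
# Only the "string" inside "complexstring" matches (offset 7);
# the trailing "String" at offset 15 is rejected by the first look-behind.
print(match_starts("complexstringxsString"))
# Same idea: the inner "string" matches at offset 14,
# and the trailing one at offset 22 is rejected.
print(match_starts("anothercomplexstringxsstring"))
```

If the inner matches are unwanted, the pattern would need an extra condition (for example a word boundary), which is a separate design decision from the look-behind fix.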