Regex – Exclude pattern if a certain word appears after the desired word

Question:

I want my regex to match the appearance of a certain word, except if it is followed by another specific word.

More specifically, I would like it to match "union" as a whole word (in the sense of union or loyalty to a group, so with word boundaries at both ends, which excludes words like "reunion") in all cases except when the text says "union europea" (which refers to an administration and does not denote a group in the same way).

Using the pattern \bunion\b does not help, because it would also match the aforementioned phrase.

Answers:

You can use a negative lookahead:

pattern = r'\W(union)\W(?!europea)'

As pointed out by @Michael Ruth, you probably don’t want to capture words other than union. So, with some test data:

unionize
union 
union europea
reunion 

This pattern captures union only in the second case; it does not capture reunion or unionize. \W matches a single non-word character, so adjacent letters (as in reunion and unionize) prevent a match.
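As a quick check of the behaviour described above, here is a minimal sketch that runs the pattern over the test data. It assumes the four test lines are joined into a single multi-line string (the variable names text, matches and starts are illustrative, not part of the original answer). Because \W must consume an actual character, a bare "union" at the very start or end of the input is not matched by this pattern.

```python
import re

# Pattern from the answer above, written as a raw string.
pattern = r'\W(union)\W(?!europea)'

# Assumption: the test lines form one multi-line string.
text = "unionize\nunion \nunion europea\nreunion \n"

# Collect what group 1 captured and where each capture starts.
matches = [m.group(1) for m in re.finditer(pattern, text)]
starts = [m.start(1) for m in re.finditer(pattern, text)]

print(matches)  # ['union']
print(starts)   # [9], i.e. the "union " on the second line

# \W needs a real character on each side, so a lone "union" is not found.
print(re.search(pattern, "union"))  # None
```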

Answered By: Kraigolas

Use

pattern = r'\bunion\b(?!\W*europea)'

(?!\W*europea) is a negative lookahead: it rejects a match when union is followed by any non-word characters (possibly none) and then the string europea.
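A minimal sketch of how this pattern behaves, assuming each sample is tested as a separate string. The sample list and the results dictionary are illustrative additions; the last two samples are not part of the original test data and are included to show that \b also matches at the string edges and that \W* skips punctuation as well as spaces.

```python
import re

# Pattern from the answer above.
pattern = r'\bunion\b(?!\W*europea)'

# Assumption: each sample is checked on its own.
samples = [
    "unionize",        # no word boundary after "union"
    "union ",          # standalone word, nothing excluded follows
    "union europea",   # excluded by the negative lookahead
    "reunion ",        # no word boundary before "union"
    "union",           # \b also matches at the start and end of the string
    "union, europea",  # \W* skips the comma and the space
]

# Map each sample to whether the pattern finds a match in it.
results = {s: bool(re.search(pattern, s)) for s in samples}

for sample, found in results.items():
    print(repr(sample), found)
# 'unionize' False
# 'union ' True
# 'union europea' False
# 'reunion ' False
# 'union' True
# 'union, europea' False
```

Unlike \W, the \b assertion consumes no characters, so the match itself is exactly the word union and no capturing group is needed.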

Answered By: Ryszard Czech