Reuse a named group across multiple patterns joined by the `|` operator in Python, with a single compilation

Question:

I want to match a single-line string against either of these patterns:

Pattern1: fname lname

Pattern2: lname,fname

Example strings:

Frank Delo
Delo,Frank

groupdict() should return the same result for both strings:

{"fname":"Frank",
 "lname":"Delo"
}

Here’s what I tried:

import re

r1 = "^(?P<fname>[a-zA-Z]+)(?: (?P<lname>[a-zA-Z]+))?$"
r2 = "^(?P=lname),(?P=fname)$"

print(re.match("|".join([r1,r2]), "Frank Delo").groupdict()) # Works fine
print(re.match("|".join([r1,r2]), "Delo,Frank").groupdict()) # Doesn't match: re.match returns None, so .groupdict() raises AttributeError

Can’t named group references be used after the `|` operator?

Also, note that I don’t want to compile the patterns separately.

Asked By: Abhishek J


Answers:

There are two issues:

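  • (?P=lname) is a backreference: it matches only the exact text that the group (?P<lname>...) already captured earlier in the same match. That is not what you want, because r2 is meant to cover the case where r1 did not match at all. In that case lname has captured nothing, so the backreference fails.

  • To fix that, you’d want to define (?P<lname>...) again inside r2, so that whichever alternative applies (r1 or r2) defines that named group. However, the standard re module does not support that: a pattern that defines the same group name twice is rejected with re.error. The good news is that the richer third-party regex package does support it.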
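Both issues can be reproduced with the standard re module alone. The sketch below is illustrative and not part of the original answer: the names doubled, q_r1, q_r2 and duplicate_names_rejected are hypothetical, and the exact wording of the re.error message may differ between Python versions.

```python
import re

# Issue 1: (?P=name) is a backreference; it repeats the exact text the group captured.
doubled = re.compile(r"(?P<word>[a-z]+) (?P=word)")
print(doubled.fullmatch("hey hey") is not None)  # True: second word repeats "hey"
print(doubled.fullmatch("hey you") is not None)  # False: "you" is not the captured "hey"

# In the question's combined pattern, the r2 branch is tried only after the r1
# branch has failed, so 'lname' has captured nothing and (?P=lname) cannot match.
q_r1 = r"^(?P<fname>[a-zA-Z]+)(?: (?P<lname>[a-zA-Z]+))?$"
q_r2 = r"^(?P=lname),(?P=fname)$"
print(re.match("|".join([q_r1, q_r2]), "Delo,Frank"))  # None

# Issue 2: defining the same group name in both alternatives is rejected by re.
r1 = r"^(?P<fname>[a-zA-Z]+) (?P<lname>[a-zA-Z]+)$"
r2 = r"^(?P<lname>[a-zA-Z]+),(?P<fname>[a-zA-Z]+)$"
try:
    re.compile("|".join([r1, r2]))
    duplicate_names_rejected = False
except re.error as exc:
    duplicate_names_rejected = True
    print("re.error:", exc)  # the message reports a redefinition of a group name
```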

So then we get:

import regex as re  # third-party package: pip install regex

r1 = "^(?P<fname>[a-zA-Z]+) (?P<lname>[a-zA-Z]+)$"
r2 = "^(?P<lname>[a-zA-Z]+),(?P<fname>[a-zA-Z]+)$"

r = "|".join([r1,r2])

print(re.match(r, "Frank Delo").groupdict()) # Works fine
print(re.match(r, "Delo,Frank").groupdict()) # Works fine too
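If installing the third-party regex package is not an option, one possible workaround with the standard re module (not part of the original answer) still compiles a single pattern but gives each alternative its own group names, then folds the captures into one dict in Python. The names NAME_PATTERN, parse_name, fname1, lname1, fname2 and lname2 are hypothetical.

```python
import re

# Hypothetical workaround with the standard re module: one compiled pattern,
# but each alternative uses distinct group names.
NAME_PATTERN = re.compile(
    r"^(?:(?P<fname1>[a-zA-Z]+) (?P<lname1>[a-zA-Z]+)"  # "fname lname"
    r"|(?P<lname2>[a-zA-Z]+),(?P<fname2>[a-zA-Z]+))$"   # "lname,fname"
)


def parse_name(text):
    """Return {"fname": ..., "lname": ...}, or None if neither form matches."""
    m = NAME_PATTERN.match(text)
    if m is None:
        return None
    d = m.groupdict()
    # Only one alternative matched, so the other alternative's groups are None.
    return {
        "fname": d["fname1"] or d["fname2"],
        "lname": d["lname1"] or d["lname2"],
    }


print(parse_name("Frank Delo"))  # {'fname': 'Frank', 'lname': 'Delo'}
print(parse_name("Delo,Frank"))  # {'fname': 'Frank', 'lname': 'Delo'}
```

The trade-off is that groupdict() itself no longer has the desired shape; the merging step in parse_name produces it, whereas with regex the shared group names give that result directly.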
Answered By: trincot