Reuse named groups across multiple patterns joined with `|` in Python, with a single compilation
Question:
I want to match a single-line string against either of two patterns:
Pattern1: fname lname
Pattern2: lname,fname
Example strings:
Frank Delo
Delo,Frank
For both strings, groupdict() should return the same result:
{"fname": "Frank", "lname": "Delo"}
Here's what I tried:
import re

r1 = r"^(?P<fname>[a-zA-Z]+)(?: (?P<lname>[a-zA-Z]+))?$"
r2 = r"^(?P=lname),(?P=fname)$"
print(re.match("|".join([r1, r2]), "Frank Delo").groupdict())  # Works fine
print(re.match("|".join([r1, r2]), "Delo,Frank").groupdict())  # Doesn't match: re.match returns None, so .groupdict() raises AttributeError
Can named group references not be used after the `|` operator?
Also, note that I don't want to compile the patterns separately.
Answers:
There are two issues:
- (?P=lname) is a backreference: it matches the exact text that (?P<lname>...) already captured, not the same pattern again. That is not what you want, because r2 is meant to cover the case where r1 did not match at all, so the groups it refers to never captured anything.
- To fix this, you'd want to define (?P<lname>...) again, so that whichever alternative applies (either r1 or r2) sets that named group. However, re does not support that: reusing a group name raises re.error ("redefinition of group name"). The good news is that the more feature-rich third-party regex package does support it.
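The two points above can be checked with the standard library alone. This sketch shows that a backreference repeats captured text rather than a pattern, that the question's combined pattern returns None for "Delo,Frank", and that re rejects a duplicated group name. The names `doubled`, `both` and `duplicate_name_rejected` are illustrative only.

```python
import re

# A backreference like (?P=word) matches the exact text that the named group
# already captured; it does not re-apply the group's pattern.
doubled = re.compile(r"^(?P<word>[a-zA-Z]+) (?P=word)$")
print(bool(doubled.match("hey hey")))  # True: same word twice
print(bool(doubled.match("hey you")))  # False: second word differs

# In the question's combined pattern, only r1 can set fname/lname. When r1
# fails, r2's backreferences point at groups that captured nothing, so the
# whole match fails and re.match returns None.
r1 = r"^(?P<fname>[a-zA-Z]+)(?: (?P<lname>[a-zA-Z]+))?$"
r2 = r"^(?P=lname),(?P=fname)$"
print(re.match("|".join([r1, r2]), "Delo,Frank"))  # None

# Defining the same group name in both branches is rejected by the stdlib re.
both = (r"^(?P<fname>[a-zA-Z]+) (?P<lname>[a-zA-Z]+)$"
        r"|^(?P<lname>[a-zA-Z]+),(?P<fname>[a-zA-Z]+)$")
try:
    re.compile(both)
    duplicate_name_rejected = False
except re.error as exc:
    duplicate_name_rejected = True
    print("re.error:", exc)
```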
So then we get:
import regex as re  # third-party package: pip install regex

r1 = "^(?P<fname>[a-zA-Z]+) (?P<lname>[a-zA-Z]+)$"
r2 = "^(?P<lname>[a-zA-Z]+),(?P<fname>[a-zA-Z]+)$"
r = "|".join([r1, r2])
print(re.match(r, "Frank Delo").groupdict())  # Works fine
print(re.match(r, "Delo,Frank").groupdict())  # Works fine too
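If adding the regex dependency is not an option, one stdlib-only workaround (a different technique from the answer above, offered as a sketch) is to give each alternative its own group names, still compile a single pattern, and merge whichever pair participated. `NAME_PATTERN` and `parse_name` are hypothetical names introduced for illustration.

```python
import re

# Stdlib-only workaround: distinct group names per branch, one compiled
# pattern, then merge the groups of whichever branch actually matched.
NAME_PATTERN = re.compile(
    r"^(?P<fname1>[a-zA-Z]+) (?P<lname1>[a-zA-Z]+)$"   # "fname lname"
    r"|^(?P<lname2>[a-zA-Z]+),(?P<fname2>[a-zA-Z]+)$"  # "lname,fname"
)


def parse_name(text):
    """Return {"fname": ..., "lname": ...}, or None if neither form matches.

    parse_name is a hypothetical helper used only for illustration.
    """
    m = NAME_PATTERN.match(text)
    if m is None:
        return None
    groups = m.groupdict()
    # Only one branch matched, so one name in each pair is None; the other
    # is a non-empty string because the groups use [a-zA-Z]+.
    return {
        "fname": groups["fname1"] or groups["fname2"],
        "lname": groups["lname1"] or groups["lname2"],
    }


print(parse_name("Frank Delo"))  # {'fname': 'Frank', 'lname': 'Delo'}
print(parse_name("Delo,Frank"))  # {'fname': 'Frank', 'lname': 'Delo'}
```

The trade-off is a small merge step after matching, in exchange for staying within re and still compiling only once.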