Regex (?J) mode modifier in Python regex, or an equivalent ability to use one named capture group in different patterns
Question:
I am trying to capture from two different pattern sequences using a named capture group. This SO question solves the problem in PCRE using the mode modifier (?J), and this SO question solves a related problem in Python that I haven’t succeeded at applying to my use case.
Example test strings:
abc-CAPTUREME-xyz-abcdef
abc-xyz-CAPTUREME-abcdef
Desired output:
CAPTUREME
CAPTUREME
CAPTUREME appears on either the left or right of the xyz sequence. My initial failed attempt at a regex looked like this:
r'abc-(xyz-(?P<cap>\w+)|(?P<cap>\w+)-xyz)-abcdef'
But in Python regexes that yields an error, (?P<cap> A subpattern name must be unique, and Python doesn’t support the (?J) modifier that was used in the first answer above to solve the problem.
With a single capture group I can capture CAPTUREME-xyz or xyz-CAPTUREME, but I can’t reproduce the example in the second Stack Overflow question linked above using lookarounds. Every attempt to replicate it simply fails to match my string, and there are too many differences for me to piece together what’s happening.
r'abc-(?P<cap>(xyz-)\w+|\w+(-xyz))-abcdef'
Answers:
Looking at the second article, you could write the pattern as:
(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))
Explanation
(?P<cap> : Open the named group cap
(?<=abc-xyz-)\w+ : Match 1+ word characters, asserting abc-xyz- directly to the left
| : Or
\w+(?=-xyz-abcdef) : Match 1+ word characters, asserting -xyz-abcdef directly to the right
) : Close group cap
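As a quick check of the lookaround pattern, here is a minimal sketch using Python's built-in re module, with \w+ written out for the word characters. The helper name capture_with_lookarounds is only illustrative and is not part of the original answer.

```python
import re

# Lookaround version: only the word itself is consumed by the match;
# the surrounding text is asserted, not captured.
LOOKAROUND = re.compile(r"(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))")


def capture_with_lookarounds(text):
    """Return the 'cap' group, or None when the pattern is not found."""
    match = LOOKAROUND.search(text)
    return match.group("cap") if match else None


for sample in ("abc-CAPTUREME-xyz-abcdef", "abc-xyz-CAPTUREME-abcdef"):
    print(capture_with_lookarounds(sample))  # prints CAPTUREME for both samples
```

Note that the first alternative only looks to the left, so it does not check that -abcdef follows. If that matters for your input, appending a lookahead such as (?=-abcdef) to that alternative is one way to tighten it.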
Another option in Python could be using a conditional and a capture group:
abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef
Explanation
abc-(xyz-)? : Match abc- and optionally capture xyz- in group 1
(?P<cap>\w+) : Named group cap, match 1+ word characters
- : Match a literal hyphen
(?(1)|xyz-) : Conditional: if group 1 matched, match nothing; if group 1 is not present, match xyz-
abcdef : Match literally
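The conditional version can be tried the same way. This is a minimal sketch using Python's built-in re module, with \w+ written out for the word characters; the helper name capture_with_conditional is only illustrative, and using fullmatch to anchor the whole string is an assumption about the use case.

```python
import re

# Conditional version: xyz- is required exactly once, either before
# the captured word (group 1) or after it (the "no" branch).
CONDITIONAL = re.compile(r"abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef")


def capture_with_conditional(text):
    """Return the 'cap' group, or None when the whole string does not match."""
    match = CONDITIONAL.fullmatch(text)
    return match.group("cap") if match else None


for sample in ("abc-CAPTUREME-xyz-abcdef", "abc-xyz-CAPTUREME-abcdef"):
    print(capture_with_conditional(sample))  # prints CAPTUREME for both samples
```

Unlike the lookaround pattern, this one consumes the entire string, so the full match is the whole input and the wanted text is read from the cap group.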