How to invert the regular expression group capture logic?
Question:
To create a capturing group in a regex you use (match), and you prefix it with ?: to make it non-capturing, like (?:match). The thing is, in any kind of complicated regular expression I find myself wanting to create far more non-capturing groups than capturing ones, so I’d like to reverse this logic and only capture groups beginning with ?: (or whatever). How can I do this? I mainly use regular expressions with .NET, but I wouldn’t mind answers for other languages with regular expressions like Perl, PHP, Python, JavaScript, etc.
Answers:
If you want to avoid the clumsiness of (?: ) and turn ( ) groups into non-capturing groups, use the RegexOptions.ExplicitCapture option. Only named groups, written (?<name>subexpression), will be captured if this option is being used.
However, you cannot turn non-capturing groups (?: ) into capturing groups, unfortunately.
The Regex constructor, as well as static methods of the Regex class such as Regex.Matches, accepts RegexOptions flags.
For example:
Regex.Matches(input, pattern, RegexOptions.ExplicitCapture)
In any language that supports named capture groups you can simply use them for what you want captured, and ignore the numbered ones.
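use feature 'say';  # say requires Perl 5.10 or later
my $string = q(Available from v5.10 in Perl.);
# Only the named group ver is read afterwards; the numbered groups are ignored.
$string =~ /([A-Z].+?)(?<ver>[0-9.]+)\s+(.*?)\./;
say "Version: $+{ver}";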
After the match, the capture is available in the %+ hash, while inside the regex you refer back to it with \k<name> or \g{name}.
The downside is that you still capture all that other stuff (which hurts efficiency a little), while the upside is that you still capture all that other stuff (which helps flexibility, if some of it turns out to be needed).
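The same approach carries over to Python, which the question also asks about. The following is a minimal sketch, not a definitive recipe: the sample text and the variable names (text, mixed, named_only, all_groups) are made up for illustration, and the version pattern is a slightly stricter variant of the one in the Perl example. It relies only on the standard re module, where named groups are written (?P<name>...) and groupdict() returns just the named captures.

```python
import re

# Hypothetical sample input, mirroring the Perl example above.
text = "Available from v5.10 in Perl."

# Plain ( ) groups still capture here; only the named group "ver" is of interest.
mixed = re.compile(r"([A-Z].+?)(?P<ver>[0-9]+(?:\.[0-9]+)*)\s+(.*?)\.")

m = mixed.search(text)

# groupdict() exposes only the named groups, so the numbered ones can be ignored.
named_only = m.groupdict()

# groups() still contains every capturing group, named or not.
all_groups = m.groups()

print("Version:", named_only["ver"])
print("All captures:", all_groups)
```

As in the Perl version, the unnamed groups are still captured behind the scenes; reading results through groupdict() only hides them. Python's re module has no direct counterpart to RegexOptions.ExplicitCapture, so groups that should not capture at all still need (?: ).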