How to invert the regular expression group capture logic?


To create a capturing group in a regex you use (match) and you prefix it with ?: to make it non-capturing, like (?:match). The thing is, in any kind of complicated regular expression I find myself wanting to create far more non-capturing groups than capturing ones, so I’d like to reverse this logic and only capture groups beginning with ?: (or whatever). How can I do this? I mainly use regular expressions with .NET, but I wouldn’t mind answers for other languages with regular expressions like Perl, PHP, Python, JavaScript, etc.

Asked By: Jez



If you want to avoid the clumsiness of (?: ) and turn plain ( ) groups into non-capturing groups, use the RegexOptions.ExplicitCapture option. When this option is set, only explicitly named groups ((?<name>subexpression)) are captured.

However, you cannot turn non-capturing groups (?: ) into capturing groups, unfortunately.

The Regex constructor, as well as other methods of the Regex class, accepts RegexOptions flags.

For example:

Regex.Matches(input, pattern, RegexOptions.ExplicitCapture)
Answered By: user2819245

In any language that supports named capture groups you can simply use them for what you want captured, and ignore the numbered ones.

use v5.10;  # enables say and named captures

my $string = q(Available from v5.10 in Perl.);

$string =~ /([A-Z].+?)(?<ver>[0-9.]+)\s+(.*?)\./;

say "Version: $+{ver}";

After the match the capture is available in the %+ hash, while inside the regex you refer to it as \k<name> or \g{name}.

The downside is that you still capture all that other stuff (which hurts efficiency a little), while the upside is that you still capture all that other stuff (which helps flexibility, if some of it turns out to be needed).
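Since the question also asks about other languages, here is a minimal Python sketch of the same idea, assuming the standard re module and its (?P<name>...) syntax for named groups. The variable names are only illustrative. Note that the unnamed groups still capture here; the code simply reads the named one and ignores the rest.

```python
import re

text = "Available from v5.10 in Perl."

# Same pattern as the Perl example; Python spells named groups (?P<name>...).
pattern = re.compile(r"([A-Z].+?)(?P<ver>[0-9.]+)\s+(.*?)\.")

match = pattern.search(text)

# Read only the named capture; the numbered groups exist but are ignored.
version = match.group("ver")

# groupdict() returns just the named groups, which is a handy way to
# skip over any numbered captures.
named_only = match.groupdict()

print(f"Version: {version}")
```

If the numbered groups turn out to be needed after all, they are still available through match.groups().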

Answered By: zdim