Alternative to the `match = re.match(); if match: …` idiom?

Question:

If you want to check whether something matches a regex and, if so, print the first group, you do:

import re
match = re.match(r"(\d+)g", "123g")
if match is not None:
    print(match.group(1))

This is completely pedantic, but the intermediate match variable is a bit annoying..

Languages like Perl do this by creating new $1..$9 variables for match groups, like..

if($blah =~ /(\d+)g/){
    print $1
}

From this reddit comment,

with re_context.match('^blah', s) as match:
    if match:
        ...
    else:
        ...

..which I thought was an interesting idea, so I wrote a simple implementation of it:

#!/usr/bin/env python2.6
import re

class SRE_Match_Wrapper:
    def __init__(self, match):
        self.match = match

    def __exit__(self, type, value, tb):
        pass

    def __enter__(self):
        return self.match

    def __getattr__(self, name):
        if name == "__exit__":
            return self.__exit__
        elif name == "__enter__":
            return self.__enter__
        else:
            return getattr(self.match, name)

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    x = SRE_Match_Wrapper(matcher.match(inp))
    return x

if __name__ == '__main__':
    # Example:
    with rematch(r"(\d+)g", "123g") as m:
        if m:
            print(m.group(1))

    with rematch(r"(\d+)g", "123") as m:
        if m:
            print(m.group(1))

(This functionality could theoretically be patched into the _sre.SRE_Match object)

It would be nice if you could skip the execution of the with statement’s code block, if there was no match, which would simplify this to..

with rematch(r"(\d+)g", "123") as m:
    print(m.group(1)) # only executed if the match occurred

..but this seems impossible based on what I can deduce from PEP 343

Any ideas? As I said, this is really trivial annoyance, almost to the point of being code-golf..

Asked By: dbr


Answers:

I don’t think using with is the solution in this case. You’d have to raise an exception in the BLOCK part (which is specified by the user) and have the __exit__ method return True to “swallow” the exception. So it would never look good.
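To make the objection concrete, here is a minimal sketch of what "raise in the block and swallow in __exit__" would look like. The names NoMatch and require are hypothetical helpers invented for this illustration; they are not part of re or of the question's code.

```python
import re

class NoMatch(Exception):
    """Hypothetical marker exception raised inside the with block."""

class rematch(object):
    def __init__(self, pattern, text):
        self.match = re.match(pattern, text)

    def __enter__(self):
        return self.match

    def __exit__(self, exc_type, exc_value, tb):
        # Returning True suppresses the exception; only NoMatch is swallowed.
        return exc_type is not None and issubclass(exc_type, NoMatch)

def require(match):
    # The user still has to call this inside the block, which is the ugly part.
    if match is None:
        raise NoMatch()
    return match

results = []
with rematch(r"(\d+)g", "123g") as m:
    results.append(require(m).group(1))
with rematch(r"(\d+)g", "123") as m:
    results.append(require(m).group(1))  # raises NoMatch, rest of block is skipped
    results.append("unreachable")
print(results)
```

The block is only abandoned once the user's own code raises, so the check has merely moved from an if statement to a require() call.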

I’d suggest going for a syntax similar to the Perl syntax. Make your own extended re module (I’ll call it rex) and have it set variables in its module namespace:

if rex.match(r'(\d+)g', '123g'):
    print(rex._1)

As you can see in the comments below, this method is neither scope- nor thread-safe. You would only use this if you were completely certain that your application wouldn’t become multi-threaded in the future and that any functions called from the scope that you’re using this in will also use the same method.
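One way the thread-safety part of that concern might be reduced is to keep the last match in thread-local storage. The sketch below assumes a hypothetical rex-like module exposing match() and group(); it is not an existing library, and it does nothing about the scope problem (a nested call can still overwrite the stored match).

```python
import re
import threading

# Each thread gets its own "last match" slot.
_state = threading.local()

def match(pattern, text):
    """Run re.match, remember the result for this thread, return True/False."""
    _state.last = re.match(pattern, text)
    return _state.last is not None

def group(n):
    """Return group n of this thread's last match (fails if it did not match)."""
    return _state.last.group(n)

if match(r"(\d+)g", "123g"):
    print(group(1))
```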

Answered By: Blixt

I don’t think it’s trivial. I don’t want to have to sprinkle a redundant conditional around my code if I’m writing code like that often.

This is slightly odd, but you can do this with an iterator:

import re

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    matches = matcher.match(inp)
    if matches:
        yield matches

if __name__ == '__main__':
    for m in rematch(r"(\d+)g", "123g"):
        print(m.group(1))

The odd thing is that it’s using an iterator for something that isn’t iterating–it’s closer to a conditional, and at first glance it might look like it’s going to yield multiple results for each match.

It does seem odd that a context manager can’t cause its managed function to be skipped entirely; while that’s not explicitly one of the use cases of “with”, it seems like a natural extension.

Answered By: Glenn Maynard

If you’re doing a lot of these in one place, here’s an alternative answer:

import re
class Matcher(object):
    def __init__(self):
        self.matches = None
    def set(self, matches):
        self.matches = matches
    def __getattr__(self, name):
        return getattr(self.matches, name)

class re2(object):
    def __init__(self, expr):
        self.re = re.compile(expr)

    def match(self, matcher, s):
        matches = self.re.match(s)
        matcher.set(matches)
        return matches

pattern = re2(r"(\d+)g")
m = Matcher()
if pattern.match(m, "123g"):
    print(m.group(1))
if not pattern.match(m, "x123g"):
    print("no match")

You can compile the regex once with the same thread safety as re, create a single reusable Matcher object for the whole function, and then you can use it very concisely. This also has the benefit that you can reverse it in the obvious way–to do that with an iterator, you’d need to pass a flag to tell it to invert its result.

It’s not much help if you’re only doing a single match per function, though; you don’t want to keep Matcher objects in a broader context than that; it’d cause the same issues as Blixt’s solution.
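For comparison, this is roughly what the iterator version would need in order to support the inverted test. The invert parameter is an assumption made for this sketch, not something from the earlier answer.

```python
import re

def rematch(pattern, inp, invert=False):
    """Yield once if the pattern matches, or once if it does NOT when invert=True."""
    m = re.match(pattern, inp)
    if invert:
        if m is None:
            yield None  # nothing useful to hand back on a non-match
    elif m is not None:
        yield m

hits = [m.group(1) for m in rematch(r"(\d+)g", "123g")]
misses = ["no match" for _ in rematch(r"(\d+)g", "x123g", invert=True)]
print(hits, misses)
```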

Answered By: Glenn Maynard

This is not really pretty-looking, but you can profit from the getattr(object, name[, default]) built-in function using it like this:

>>> getattr(re.match(r"(\d+)g", "123g"), 'group', lambda n:'')(1)
'123'
>>> getattr(re.match(r"(\d+)g", "X23g"), 'group', lambda n:'')(1)
''

To mimic the if match print group flow, you can (ab)use the for statement this way:

>>> for group in filter(None, [getattr(re.match(r"(\d+)g", "123g"), 'group', None)]):
        print(group(1))
123
>>> for group in filter(None, [getattr(re.match(r"(\d+)g", "X23g"), 'group', None)]):
        print(group(1))
>>> 

Of course you can define a little function to do the dirty work:

>>> matchgroup = lambda p,s: filter(None, [getattr(re.match(p, s), 'group', None)])
>>> for group in matchgroup(r"(\d+)g", "123g"):
        print(group(1))
123
>>> for group in matchgroup(r"(\d+)g", "X23g"):
        print(group(1))
>>> 
Answered By: etuardu

Not a perfect solution, but it does allow you to chain several match options for the same string:

import re

class MatchWrapper(object):
  def __init__(self):
    self._matcher = None

  def wrap(self, matcher):
    self._matcher = matcher

  def __getattr__(self, attr):
    return getattr(self._matcher, attr)

def match(pattern, s, matcher):
  m = re.match(pattern, s)
  if m:
    matcher.wrap(m)
    return True
  else:
    return False

matcher = MatchWrapper()
s = "123g"
if match(r"(\d+)g", s, matcher):
  print(matcher.group(1))
elif match(r"(\w+)g", s, matcher):
  print(matcher.group(1))
else:
  print("no match")
Answered By: oneself

Another nice syntax would be something like this:

header = re.compile('(.*?) = (.*?)$')
footer = re.compile('(.*?): (.*?)$')

if header.match(line) as m:
    key, value = m.group(1,2)
elif footer.match(line) as m:
    key, value = m.group(1,2)
else:
    key, value = None, None
Answered By: mhubig

I have another way of doing this, based on Glenn Maynard’s solution:

for match in [m for m in [re.match(pattern, key)] if m]:
    print("It matched: %s" % match)

Similar to Glenn’s solution, this iterates either 0 times (if there is no match) or 1 time (if there is a match).

No helper function is needed, but it is less tidy as a result.

Answered By: AMADANON Inc.

Here’s my solution:

import re

s = 'hello world'

match = []
if match.append(re.match(r'w\w+', s)) or any(match):
    print('W:', match.pop().group(0))
elif match.append(re.match(r'h\w+', s)) or any(match):
    print('H:', match.pop().group(0))
else:
    print('No match found')

You can use as many elif clauses as needed.

Even better:

import re

s = 'hello world'

if vars().update(match=re.match(r'w\w+', s)) or match:
    print('W:', match.group(0))
elif vars().update(match=re.match(r'h\w+', s)) or match:
    print('H:', match.group(0))
else:
    print('No match found')

Both append and update return None. So you have to actually check the result of your expression by using the or part in every case.

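Unfortunately, this only works as long as the code resides at the top level of a module, i.e. not inside a function, where updating the dictionary returned by vars() does not create local variables.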
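A small sketch of that limitation, assuming no module-level variable called found exists (both found and try_in_function are names made up for this illustration):

```python
import re

def try_in_function(s):
    # Inside a function, vars() is locals(); updating that dict does not
    # create a real local variable.
    vars().update(found=re.match(r'h\w+', s))
    try:
        # 'found' is never assigned in this function, so it is looked up
        # as a global and the lookup fails.
        return found
    except NameError:
        return "NameError"

print(try_in_function('hello world'))
```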

Answered By: dennis

This is what I do:

def re_match_cond (match_ref, regex, text):
    match = regex.match (text)
    del match_ref[:]
    match_ref.append (match)
    return match

if __name__ == '__main__':
    match_ref = []
    if re_match_cond (match_ref, regex_1, text):
        match = match_ref[0]
        ### ...
    elif re_match_cond (match_ref, regex_2, text):
        match = match_ref[0]
        ### ...
    elif re_match_cond (match_ref, regex_3, text):
        match = match_ref[0]
        ### ...
    else:
        ### no match
        ### ...
        pass

That is, I pass a list to the function to emulate pass-by-reference.

Answered By: Rhubbarb

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can capture the condition value re.match(r'(\d+)g', '123g') in a variable match, both to check that it is not None and to reuse it within the body of the condition:

>>> if match := re.match(r'(\d+)g', '123g'):
...   print(match.group(1))
... 
123
>>> if match := re.match(r'(\d+)g', 'dddf'):
...   print(match.group(1))
...
>>>
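
The same operator also covers the chained if/elif case that several answers in this thread work around. A minimal sketch, assuming Python 3.8+; the classify function and its return values are made up for illustration, and the two patterns are the header/footer ones from an earlier answer:

```python
import re

header = re.compile(r'(.*?) = (.*?)$')
footer = re.compile(r'(.*?): (.*?)$')

def classify(line):
    """Return (kind, key, value) for the first pattern that matches, else None."""
    if m := header.match(line):
        return ('header',) + m.group(1, 2)
    elif m := footer.match(line):
        return ('footer',) + m.group(1, 2)
    return None

print(classify('a = b'))
print(classify('k: v'))
print(classify('nothing'))
```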
Answered By: Xavier Guihot