Using a regex as a template with Python

Question:

I would like to use a regex pattern as a template, and I wonder whether there is a convenient way to do so in Python (3 or newer).

import re

pattern = re.compile("/something/(?P<id>.*)")
pattern.populate(id=1) # that is what I'm looking for

should result in

/something/1
Asked By: deamon


Answers:

That’s not what regexes are for; you could just use normal string formatting:

>>> '/something/{id}'.format(id=1)
'/something/1'
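If you still want the regex as a safety net, one option is to build the string with str.format and then check the result with the compiled pattern's fullmatch(). This is a minimal sketch: URL_TEMPLATE, URL_PATTERN and build_url are illustrative names, and restricting id to digits (\d+) is an assumption, since the question's pattern accepts anything (.*).

```python
import re

# Illustrative names: one format string for building, one regex for checking.
URL_TEMPLATE = "/something/{id}"
URL_PATTERN = re.compile(r"/something/(?P<id>\d+)")  # assumes ids are digits


def build_url(**values):
    """Fill the template, then make sure the result matches the regex."""
    url = URL_TEMPLATE.format(**values)
    if URL_PATTERN.fullmatch(url) is None:
        raise ValueError(f"{url!r} does not match {URL_PATTERN.pattern!r}")
    return url


print(build_url(id=1))  # /something/1
```

The cost is that the template and the regex have to be kept in sync by hand.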
Answered By: SilentGhost

Save the compile until after the substitution:

pattern = re.compile("/something/(?P<id>%s)" % 1)  # fill in the value first, then compile
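The idea is that the pattern is only a string until re.compile() sees it, so it can be filled in first. A minimal sketch in the same %-formatting style, assuming the filled-in values should be matched literally (hence re.escape()); compile_filled is a hypothetical helper name:

```python
import re


def compile_filled(template, **values):
    """Fill %(name)s slots with escaped values, then compile the result."""
    # re.escape() keeps characters such as '.' in a value from acting as regex syntax.
    escaped = {name: re.escape(str(value)) for name, value in values.items()}
    return re.compile(template % escaped)


# A literal '%' in such a template would have to be written as '%%'.
rx = compile_filled(r"/something/(?P<id>%(id)s)", id=1)
print(rx.pattern)                                # /something/(?P<id>1)
print(rx.fullmatch("/something/1") is not None)  # True
```

Note that this yields a more specific regex, not the string /something/1 the question asks for.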
Answered By: Bryan Oakley

Below is a lightweight class I created that does what you’re looking for. You can write a single regular expression and use it both for matching strings and for generating them.

There is a small usage example at the bottom of the code.

Generally, you construct the regular expression as usual and use the match and search methods as normal. The format method works much like str.format to generate a new string.

import re
regex_type = type(re.compile(""))

# This is not perfect. It breaks if there is a parenthesis in the regex.
re_term = re.compile(r"(?<!\\)\(\?P<(?P<name>[\w_\d]+)>(?P<regex>[^\)]*)\)")

class BadFormatException(Exception):
    pass

class RegexTemplate(object):
    def __init__(self, r, *args, **kwargs):
        self.r = re.compile(r, *args, **kwargs)
    
    def __repr__(self):
        return "<RegexTemplate '%s'>"%self.r.pattern
    
    def match(self, *args, **kwargs):
        '''The regex match function'''
        return self.r.match(*args, **kwargs)
    
    def search(self, *args, **kwargs):
        '''The regex search function'''
        return self.r.search(*args, **kwargs)
    
    def format(self, **kwargs):
        '''Format this regular expression in a similar way as string.format.
        Only supports true keyword replacement, not group replacement.'''
        pattern = self.r.pattern
        def replace(m):
            name = m.group('name')
            reg = m.group('regex')
            val = str(kwargs[name])
            if not re.fullmatch(reg, val):
                raise BadFormatException("Template variable '%s' has a value "
                    "of %s, does not match regex %s."%(name, val, reg))
            return val
        
        # The regex sub function does most of the work
        value = re_term.sub(replace, pattern)
        
        # Now we have to un-escape the special characters.
        return re.sub(r"\\([.\(\)\[\]])", r"\1", value)

def compile(*args, **kwargs):
    return RegexTemplate(*args, **kwargs)
    
if __name__ == '__main__':
    # Construct a typical URL routing regular expression
    r = RegexTemplate(r"http://example.com/(?P<year>\d\d\d\d)/(?P<title>\w+)")
    print(r)
    
    # This should match
    print(r.match("http://example.com/2015/article"))
    # Generate the same URL using url formatting.
    print(r.format(year="2015", title="article"))
    
    # This should not match
    print(r.match("http://example.com/abcd/article"))
    # This will raise an exception because year is not formatted properly
    try:
        print(r.format(year="15", title="article"))
    except BadFormatException as e:
        print(e)
    

There are some limitations:

  • The format function only works with keyword arguments (you can’t use positional arguments as you can with str.format).
  • There is also a bug with named groups that contain nested groups, e.g., RegexTemplate(r'(?P<foo>biz(baz)?)'). This could be corrected with a bit of work.
  • If your regular expression contains character classes outside of a named group (e.g., [a-z123]), format() has no way to fill them in.
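The nested-group limitation can be worked around by scanning the pattern with a parenthesis-depth counter instead of a single regex. This is a sketch, not speedplane's code: fill() and _group_end() are hypothetical helpers, BadFormatException is redefined so the block runs on its own, and it assumes escapes are always backslash-based and that ']' is never the first character of a character class.

```python
import re


class BadFormatException(Exception):
    """Raised when a value does not fit the regex of its named group."""


def _group_end(pattern, start):
    """Return the index just past the ')' that closes the group at start.

    Counts nesting depth and skips backslash escapes and [...] classes, so a
    group like (?P<foo>biz(baz)?) is read as a whole. Assumes ']' is never
    the first character of a character class.
    """
    depth = 0
    in_class = False
    i = start
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"  # a ']' closes the class
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced parentheses in pattern")


def fill(pattern, **values):
    """Replace each top-level (?P<name>...) group with a validated value."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("(?P<", i):
            name_end = pattern.index(">", i)
            name = pattern[i + 4:name_end]
            end = _group_end(pattern, i)
            body = pattern[name_end + 1:end - 1]
            val = str(values[name])
            if re.fullmatch(body, val) is None:
                raise BadFormatException(f"{name}={val!r} does not match {body!r}")
            out.append(val)
            i = end
        elif pattern[i] == "\\" and i + 1 < len(pattern) and not pattern[i + 1].isalnum():
            out.append(pattern[i + 1])  # un-escape literal punctuation, e.g. \.
            i += 2
        elif pattern[i] in "^$":
            i += 1  # anchors match no text, so they produce no output
        elif pattern[i] in "\\()[]|*+?{}":
            raise ValueError(f"cannot render regex syntax at index {i}")
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


print(fill(r"(?P<foo>biz(baz)?)", foo="bizbaz"))  # bizbaz
```

Other regex syntax outside a named group (character classes, alternation, quantifiers) raises ValueError instead of leaking into the output, so the third limitation is only addressed in the sense of failing loudly. A named group nested inside another named group is replaced together with its parent.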
Answered By: speedplane

For very simple cases, probably the easiest way to do this is to replace the named capture groups with format fields.

Here is a basic validator/formatter:

import re
from functools import partial

unescape = partial(re.compile(r'\\(.)').sub, r'\1')
namedgroup = partial(re.compile(r'\(\?P<(\w+)>.*?\)').sub, r'{\1}')


class Mould:
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)
        self.template = unescape(namedgroup(pattern))

    def format(self, **values):
        try:
            return self.template.format(**values)
        except KeyError as e:
            raise TypeError(f'Missing argument: {e}') from None

    def search(self, string):
        try:
            return self.pattern.search(string).groupdict()
        except AttributeError:
            raise ValueError(string) from None

So, for example, to instantiate a validator/formatter for phone numbers in the form (XXX) YYY-ZZZZ:

template = r'\((?P<area>\d{3})\) (?P<prefix>\d{3})-(?P<line>\d{4})'
phonenum = Mould(template)

And then:

>>> phonenum.search('(333) 444-5678')
{'area': '333', 'prefix': '444', 'line': '5678'}

>>> phonenum.format(area=111, prefix=555, line=444)
'(111) 555-444'

But this is a very basic skeleton that overlooks many regex features (like lookarounds or non-capturing groups). If they are needed, things can get messy quickly. In that case, going the other way around, generating the pattern from the template, is more verbose but may be more flexible and less error-prone.
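To see the problem concretely, here are the first version's two helpers applied to a made-up pattern with an optional non-capturing group. The pattern is a hypothetical example; the helpers are restated so the block runs on its own.

```python
import re
from functools import partial

# The same helpers as in the first Mould.
unescape = partial(re.compile(r'\\(.)').sub, r'\1')
namedgroup = partial(re.compile(r'\(\?P<(\w+)>.*?\)').sub, r'{\1}')

# Hypothetical pattern: an optional "+1 " prefix, then a 7-digit number.
pattern = r'(?:\+1 )?(?P<number>\d{7})'
template = unescape(namedgroup(pattern))
print(template)                         # (?:+1 )?{number}
print(template.format(number=5551234))  # (?:+1 )?5551234
```

The named group is converted correctly, but the non-capturing group leaks into the output as literal text.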

Here is the basic validator/formatter for that approach (its .search() and .format() methods are the same as in the first version):

import string
import re

FMT = string.Formatter()


class Mould:
    def __init__(self, template, **kwargs):
        self.template = template
        self.pattern = self.make_pattern(template, **kwargs)

    @staticmethod
    def make_pattern(template, **kwargs):
        pattern = ''
        # for each field in the template, add to the pattern
        for text, field, *_ in FMT.parse(template):
            # the escaped preceding text
            pattern += re.escape(text)
            if field:
                # a named regex capture group
                pattern += f'(?P<{field}>{kwargs[field]})'
            # XXX: if there's text after the last field,
            #   the parser will iterate one more time,
            #   hence the 'if field'
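        return re.compile(pattern)

    def format(self, **values):
        try:
            return self.template.format(**values)
        except KeyError as e:
            raise TypeError(f'Missing argument: {e}') from None

    def search(self, string):
        try:
            return self.pattern.search(string).groupdict()
        except AttributeError:
            raise ValueError(string) from None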

Instantiation:

template = '({area}) {prefix}-{line}'
content  = dict(area=r'\d{3}', prefix=r'\d{3}', line=r'\d{4}')
phonenum = Mould(template, **content)

Execution:

>>> phonenum.search('(333) 444-5678')
{'area': '333', 'prefix': '444', 'line': '5678'}

>>> phonenum.format(area=111, prefix=555, line=444)
'(111) 555-444'
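Because this version is given a regex for each field, format() could also check the values it receives; as written, line=444 is accepted even though line is declared as \d{4}. A sketch of that idea, where CheckedMould and its fields attribute are hypothetical names, not part of the answer:

```python
import re
import string

FMT = string.Formatter()


class CheckedMould:
    """Template-first mould whose format() validates each value."""

    def __init__(self, template, **fields):
        self.template = template
        self.fields = fields  # field name -> regex for that field
        pattern = ''
        for text, field, *_ in FMT.parse(template):
            pattern += re.escape(text)
            if field:
                pattern += f'(?P<{field}>{fields[field]})'
        self.pattern = re.compile(pattern)

    def format(self, **values):
        # Check every value against its own sub-pattern before formatting.
        for name, regex in self.fields.items():
            if re.fullmatch(regex, str(values[name])) is None:
                raise ValueError(f'{name}={values[name]!r} does not match {regex!r}')
        return self.template.format(**values)

    def search(self, text):
        m = self.pattern.search(text)
        if m is None:
            raise ValueError(text)
        return m.groupdict()


phonenum = CheckedMould('({area}) {prefix}-{line}',
                        area=r'\d{3}', prefix=r'\d{3}', line=r'\d{4}')
print(phonenum.format(area=111, prefix=555, line=4444))  # (111) 555-4444
```

With this class, phonenum.format(area=111, prefix=555, line=444) raises ValueError instead of returning '(111) 555-444'.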
Answered By: Nuno André

If the regex is just a bunch of named groups joined by some predefined string, you can convert it into a template string like this:

from string import Template
def pattern2template(regex, join_string):
    tmpl_str = join_string.join(["$"+x for x in regex.groupindex.keys()])
    # prepend string to match your case
    tmpl_str = join_string + tmpl_str
    return Template(tmpl_str)

In your case this gives:

>>> x = pattern2template(pattern, "/something/")
>>> print(x.template)
/something/$id
>>> print(x.substitute(id="myid"))
/something/myid
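With more than one group, the same function produces a multi-field template. A sketch with a made-up route pattern; the function is restated so the block runs on its own, and the names are sorted by group number as an extra precaution in case groupindex is not ordered on older Python 3 versions:

```python
import re
from string import Template


def pattern2template(regex, join_string):
    # Order the group names by group number so the template follows the regex.
    names = sorted(regex.groupindex, key=regex.groupindex.get)
    return Template(join_string + join_string.join("$" + n for n in names))


# Hypothetical route: every group is preceded by the same "/" separator.
route = re.compile(r"/(?P<year>\d{4})/(?P<slug>[\w-]+)")
tmpl = pattern2template(route, "/")
print(tmpl.template)                               # /$year/$slug
print(tmpl.substitute(year=2015, slug="article"))  # /2015/article
```

This only works when the literal text before each group is exactly the join string.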
Answered By: JuanPi

Googling potential names for a package that does this, I found xeger:

>>> from xeger import Xeger
>>> x = Xeger(limit=10)  # default limit = 10
>>> x.xeger("/json/([0-9]+)")
u'/json/15062213'

It is specifically designed to generate random values (rather than using input values) for the capture groups found in a pattern, but there should be enough overlap in the purpose that bits of the implementation could be reused.

(As a side note, I’ve always thought that there should be a subset of the regular expression grammar that supports this templating use without caveat or hacks. Could be an interesting project.)

Answered By: Seb