re.sub replace with matched content

Question:

I'm getting to grips with regular expressions in Python and trying to wrap part of a URL in HTML so that it can be highlighted. My input is

images/:id/size

my output should be

images/<span>:id</span>/size

If I do this in Javascript

method = 'images/:id/size';
method = method.replace(/:([a-z]+)/, '<span>$1</span>')
alert(method)

I get the desired result, but if I do this in Python

>>> method = 'images/:id/huge'
>>> re.sub(':([a-z]+)', '<span>$1</span>', method)
'images/<span>$1</span>/huge'

I don't. How do I get Python to return the correct result rather than $1? Is re.sub even the right function for this?

Asked By: Smudge

||

Answers:

Use \1 instead of $1.

\number Matches the contents of the group of the same number.

http://docs.python.org/library/re.html#regular-expression-syntax
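
One difference from the JavaScript snippet in the question is worth a quick sketch: without the g flag, JavaScript's replace changes only the first match, whereas re.sub replaces every match unless count is given. The input with two placeholders below is made up for illustration and is not from the question.

```python
import re

# Hypothetical input with two placeholders (not from the question).
method = 'images/:id/:size'

# Default behaviour: every match is replaced.
all_matches = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

# count=1 limits the substitution to the first match,
# like JavaScript's replace without the g flag.
first_only = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method, count=1)

print(all_matches)  # images/<span>:id</span>/<span>:size</span>
print(first_only)   # images/<span>:id</span>/:size
```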

Answered By: user647772

Simply use \1 instead of $1:

In [1]: import re

In [2]: method = 'images/:id/huge'

In [3]: re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)
Out[3]: 'images/<span>:id</span>/huge'

Also note the use of raw strings (r'...') for regular expressions. It is not mandatory but removes the need to escape backslashes, arguably making the code slightly more readable.
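
To make the raw-string point concrete, here is a small sketch comparing three spellings of the replacement. The variable names are only for illustration.

```python
import re

method = 'images/:id/huge'

# Raw string: the backslash reaches re.sub intact, so \1 is a group reference.
raw = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

# Regular string with the backslash doubled: equivalent to the raw form.
escaped = re.sub('(:[a-z]+)', '<span>\\1</span>', method)

# Regular string, not escaped: Python reads '\1' as an octal escape and
# turns it into the character U+0001 before re.sub ever sees it.
broken = re.sub('(:[a-z]+)', '<span>\1</span>', method)

print(raw)           # images/<span>:id</span>/huge
print(escaped)       # images/<span>:id</span>/huge
print(repr(broken))  # 'images/<span>\x01</span>/huge'
```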

Answered By: NPE

For the replacement portion, Python uses \1 the way sed and vi do, not $1 the way Perl, Java, and Javascript (amongst others) do. Furthermore, because \1 interpolates in regular strings as the character U+0001, you need to use a raw string or escape it.

Python 3.2 (r32:88445, Jul 27 2011, 13:41:33) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> method = 'images/:id/huge'
>>> import re
>>> re.sub(':([a-z]+)', r'<span>\1</span>', method)
'images/<span>id</span>/huge'
>>> 
Answered By: tchrist

A backreference to the whole match value is \g<0>; see the re.sub documentation:

The backreference \g<0> substitutes in the entire substring matched by the RE.

See the Python demo:

import re
method = 'images/:id/huge'
print(re.sub(r':[a-z]+', r'<span>\g<0></span>', method))
# => images/<span>:id</span>/huge

If you need to perform a case-insensitive search, add flags=re.I:

re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)
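
As a quick check of what the flag changes, the sketch below runs the same substitution with and without it. The uppercase input is an assumed example, not from the question.

```python
import re

pattern = r':[a-z]+'
replacement = r'<span>\g<0></span>'

# Hypothetical input with an uppercase placeholder (not from the question).
method = 'images/:ID/huge'

# Without the flag, [a-z] does not match 'ID', so nothing is replaced.
unchanged = re.sub(pattern, replacement, method)

# With flags=re.I, the same pattern matches regardless of case.
highlighted = re.sub(pattern, replacement, method, flags=re.I)

print(unchanged)    # images/:ID/huge
print(highlighted)  # images/<span>:ID</span>/huge
```

Pass flags by keyword: the fourth positional parameter of re.sub is count, so a flag passed positionally would be treated as a replacement limit rather than as a flag.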
Answered By: Wiktor Stribiżew