re.sub replace with matched content
Question:
I'm getting to grips with regular expressions in Python and trying to highlight part of a URL by wrapping it in HTML. My input is
images/:id/size
my output should be
images/<span>:id</span>/size
If I do this in JavaScript
method = 'images/:id/size';
method = method.replace(/:([a-z]+)/, '<span>$1</span>')
alert(method)
I get the desired result, but if I do this in Python
>>> import re
>>> method = 'images/:id/huge'
>>> re.sub(':([a-z]+)', '<span>$1</span>', method)
'images/<span>$1</span>/huge'
I don't. How do I get Python to return the correct result rather than a literal $1? Is re.sub even the right function for this?
Answers:
Use \1 instead of $1.
\number: Matches the contents of the group of the same number.
http://docs.python.org/library/re.html#regular-expression-syntax
Simply use \1 instead of $1:
In [1]: import re
In [2]: method = 'images/:id/huge'
In [3]: re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)
Out[3]: 'images/<span>:id</span>/huge'
Also note the use of raw strings (r'...') for regular expressions. It is not mandatory, but it removes the need to escape backslashes, arguably making the code slightly more readable.
For the replacement portion, Python uses \1 the way sed and vi do, not $1 the way Perl, Java, and JavaScript (amongst others) do. Furthermore, because \1 in a regular string literal is interpreted as the character U+0001, you need to use a raw string or escape the backslash.
Python 3.2 (r32:88445, Jul 27 2011, 13:41:33)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> method = 'images/:id/huge'
>>> import re
>>> re.sub(':([a-z]+)', r'<span>\1</span>', method)
'images/<span>id</span>/huge'
>>>
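To make the raw-string point concrete, here is a small sketch comparing the three ways of writing the replacement. The variable names (raw, escaped, unescaped) are only illustrative and are not from the answers above.

```python
import re

method = 'images/:id/huge'
pattern = r'(:[a-z]+)'  # the group includes the colon

# Raw string: the backslash reaches re.sub intact, so \1 is a group reference.
raw = re.sub(pattern, r'<span>\1</span>', method)

# Regular string with the backslash escaped: same replacement text as above.
escaped = re.sub(pattern, '<span>\\1</span>', method)

# Regular string, unescaped: Python turns '\1' into the character U+0001
# before re.sub ever sees it, so no group reference is left.
unescaped = re.sub(pattern, '<span>\1</span>', method)

print(raw)              # images/<span>:id</span>/huge
print(escaped == raw)   # True
print(repr(unescaped))  # 'images/<span>\x01</span>/huge'
```

The first two forms are interchangeable; the third silently inserts a control character instead of the matched text.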
A backreference to the whole match value is \g<0>; see the re.sub documentation:
The backreference \g<0> substitutes in the entire substring matched by the RE.
See the Python demo:
import re
method = 'images/:id/huge'
print(re.sub(r':[a-z]+', r'<span>\g<0></span>', method))
# => images/<span>:id</span>/huge
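The \g<...> syntax is not limited to group 0: it also accepts other group numbers and group names. A rough sketch follows; the group name param and the variable names are made up for illustration. The numbered form is mainly useful when the reference is followed directly by a digit, where \10 would be read as group 10 rather than group 1 followed by 0.

```python
import re

method = 'images/:id/huge'

# \g<1> refers to the same group as \1.
numbered = re.sub(r':([a-z]+)', r'<span>\g<1></span>', method)

# A named group (the name "param" is hypothetical) referenced by name.
named = re.sub(r':(?P<param>[a-z]+)', r'<span>\g<param></span>', method)

# Group 1 followed by a literal "0": \g<1>0 is unambiguous.
suffixed = re.sub(r':([a-z]+)', r'\g<1>0', method)

# \10 is parsed as a reference to group 10, which this pattern does not have.
try:
    re.sub(r':([a-z]+)', r'\10', method)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

print(numbered)          # images/<span>id</span>/huge
print(named)             # images/<span>id</span>/huge
print(suffixed)          # images/id0/huge
print(ambiguous_failed)  # True
```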
If you need to perform a case-insensitive search, add flags=re.I:
re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)
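As a quick check of what the flag changes, here is a sketch using an uppercase parameter name; the input 'images/:ID/huge' is an assumed example, not from the question. The flag is passed by keyword because the fourth positional parameter of re.sub is count, not flags.

```python
import re

method = 'images/:ID/huge'  # assumed input with an uppercase parameter

# Without the flag, [a-z] does not match "ID", so nothing is replaced.
case_sensitive = re.sub(r':[a-z]+', r'<span>\g<0></span>', method)

# With flags=re.I the same pattern matches regardless of case.
case_insensitive = re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)

print(case_sensitive)    # images/:ID/huge
print(case_insensitive)  # images/<span>:ID</span>/huge
```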