Python re.sub back reference not back referencing

Question:

I have the following:

<text top="52" left="20" width="383" height="15" font="0"><b>test</b></text>

and I have the following:

fileText = re.sub("<b>(.*?)</b>", "\1", fileText, flags=re.DOTALL)

Here fileText is the string shown above. When I print fileText after running the regex replacement, I get back

<text top="52" left="20" width="383" height="15" font="0"></text>

instead of the expected

<text top="52" left="20" width="383" height="15" font="0">test</text>

I am fairly proficient with regex and I know that it should work. In fact, I know that it matches properly, because I can see the match when I do a search and print out the groups. But I am new to Python and am confused as to why it's not working with back references properly.

Asked By: csteifel

||

Answers:

You need to use a raw string here so that the backslash isn’t processed by Python as a string escape character before re.sub ever sees it:

>>> import re
>>> fileText = '<text top="52" left="20" width="383" height="15" font="0"><b>test</b></text>'
>>> fileText = re.sub("<b>(.*?)</b>", r"\1", fileText, flags=re.DOTALL)
>>> fileText
'<text top="52" left="20" width="383" height="15" font="0">test</text>'
>>>

Notice how "\1" was changed to r"\1". Though it is a very small change (one character), it has a big effect. See below:

>>> "\1"
'\x01'
>>> r"\1"
'\\1'
>>>
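
As a rough sketch of why the original output looked empty, and of a few equivalent ways to write the replacement: in a normal string, "\1" is an octal escape for the single control character U+0001, which is invisible when printed, so the bold tags appear to be replaced with nothing. The variable names below (file_text, pattern, and so on) are illustrative and not taken from the question.

```python
import re

# Illustrative names; the input string is the one from the question.
file_text = '<text top="52" left="20" width="383" height="15" font="0"><b>test</b></text>'
pattern = r"<b>(.*?)</b>"

# Without the raw prefix, "\1" is one character (U+0001), not a backreference.
assert "\1" == "\x01" and len("\1") == 1
# The raw string keeps the backslash: two characters, which re.sub reads as group 1.
assert r"\1" == "\\1" and len(r"\1") == 2

# The original call: the match is replaced by an invisible control character.
broken = re.sub(pattern, "\1", file_text, flags=re.DOTALL)

# Equivalent ways to refer to group 1 in the replacement:
raw = re.sub(pattern, r"\1", file_text, flags=re.DOTALL)        # raw string
escaped = re.sub(pattern, "\\1", file_text, flags=re.DOTALL)    # doubled backslash
explicit = re.sub(pattern, r"\g<1>", file_text, flags=re.DOTALL)  # explicit group syntax
callback = re.sub(pattern, lambda m: m.group(1), file_text, flags=re.DOTALL)  # function

print(repr(broken))  # repr() makes the hidden \x01 visible
print(raw)
```

The \g<1> form is handy when the group reference is followed by a digit, since r"\10" would be read as group 10 rather than group 1 followed by "0".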
Answered By: user2555451