strange python regex: not able to find match


I am facing a strange Python regex issue. The following two strings are supposed to be exactly the same, but somehow they do not match.

import re
print(" \\\"")
print(" "+chr(92)+chr(34)+"")
print(re.search(" \\\"", " "+chr(92)+chr(34)+""))

However, the following does match:

import re
print(re.search("\\\"", ""+chr(92)+chr(34)+""))

Any thoughts on what is going on here?

Asked By: Qiang Li



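The issue is that the backslash character has a special meaning in Python string literals, and again in regular expressions. Once Python has processed the string literal, any backslash left in the pattern is interpreted a second time by the regex engine, so matching one literal backslash takes more escaping than it first appears. You can use a Python raw string, created by prefixing the string literal with ‘r’ or ‘R’, in which a backslash (\) is treated as a literal character.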

import re
print(" \\\"")
print(" "+chr(92)+chr(34)+"")
print(re.search(r" \\\"", " "+chr(92)+chr(34)+""))

Output:
<re.Match object; span=(0, 3), match=' \\"'>

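In the second example, print(re.search("\\\"", ""+chr(92)+chr(34)+"")) outputs:
<re.Match object; span=(1, 2), match='"'>
Only the double quote is matched, because the regex engine reads \" as an escaped double quote and never looks for the backslash.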
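
To make the two layers of escaping visible, it can help to look at what the regex engine actually receives. The following is only a minimal sketch; the names text, plain and raw are illustrative and do not come from the question.

```python
import re

# Text under test: space, backslash, double quote (built as in the question).
text = " " + chr(92) + chr(34)

# Ordinary literal: Python collapses the escapes first, so the regex engine
# receives 3 characters (space, backslash, quote) and reads \" as
# "match a literal double quote".
plain = " \\\""

# Raw literal: the backslashes reach the regex engine unchanged, so it
# receives 5 characters: space, \\ (a literal backslash), \" (a literal quote).
raw = r" \\\""

print(len(plain), len(raw))
print(re.search(plain, text))   # no match: the text has no space directly before a quote
print(re.search(raw, text))     # matches the whole text
```

Comparing the two lengths shows that the ordinary literal has already lost half of its backslashes before the regex engine sees it.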

You need to escape the backslash in the pattern or use a raw string. If you use single quotes around the regexp, the double quote does not need to be escaped.

import re
s = "" + chr(92) + chr(34) + ""
print(re.search("\\\\\"", s))
print(re.search(r"\\\"", s))
print(re.search(r'\\"', s))

Output:
<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
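
If the goal is simply to find a fixed piece of text, another option (not part of the original answer) is to let re.escape build the pattern, or to skip regular expressions entirely. A minimal sketch, assuming the text to find is the same backslash-plus-quote pair; the names s, pattern and m are illustrative:

```python
import re

s = chr(92) + chr(34)   # backslash followed by a double quote

# re.escape adds the backslashes needed for the regex engine to treat
# every character of s literally.
pattern = re.escape(s)
m = re.search(pattern, " " + s)
print(m)

# For a plain literal search, the "in" operator avoids regex escaping altogether.
print(s in " " + s)
```

This avoids counting backslashes by hand, at the cost of not being able to mix in regex syntax within the escaped part.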

For further details on raw strings and backslashes in Python, see the answers to this question.

Answered By: CodeMonkey