Strange Python regex: not able to find match

Question:

I am facing a strange Python regex issue. The following two strings are supposed to be exactly the same, but somehow they do not match.

import re
print(" \\\"")
print(" "+chr(92)+chr(34)+"")
print(re.search(" \\\"", " "+chr(92)+chr(34)+""))

However, the following does match:

import re
print("\\\"")
print(""+chr(92)+chr(34)+"")
print(re.search("\\\"", ""+chr(92)+chr(34)+""))

Any thoughts on what is going on here?

Asked By: Qiang Li


Answers:

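The issue is that the backslash has a special meaning twice over: once in Python string literals and again in regular expressions. In " \\\"" Python turns \\ into a single backslash, so the regex engine receives space, backslash, double quote. The regex engine then reads \" as an escaped double quote, so the pattern only matches a space followed by a double quote. To match a literal backslash, the regex engine itself must see two backslashes. You can use a Python raw string, created by prefixing a string literal with 'r' or 'R'; a raw string treats the backslash (\) as a literal character, so r" \\\"" passes all three backslashes through to the regex engine.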
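A backslash in a pattern is interpreted twice: first by Python when it parses the string literal, then by the re module when it compiles the pattern. The following is a minimal sketch using only the standard library (the variable names text, plain and raw are just illustrative). It prints the characters each literal actually contains and shows what the regex engine does with them:

```python
import re

text = " " + chr(92) + chr(34)   # space, backslash, double quote

plain = " \\\""    # Python turns this into 3 chars: space, backslash, quote
raw = r" \\\""     # a raw string keeps all 3 backslashes: 5 chars

print(list(plain))  # [' ', '\\', '"']
print(list(raw))    # [' ', '\\', '\\', '\\', '"']

# To the regex engine, the \" in plain is just an escaped quote, so the
# pattern means "space, then quote" and cannot match the backslash in text.
print(re.search(plain, text))   # None
print(re.search(plain, ' "'))   # matches ' "' at span (0, 2)

# In raw, \\ means one literal backslash and \" means a literal quote.
print(re.search(raw, text))     # matches all three characters, span (0, 3)
```

The same pattern string therefore matches different text depending on whether the backslashes survive Python's literal parsing.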

import re
print(" \\\"")
print(" "+chr(92)+chr(34)+"")
print(re.search(r" \\\"", " "+chr(92)+chr(34)+""))

Output:

 \"
 \"
<re.Match object; span=(0, 3), match=' \\"'>

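In the second example, print(re.search("\\\"", ""+chr(92)+chr(34)+"")) outputs:
<re.Match object; span=(1, 2), match='"'>
Only the double quote is matched: the regex engine reads \" as an escaped double quote, so the pattern matches just the quote at index 1 and not the backslash before it.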

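To match the backslash itself, you need to escape it for the regex engine: either double every backslash in a normal string literal or use a raw string. If you put single quotes around the regex, the double quote does not need to be escaped.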

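import re

s = "" + chr(92) + chr(34) + ""
print(re.search("\\\\\"", s))   # normal string: every backslash doubled
print(re.search(r"\\\"", s))    # raw string in double quotes: the quote is still written \"
print(re.search(r'\\"', s))     # raw string in single quotes: the quote needs no escape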

Output:

<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
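If the goal is simply to find a fixed piece of text, another option (not part of the original answer) is to let the standard library's re.escape() add the escaping, or to skip regular expressions altogether with the in operator or str.find(). A minimal sketch, where needle is an illustrative name for the text being searched for:

```python
import re

s = chr(92) + chr(34)          # the two characters: backslash, double quote
needle = "\\\""                # the same two characters as a normal literal

pattern = re.escape(needle)    # doubles the backslash for the regex engine
print(pattern)                 # \\"
print(re.search(pattern, s))   # matches both characters, span (0, 2)

# For a plain substring test, no regex is needed at all.
print(needle in s)             # True
print(s.find(needle))          # 0
```

Note that re.escape() also escapes spaces and other regex metacharacters, so its output is meant to be used as a pattern, not printed for users.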

For further details on raw strings and backslashes in Python, see the answers to this question.

Answered By: CodeMonkey