Strange Python regex: not able to find match
Question:
I am facing a strange Python regex issue. The following two strings are supposed to be exactly the same, but somehow they do not match.
import re
print(" \"")
print(" "+chr(92)+chr(34)+"")
print(re.search(" \"", " "+chr(92)+chr(34)+""))
However, the following does match:
import re
print("\"")
print(""+chr(92)+chr(34)+"")
print(re.search("\"", ""+chr(92)+chr(34)+""))
Any thoughts on what is going on here?
Answers:
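The issue is that the backslash has a special meaning inside a Python string literal. In " \"" the backslash only escapes the double quote, so that string is a space followed by a quote (2 characters), while " "+chr(92)+chr(34) is a space, a backslash, and a quote (3 characters). You can use a Python raw string, created by prefixing a string literal with 'r' or 'R', in which the backslash (\) is treated
as a literal character. The regex engine still needs the backslash itself escaped, so the pattern contains \\ followed by \":
import re
print(" \"")
print(" "+chr(92)+chr(34)+"")
print(re.search(r" \\\"", " "+chr(92)+chr(34)+""))
Output:
 "
 \"
<re.Match object; span=(0, 3), match=' \\"'>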
In the second example, print(re.search("\"", ""+chr(92)+chr(34)+""))
outputs:
<re.Match object; span=(1, 2), match='"'>
where only the double quote (at index 1) is matched; the backslash at index 0 is not part of the match.
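One way to see the mismatch directly is to compare the lengths and code points of the two strings from the question. This is only a minimal sketch; the variable names literal and built are illustrative and do not appear in the original post.

```python
import re

literal = " \""                    # the backslash only escapes the quote
built = " " + chr(92) + chr(34)    # space, backslash, double quote

print(len(literal), [ord(c) for c in literal])   # 2 [32, 34]
print(len(built), [ord(c) for c in built])       # 3 [32, 92, 34]

# The pattern is "space then quote", but in `built` a backslash sits
# between the space and the quote, so nothing matches.
print(re.search(literal, built))                 # None
```

Printing repr() of a string, or its code points as above, is usually the quickest way to check what a literal with escapes really contains.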
To match the backslash as well, you need to escape it in the pattern or use a raw string. If you use single quotes around the regex, the double quote does not need to be escaped.
s = "" + chr(92) + chr(34) + ""
print(re.search("\\\\\"", s))
print(re.search(r"\\\"", s))
print(re.search(r'\\"', s))
Output:
<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
For further details on raw string and backslash in Python, see answers for this question.
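If the text to search for is already available as a string, hand-escaping can be avoided by letting re.escape build the pattern. This is a sketch of an alternative that the answers above do not mention; the names s, pattern, and m are illustrative.

```python
import re

s = chr(92) + chr(34)        # backslash followed by a double quote

# re.escape backslash-escapes any regex metacharacters in s,
# so the resulting pattern matches s literally.
pattern = re.escape(s)
m = re.search(pattern, s)

print(m.span())              # (0, 2)
print(m.group() == s)        # True
```

This suits patterns that come from data or user input; for a fixed pattern written in source code, a raw string is usually easier to read.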