What's the difference between r'string' and normal 'string' in Python?
Question:
What’s the difference between an r string (r'foobar') and a normal string ('foobar') in Python? Is r'string' a regex string?
I’ve tried the following and there isn’t any effect on my regex matches:
>>> import re
>>> n = 3
>>> rgx = '(?=('+'\S'*n+'))'
>>> x = 'foobar'
>>> re.findall(rgx,x)
['foo', 'oob', 'oba', 'bar']
>>>
>>> rgx2 = r'(?=('+'\S'*n+'))'
>>> re.findall(rgx2,x)
['foo', 'oob', 'oba', 'bar']
>>>
>>> rgx3 = r'(?=(\S\S\S))'
>>> re.findall(rgx3,x)
['foo', 'oob', 'oba', 'bar']
Answers:
r doesn’t signify a "regex string"; it means "raw string". As per the docs:
String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.
They are commonly used (and recommended) for regular expressions because both regex syntax and non-raw string literals use the backslash as an escape character. For example, the pattern that matches a literal backslash has to be written '\\\\' as a normal string; as a raw string, it’s just r'\\'.
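To make the double escaping concrete, here is a minimal sketch. The variable names and the sample path are made up for illustration and are not part of the original answer:

```python
import re

# In a normal literal, '\n' is a single newline character;
# in a raw literal, the backslash and the 'n' are both kept.
normal = '\n'
raw = r'\n'
assert len(normal) == 1
assert len(raw) == 2

# Two spellings of the same two-character regex pattern (backslash, backslash),
# which the regex engine reads as "match one literal backslash".
pattern_normal = '\\\\'
pattern_raw = r'\\'
assert pattern_normal == pattern_raw

sample_path = 'C:\\temp'  # hypothetical input containing one backslash
matches = re.findall(pattern_raw, sample_path)
print(matches)
```

Both spellings reach the regex engine as the same pattern; the raw form only saves one layer of escaping in the source code.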
The difference would become apparent in cases when you have backslash escapes:
>>> s="foobar"
>>> import re
>>> re.sub('(o)\1', '', s) # No effect: in a normal string '\1' is the octal escape for chr(1), not a backreference
'foobar'
>>> re.sub(r'(o)\1', '', s) # Using the backreference works!
'fbar'
>>> re.sub('(o)\\1', '', s) # Without the r prefix you need to escape the backslash
'fbar'
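The following sketch checks what each spelling of the backreference actually contains, and suggests why the patterns in the question behaved identically. The variable names are illustrative only. As far as the Python docs describe it, a sequence like \S is not a recognized string escape, so a normal literal keeps the backslash (recent versions warn about this); the sketch writes it in raw form to avoid that warning.

```python
import re

s = "foobar"

octal = '\1'      # recognized octal escape: one character, chr(1)
raw = r'\1'       # two characters: a backslash and '1'
escaped = '\\1'   # the same two characters, written with an escaped backslash

assert octal == '\x01'
assert raw == escaped
assert len(raw) == 2

# chr(1) never occurs in s, so nothing is replaced.
no_backref = re.sub('(o)' + octal, '', s)
# The regex engine sees \1 and treats it as a backreference to group 1.
with_backref = re.sub('(o)' + raw, '', s)

# The question's pattern: a lookahead capturing three non-whitespace characters.
rgx = r'(?=(' + r'\S' * 3 + '))'
triples = re.findall(rgx, s)
print(no_backref, with_backref, triples)
```

So the r prefix only changes the result when the literal contains a backslash sequence that Python itself would otherwise interpret, such as \1, \n or \b.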
Quoting from String literal:
A few languages provide a method of specifying that a literal is to be
processed without any language-specific interpretation. This avoids
the need for escaping, and yields more legible strings.
You might also want to refer to Lexical Analysis.