What's the difference between r'string' and normal 'string' in Python?

Question:

What's the difference between an r string (r'foobar') and a normal string ('foobar') in Python? Is r'string' a regex string?

I've tried the following, and the prefix doesn't have any effect on my regex matches:

>>> import re
>>> n = 3
>>> rgx = '(?=('+'\S'*n+'))'
>>> x = 'foobar'
>>> re.findall(rgx,x)
['foo', 'oob', 'oba', 'bar']
>>>
>>> rgx2 = r'(?=('+'\S'*n+'))'
>>> re.findall(rgx2,x)
['foo', 'oob', 'oba', 'bar']
>>>
>>> rgx3 = r'(?=(\S\S\S))'
>>> re.findall(rgx3,x)
['foo', 'oob', 'oba', 'bar']
Asked By: alvas

Answers:

r doesn’t signify a "regex string"; it means "raw string". As per the docs:

String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.

They are commonly used (and recommended) for regular expressions because regex and non-raw strings both use the backslash as an escape character. For example, a pattern that matches a literal backslash has to be written '\\\\' as a normal string; as a raw string, it's just r'\\'.
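As a rough illustration of the point above, here is a minimal sketch that compares the two spellings. The sample text and variable names are made up for the example; only `re.findall` comes from the standard library.

```python
import re

# A normal literal processes escapes; a raw literal keeps backslashes as typed.
normal = '\n'   # one character: a newline
raw = r'\n'     # two characters: a backslash followed by 'n'
lengths = (len(normal), len(raw))

# To match ONE literal backslash, the regex engine must receive TWO backslashes.
text = 'C:\\temp'        # hypothetical sample text: C:\temp (one backslash)
normal_pattern = '\\\\'  # four typed; Python turns them into two
raw_pattern = r'\\'      # two typed; Python passes them through unchanged
same_pattern = normal_pattern == raw_pattern
matches = re.findall(raw_pattern, text)

print(lengths)        # (1, 2)
print(same_pattern)   # True
print(len(matches))   # 1
```

Both patterns are the same two-character string by the time `re` sees them; the raw form is just easier to read and harder to get wrong.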

Answered By: jonrsharpe

The difference would become apparent in cases when you have backslash escapes:

>>> s="foobar"
>>> import re
>>> re.sub('(o)\1', '', s)     # No effect: in a normal string, '\1' is an escape for the character with code 1, not a backreference
'foobar'
>>> re.sub(r'(o)\1', '', s)    # Using the backreference works!
'fbar'
>>> re.sub('(o)\\1', '', s)    # In a normal string you need to escape the backslash
'fbar'
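To see why the first call above leaves the string unchanged, it may help to inspect the pattern strings themselves. The following is a small sketch with illustrative variable names, not part of the original answer:

```python
import re

s = 'foobar'

# In a normal literal, '\1' is an octal escape: one character with code point 1.
octal = '(o)\1'
# In a raw literal, the backslash and the digit stay as two separate characters.
raw = r'(o)\1'

print(len(octal), len(raw))   # 4 5
print(octal[-1] == '\x01')    # True
print('(o)\\1' == raw)        # True: escaping the backslash gives the same pattern

# Only the raw (or double-backslash) form reaches re as a backreference.
print(re.sub(raw, '', s))     # fbar
print(re.sub(octal, '', s))   # foobar
```

So the regex engine never sees a backreference in the first case; it looks for an "o" followed by a control character, which "foobar" does not contain.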

Quoting from String literal:

A few languages provide a method of specifying that a literal is to be
processed without any language-specific interpretation. This avoids
the need for escaping, and yields more legible strings.

You might also want to refer to Lexical Analysis.

Answered By: devnull