What does the "r" in Python's re.compile(r' pattern flags') mean?

Question:

I am reading through http://docs.python.org/2/library/re.html. According to this, the "r" in Python's re.compile(r' pattern flags') refers to the raw string notation:

The solution is to use Python's raw string notation for regular
expression patterns; backslashes are not handled in any special way in
a string literal prefixed with 'r'. So r"\n" is a two-character string
containing '\' and 'n', while "\n" is a one-character string
containing a newline. Usually patterns will be expressed in Python
code using this raw string notation.

Would it be fair to say then that:

re.compile(r pattern) means that "pattern" is a regex, while re.compile(pattern) means that "pattern" is an exact match?

Asked By: user1592380


Answers:

As @PauloBu stated, the r string prefix is not specifically related to regexes, but to strings in Python generally.

Normal strings use the backslash character as an escape character for special characters (like newlines):

>>> print('this is \n a test')
this is 
 a test

The r prefix tells the interpreter not to do this:

>>> print(r'this is \n a test')
this is \n a test
>>> 

This is important in regular expressions, because the backslash must reach the re module intact. In particular, \b matches the empty string at the start or end of a word. re expects the two-character string \b, but in a normal string literal '\b' is converted to the ASCII backspace character, so you need to either escape the backslash explicitly ('\\b') or tell Python it is a raw string (r'\b').

>>> import re
>>> re.findall('\b', 'test') # the backslash is consumed by the Python string parser, so re sees a backspace character
[]
>>> re.findall('\\b', 'test') # the backslash is explicitly escaped and is passed through to the re module
['', '']
>>> re.findall(r'\b', 'test') # often this syntax is easier
['', '']
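
As a rough sketch of why the first call above finds nothing, the three spellings can be compared directly. This assumes Python 3, and the variable names are illustrative only:

```python
import re

plain = '\b'      # one character: ASCII backspace (0x08)
escaped = '\\b'   # two characters: a backslash followed by 'b'
raw = r'\b'       # the same two characters, written as a raw string

print(len(plain), len(escaped), len(raw))
print(escaped == raw)

# Only the two-character form reaches re as the word-boundary token.
boundaries = re.findall(raw, 'test')
print(boundaries)
```

The escaped and raw literals produce the same string object value; the r prefix only changes how the source text is read, not what re receives.
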
Answered By: Peter Gibson

No. As the documentation you pasted explains, the r prefix on a string indicates that the string is a raw string.

Because Python string escaping and regex escaping both use the backslash character, the two collide. Raw strings provide a way to tell Python that you want an unescaped string.

Examine the following:

>>> "\n"
'\n'
>>> r"\n"
'\\n'
>>> print("\n")


>>> print(r"\n")
\n

Prefixing a string literal with r merely tells Python that backslashes should be treated literally and not as escape characters.

This is helpful when, for example, you are searching on a word boundary. The regex for this is \b; however, to capture this in a normal Python string, I'd need to use "\\b" as the pattern. Instead, I can use the raw string r"\b" as the pattern.

This becomes especially handy when trying to find a literal backslash with a regex. To match a backslash in a regex I need to use the pattern \\. To write this in a normal Python string I need to escape each backslash, so the pattern becomes "\\\\", or the much simpler r"\\".
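
A minimal sketch of the literal-backslash case, assuming Python 3; the sample path and variable names are made up for illustration:

```python
import re

# The actual text is C:\temp\file.txt (two backslashes in total).
path = 'C:\\temp\\file.txt'

escaped_pattern = '\\\\'   # four backslashes in source, two in the string
raw_pattern = r'\\'        # two backslashes in source, two in the string

# Both literals build the same two-character regex: an escaped backslash.
print(escaped_pattern == raw_pattern)

matches = re.findall(raw_pattern, path)
print(len(matches))
```

Note that a raw string literal cannot end in a single backslash, so r"\" on its own is a syntax error; the regex for one literal backslash is written r"\\".
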

As you can guess, in longer and more complex regexes the extra backslashes can get confusing, so raw strings are generally considered the way to go.

Answered By: user764357

No. Not everything in regex syntax needs to be preceded by \, so ., *, +, etc. still have special meaning in a pattern.

The r'' is often used as a convenience for regexes that do need a lot of \, as it prevents the clutter of doubling up the \.
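
To illustrate, here is a small sketch (assuming Python 3, with made-up sample text) showing that the r prefix does not switch between "regex" and "exact match". The use of re.escape at the end is an addition not mentioned in the answer above; it is the standard-library helper for building a literal pattern:

```python
import re

text = 'abc a.c axc'

# With or without the r prefix, '.' is still a regex metacharacter.
plain_result = re.findall('a.c', text)
raw_result = re.findall(r'a.c', text)
print(plain_result, raw_result)

# For an exact (literal) match, escape the metacharacter by hand...
literal = re.findall(r'a\.c', text)
# ...or let re.escape add the backslashes.
literal_auto = re.findall(re.escape('a.c'), text)
print(literal, literal_auto)
```

Both of the first two calls treat the pattern as a regex and match all three words; only the escaped patterns match the literal text a.c.
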

Answered By: John La Rooy