Why do 3 backslashes equal 4 in a Python string?

Question:

Could you tell me why '?\\\?'=='?\\\\?' gives True? That drives me crazy and I can’t find a reasonable answer…

>>> list('?\\\?')
['?', '\\', '\\', '?']
>>> list('?\\\\?')
['?', '\\', '\\', '?']
Asked By: kozooh


Answers:

Basically, because Python is slightly lenient in backslash processing. Quoting from https://docs.python.org/2.0/ref/strings.html:

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string.

(Emphasis in the original)

Therefore, in Python, it isn’t that three backslashes are equal to four. It’s that when you follow a backslash with a character like ?, the two together come through as two characters, because \? is not a recognized escape sequence.

Answered By: Daniel Martin

Because \x in a character string, where x is any character that is not one of the special backslashable characters like n, r, t, 0, etc., evaluates to a string with a backslash and then an x.

>>> '\?'
'\\?'
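The rule can be probed one character at a time. The sketch below (the helper name is_recognized_escape is purely illustrative) builds the source text of a literal made of a backslash plus one character, evaluates it with ast.literal_eval, and checks whether the backslash survived. Python 3.6+ warns about unrecognized escapes (a SyntaxWarning since 3.12), so the warning is silenced. Characters that start multi-character escapes (x, u, U, N) are left out, because a bare \x is a syntax error rather than a pass-through.

```python
import ast
import warnings

def is_recognized_escape(ch: str) -> bool:
    """True if a backslash followed by ch is a recognized escape in a str literal."""
    source = "'\\" + ch + "'"  # source text of the literal: quote, backslash, ch, quote
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence invalid-escape warnings such as \?
        value = ast.literal_eval(source)
    # An unrecognized escape keeps both characters: the backslash and ch.
    return value != "\\" + ch

for ch in "ntr0\\'?qz":
    print(repr(ch), is_recognized_escape(ch))
```

This should report True for n, t, r, 0, the backslash and the single quote, and False for ?, q and z.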
Answered By: the paul

From the Python lexical analysis page under string literals at:
https://docs.python.org/2/reference/lexical_analysis.html

There is a table that lists all the recognized escape sequences.

\\ is an escape sequence that is === \

\? is not an escape sequence and is === \?

so '\\\\' is '\\' followed by '\\', which gives \\ (two escaped \)

and '\\\?' is '\\' followed by '\?', which gives \\? (one escaped and one raw \)

Also, it should be noted that Python does not distinguish between single and double quotes surrounding a string literal, unlike some other languages.

So 'String' and "String" are exactly the same thing in Python; the quote style does not affect the interpretation of escape sequences.
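A small check of that claim: the same escapes written inside single or double quotes produce identical plain str objects. The only practical difference is which quote character has to be escaped inside the literal.

```python
# The quote character only delimits the literal; escape processing is identical.
single = '?\\\\?'   # ?, backslash, backslash, ?
double = "?\\\\?"
print(single == double)              # True
print(type(single) is type(double))  # True: both are plain str

# The quote choice only decides which quote needs escaping inside.
print('It\'s' == "It's")             # True
```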

Answered By: rkh

This is because the backslash acts as an escape character for the character(s) immediately following it, if the combination represents a valid escape sequence. The dozen or so escape sequences are listed in the String Literals section of the Python documentation. They include the obvious ones such as newline \n, horizontal tab \t and carriage return \r, and more obscure ones such as named Unicode characters using \N{...}, e.g. \N{WAVY DASH}, which represents the Unicode character \u3030. The key point, though, is that if the escape sequence is not known, the character sequence is left in the string as is.
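As a quick illustration of one of the more obscure escapes mentioned above, the named escape and the code-point escape yield the same one-character string. The unicodedata module is used only to look the name back up.

```python
import unicodedata

wavy = "\N{WAVY DASH}"            # named escape, resolved when the literal is compiled
print(wavy == "\u3030")           # True
print(len(wavy))                  # 1
print(unicodedata.name(wavy))     # WAVY DASH
```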

Part of the problem might also be that the Python interpreter output is misleading you. This is because the backslashes are escaped when displayed. However, if you print those strings, you will see the extra backslashes disappear.

>>> '?\\\?'
'?\\\\?'
>>> print('?\\\?')
?\\?
>>> '?\\\?' == '?\\?'    # I don't know why you think this is True???
False
>>> '?\\\?' == r'?\\?'   # but if you use a raw string for '?\\?'
True
>>> '?\\\\?' == '?\\\?'  # this is the same string... see below
True
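To separate what the prompt displays from what is actually stored, one can call repr() and print() explicitly and count the characters. This sketch uses only a fully escaped literal, so no invalid-escape warning is involved.

```python
s = '?\\\\?'                   # the same value the interpreter echoed above
print(repr(s))                 # '?\\\\?'  (the escaped form the prompt displays)
print(s)                       # ?\\?      (the characters actually stored)
print(len(s), s.count('\\'))   # 4 2
```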

For your specific examples, in the first case '?\\\?', the first \ escapes the second backslash, leaving a single backslash, but the third backslash remains as a backslash because \? is not a valid escape sequence. Hence the resulting string is ?\\?.

For the second case '?\\\\?', the first backslash escapes the second, and the third backslash escapes the fourth, which results in the string ?\\?.

So that’s why three backslashes are the same as four:

>>> '?\\\?' == '?\\\\?'
True
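The two cases can be reproduced without typing the questionable literal directly. This sketch (value_of is a hypothetical helper) assembles the source text of '?\\\?' and '?\\\\?' from a single-backslash constant and evaluates each with ast.literal_eval, silencing the invalid-escape warning that \? triggers on newer Python versions.

```python
import ast
import warnings

BS = "\\"  # one backslash character

def value_of(literal_source: str) -> str:
    """Evaluate the source text of a str literal (hypothetical helper)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # \? triggers an invalid-escape warning
        return ast.literal_eval(literal_source)

three = value_of("'?" + BS * 3 + "?'")  # the literal '?\\\?'
four = value_of("'?" + BS * 4 + "?'")   # the literal '?\\\\?'
print(list(three))    # ['?', '\\', '\\', '?']
print(list(four))     # ['?', '\\', '\\', '?']
print(three == four)  # True
```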

If you want to create a string with 3 backslashes you can escape each backslash:

>>> '?\\\\\\?'
'?\\\\\\?'
>>> print('?\\\\\\?')
?\\\?

or you might find “raw” strings more understandable:

>>> r'?\\\?'
'?\\\\\\?'
>>> print(r'?\\\?')
?\\\?

This turns off escape sequence processing for the string literal. See String Literals in the Python documentation for more details.
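The two ways of writing three backslashes can be checked against each other, and against a string built by repeating a single escaped backslash. All three should be the same five-character string.

```python
escaped = '?\\\\\\?'           # every backslash escaped
raw = r'?\\\?'                 # raw string: backslashes taken literally
built = '?' + '\\' * 3 + '?'   # built from a single-backslash literal
print(escaped == raw == built, len(raw))  # True 5
```

One caveat with raw strings: a raw literal still cannot end in an odd number of backslashes, so r'\' is a syntax error.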

Answered By: mhawke

mhawke’s answer pretty much covers it, I just want to restate it in a more concise form and with minimal examples that illustrate this behaviour.

I guess one thing to add is that escape processing moves from left to right, so that \n first finds the backslash and then looks for a character to escape, then finds n and escapes it; \\n finds the first backslash, finds the second and escapes it, then finds n and sees it as a literal n; \? finds the backslash and looks for a char to escape, finds ? which cannot be escaped, and so treats \ as a literal backslash.
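The left-to-right scan can be modelled in a few lines. This is a deliberately simplified, hypothetical decoder, not CPython's implementation: it knows only a handful of single-character escapes (no octal, \x, \u or \N) and keeps the backslash whenever the next character is not one of them.

```python
# Hypothetical, simplified model of escape processing (subset of escapes only).
SIMPLE_ESCAPES = {"\\": "\\", "n": "\n", "t": "\t", "r": "\r", "'": "'", '"': '"'}

def decode_simple_escapes(body: str) -> str:
    """Decode the text between the quotes of a (simplified) str literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])  # recognized: consume both characters
                i += 2
                continue
            out.append("\\")  # unrecognized: keep the backslash as-is
            i += 1            # and read the next character normally
            continue
        out.append(ch)
        i += 1
    return "".join(out)

BS = "\\"
print(decode_simple_escapes(BS + "n") == "\n")    # True: \n becomes a newline
print(decode_simple_escapes(BS * 2 + "n"))        # \n  (backslash, then literal n)
print(decode_simple_escapes("?" + BS * 3 + "?") ==
      decode_simple_escapes("?" + BS * 4 + "?"))  # True
```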

As mhawke noted, the key here is that the interactive interpreter escapes the backslash when displaying a string. I’m guessing the reason for that is to ensure that text strings copied from the interpreter into a code editor are valid Python strings. However, in this case this convenience causes confusion.

>>> print('\?') # \? is not a valid escape code so the backslash is left as-is
\?
>>> print('\\?') # \\ is a valid escape code, resulting in a single backslash
\?

>>> '\?' # same as first example except that the interactive interpreter escapes the backslash
'\\?'
>>> '\\?' # same as second example, backslash is again escaped
'\\?'
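The guess about copy-and-paste can at least be checked for these strings: the escaped text the prompt shows (the repr) is itself a valid literal that evaluates back to the original value. This sketch uses ast.literal_eval to evaluate the displayed text safely.

```python
import ast

for s in ["\\?", "?\\\\?", "?\\\\\\?"]:
    shown = repr(s)                      # what the interactive prompt would echo
    assert ast.literal_eval(shown) == s  # the echoed text round-trips to s
    print(shown, "->", s)
```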
Answered By: Rainy