Create and parse a Python Raw string literal R""

Question:

Edit
I’m not sure if this question is being read correctly.
I already know what string formats are in Python.
Every single little detail, I already know.
Please stop directing me to questions about string types in Python.

This is a specific question that has to do with the problem string delimiter
in the body of a raw syntax construction.

I want to know why I can’t use the raw syntax r"" or r'' form on this
raw string "word's" and have it exist in a variable just like this.

It doesn’t matter why I want to do this, but I’ve explained below.

Thanks.


I’m just going over some syntax rules to parse and create
strings using the Raw String Syntax rules for r' ' and r" ".

For the record, I have read the docs and rules on raw strings.
The question is specific to escaping the delimiter within the raw string.

I have a utility that parses/makes other string types and is used
in production code.

I’m perplexed that Python does not remove the escape of the escaped delimiter when the string is in a variable.

Is this by design, i.e. NOT removing the escape on the delimiter, or, as I am
hoping, just a missed part of the parse process?
Basically, a bug?

The string is not really a raw image of the original if after parsing, it does
not look like the original.
After parsing, in a variable, it now becomes useless.

Is this an oversight and possibly something that will be corrected in the future?

As it is now, in my utility, I can only create a raw syntax form, but due to
this bug, I cannot parse it unless I take off the escape from the delimiter.

I mean, I guess I could do this as it is a direct inverse of making the string,
but it’s disturbing that the lexical parser leaves this artificial escape in the variable after
the parsing process.

Here is some code I used to verify the problem:

Code

#python 2.7.12

print "Raw targt string test = \"word's\""

v1 = r' "word\'s" '     # => "word\'s"
v2 = r" \"word's\" "    # => \"word's\"

print "using r' ' syntax, variable contains  " + v1
print "using r\" \" syntax, variable contains  " + v2

if len(v1) == len(v2) :
   print "length's are equal" 
else :
   print "length's are NOT equal" 

Output

Raw targt string test = "word's"
using r' ' syntax, variable contains   "word\'s" 
using r" " syntax, variable contains   \"word's\" 
length's are NOT equal


Asked By: user557597


Answers:

To quote the Python FAQ, raw string literals in Python were “designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing”. Since the regex engine will strip the backslash in front of the quote character, Python doesn’t need to strip it. This behavior will most likely never be changed since it would severely break backwards compatibility.

So yes, it is by design — although it is quite confusing.

I want to know why I can’t use the raw syntax r"" or r'' form on this
raw string "word's" and have it exist in a variable just like this.

Python’s raw string literals were not designed to be able to represent every possible string. In particular, the string "' cannot be represented within r"" or r''. When you use raw string literals for regex patterns, this is not a problem, since the patterns "', \"', "\', and \"\' are equivalent (that is, they all match the single string "').
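To make the regex point concrete, here is a minimal sketch, assuming Python 3.4 or later (for re.fullmatch; the question itself uses 2.7). The names target and patterns are only for illustration. The patterns are written as ordinary literals so that each backslash is explicit.

```python
import re

# The two-character string: a double quote followed by a single quote.
target = "\"'"

# Four spellings of the pattern, as the regex engine sees them:
#   "'     \"'     "\'     \"\'
patterns = ["\"'", "\\\"'", "\"\\'", "\\\"\\'"]

# The regex engine treats an escaped quote the same as a plain quote.
results = [re.fullmatch(p, target) is not None for p in patterns]
print(results)  # [True, True, True, True]
```

The extra backslashes only matter to Python's own string handling. The regex engine discards them, which is why raw literals can leave them in place.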

However, note that you can write the string "word's" using the triple-quoted raw string literal r'''"word's"'''.
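A short check of the triple-quoted form, written for Python 3 (the variable names are just for illustration):

```python
plain = "\"word's\""          # ordinary literal, escapes are processed
raw_triple = r'''"word's"'''  # triple-quoted raw literal, nothing to escape
raw_single = r'"word\'s"'     # single-quoted raw literal, backslash is kept

print(raw_triple == plain)                # True
print(len(raw_triple), len(raw_single))   # 8 9
```

This only works because the text does not end in a single quote and does not contain three single quotes in a row. A string that does would need the r"""...""" form or an ordinary literal.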

Answered By: Mathias Rav

That’s not a bug, that’s intended behavior. When using r you’re telling the interpreter to interpret your string, well, raw – that means turn off all escape sequences and treat the backslash as an ordinary char:

Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters. As a result, in string literals, '\U' and '\u' escapes in raw strings are not treated specially.

Since the backslash is treated as a literal character, when you do r' "word\'s" ' it’s equivalent to writing ' "word\\\'s" ', and since your double-quoted string has a different escape sequence, r" \"word's\" " is equivalent to ' \\"word\'s\\" '. Hence they don’t match (one more backslash, and in different locations).
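The equivalence can be checked directly. This sketch reuses v1 and v2 from the question and is written for Python 3:

```python
v1 = r' "word\'s" '
v2 = r" \"word's\" "

# Each raw literal equals an ordinary literal with the backslashes doubled.
print(v1 == ' "word\\\'s" ')            # True
print(v2 == ' \\"word\'s\\" ')          # True

# v2 carries one more backslash than v1, hence the differing lengths.
print(len(v1), len(v2))                 # 11 12
print(v1.count("\\"), v2.count("\\"))   # 1 2
```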

Unfortunately, since strings must be single or double quoted you must escape single quotes in a single-quoted string and double quotes in a double quoted string to avoid syntax error, but the r instruction tells the interpreter to treat all escapes literally. Besides, r was never intended for string operation anyway.
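A small sketch of both halves of that statement, for Python 3. Here parse_literal is a hypothetical helper, not part of any library. It hands the source text of one literal to ast.literal_eval and reports a syntax error as None.

```python
import ast

def parse_literal(source):
    """Hypothetical helper: evaluate one string literal, or None if it does not parse."""
    try:
        return ast.literal_eval(source)
    except SyntaxError:
        return None

unescaped = "r'word's'"    # source text: r'word's'
escaped = "r'word\\'s'"    # source text: r'word\'s'

print(parse_literal(unescaped))  # None: the bare quote ends the literal early
print(parse_literal(escaped))    # word\'s  (the backslash stays in the value)
```

So the backslash is needed for the literal to parse at all, and the raw prefix then keeps it in the result.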

Answered By: zwer