What does the 'b' character do in front of a string literal?

Question:

Apparently, the following is the valid syntax:

b'The string'

I would like to know:

  1. What does this b character in front of the string mean?
  2. What are the effects of using it?
  3. What are appropriate situations to use it?

I found a related question right here on SO, but that question is about PHP though, and it states the b is used to indicate the string is binary, as opposed to Unicode, which was needed for code to be compatible from version of PHP < 6, when migrating to PHP 6. I don’t think this applies to Python.

I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn’t mention the b character anywhere in that document.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

Asked By: Jesse Webb

||

Answers:

To quote the Python 2.x documentation:

A prefix of ‘b’ or ‘B’ is ignored in
Python 2; it indicates that the
literal should become a bytes literal
in Python 3 (e.g. when code is
automatically converted with 2to3). A
‘u’ or ‘b’ prefix may be followed by
an ‘r’ prefix.

The Python 3 documentation states:

Bytes literals are always prefixed with ‘b’ or ‘B’; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

Answered By: NPE

It turns it into a bytes literal (or str in 2.x), and is valid for 2.6+.

The r prefix causes backslashes to be “uninterpreted” (not ignored, and the difference does matter).

The b denotes a byte string.

Bytes are the actual data. Strings are an abstraction.

If you had multi-character string object and you took a single character, it would be a string, and it might be more than 1 byte in size depending on encoding.

If took 1 byte with a byte string, you’d get a single 8-bit value from 0-255 and it might not represent a complete character if those characters due to encoding were > 1 byte.

TBH I’d use strings unless I had some specific low level reason to use bytes.

Answered By: user774340

Python 3.x makes a clear distinction between the types:

If you’re familiar with:

  • Java or C#, think of str as String and bytes as byte[];
  • SQL, think of str as NVARCHAR and bytes as BINARY or BLOB;
  • Windows registry, think of str as REG_SZ and bytes as REG_BINARY.

If you’re familiar with C(++), then forget everything you’ve learned about char and strings, because a character is not a byte. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

NaN = struct.unpack('>d', b'xffxf8x00x00x00x00x00x00')[0]

You can encode a str to a bytes object.

>>> 'uFEFF'.encode('UTF-8')
b'xefxbbxbf'

And you can decode a bytes into a str.

>>> b'xE2x82xAC'.decode('UTF-8')
'€'

But you can’t freely mix the two types.

>>> b'xEFxBBxBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str

The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.

>>> b'A' == b'x41'
True

But I must emphasize, a character is not a byte.

>>> 'A' == b'A'
False

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

  • unicode = u'...' literals = sequence of Unicode characters = 3.x str
  • str = '...' literals = sequences of confounded bytes/characters
    • Usually text, encoded in some unspecified encoding.
    • But also used to represent binary data like struct.pack output.

In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.

Also, just out of curiosity, are there
more symbols than the b and u that do
other things?

The r prefix creates a raw string (e.g., r't' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.

Answered By: dan04

Here’s an example where the absence of b would throw a TypeError exception in Python 3.x

>>> f=open("new", "wb")
>>> f.write("Hello Python!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

Adding a b prefix would fix the problem.

Answered By: user3053230

In addition to what others have said, note that a single character in unicode can consist of multiple bytes.

The way unicode works is that it took the old ASCII format (7-bit code that looks like 0xxx xxxx) and added multi-bytes sequences where all bytes start with 1 (1xxx xxxx) to represent characters beyond ASCII so that Unicode would be backwards-compatible with ASCII.

>>> len('Öl')  # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8')  # convert str to bytes 
b'xc3x96l'
>>> len('Öl'.encode('UTF-8'))  # 3 bytes encode 2 characters !
3
Answered By: xjcl

From server side, if we send any response, it will be sent in the form of byte type, so it will appear in the client as b'Response from server'

In order get rid of b'....' simply use below code:

Server file:

stri="Response from server"    
c.send(stri.encode())

Client file:

print(s.recv(1024).decode())

then it will print Response from server

Answered By: Nani Chintha

You can use JSON to convert it to dictionary

import json
data = b'{"key":"value"}'
print(json.loads(data))

{“key”:”value”}


FLASK:

This is an example from flask. Run this on terminal line:

import requests
requests.post(url='http://localhost(example)/',json={'key':'value'})

In flask/routes.py

@app.route('/', methods=['POST'])
def api_script_add():
    print(request.data) # --> b'{"hi":"Hello"}'
    print(json.loads(request.data))
return json.loads(request.data)

{‘key’:’value’}

Answered By: Karam Qusai

The answer to the question is that, it does:

data.encode()

and in order to decode it(remove the b, because sometimes you don’t need it)

use:

data.decode()
Answered By: Billy

b"hello" is not a string (even though it looks like one), but a byte sequence. It is a sequence of 5 numbers, which, if you mapped them to a character table, would look like h e l l o. However the value itself is not a string, Python just has a convenient syntax for defining byte sequences using text characters rather than the numbers itself. This saves you some typing, and also often byte sequences are meant to be interpreted as characters. However, this is not always the case – for example, reading a JPG file will produce a sequence of nonsense letters inside b"..." because JPGs have a non-text structure.

.encode() and .decode() convert between strings and bytes.

Answered By: Haterind

bytes(somestring.encode()) is the solution that worked for me in python 3.

def compare_types():
    output = b'sometext'
    print(output)
    print(type(output))


    somestring = 'sometext'
    encoded_string = somestring.encode()
    output = bytes(encoded_string)
    print(output)
    print(type(output))


compare_types()
Answered By: Severin Spörri
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.