Why do I get an int when I index bytes?

Question:

I'm trying to get the first character of a byte string in Python 3.4, but when I index it, I get an int:

>>> my_bytes = b'just a byte string'
>>> my_bytes
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>

This seems unintuitive to me, as I was expecting to get b'j'.

I have discovered that I can get the value I expect, but it feels like a hack to me.

>>> my_bytes[0:1]
b'j'

Can someone please explain why this happens?

Asked By: meshy


Answers:

The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
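A quick way to check this claim (a small illustrative sketch; the variable names are arbitrary): converting a bytes object to a list exposes the integers it stores, and the bytes constructor rejects any value outside 0 to 255.

```python
data = b'just'

# Converting to a list exposes the underlying integer values.
values = list(data)
print(values)  # [106, 117, 115, 116]

# Building bytes from integers works for values in range(256)...
print(bytes(values))  # b'just'

# ...but a value outside 0..255 raises ValueError.
rejected = False
try:
    bytes([256])
except ValueError:
    rejected = True
print(rejected)  # True
```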

From the documentation:

Bytes objects are immutable sequences of single bytes.

[…]

While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]

[…]

Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).

Note that indexing a string is a bit of an exception among the sequence types: 'abc'[0] gives you a str object of length one. str is the only sequence type whose elements are always of its own type.
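The difference can be seen side by side in a short session. This is an illustrative sketch for Python 3 (the variable names are arbitrary), contrasting indexing and slicing on bytes, the other binary sequence types, and str:

```python
my_bytes = b'just a byte string'

# Indexing bytes returns one element: an int in range(256).
first = my_bytes[0]
print(first, type(first))             # 106 <class 'int'>

# Slicing returns a new bytes object, here of length 1.
first_slice = my_bytes[0:1]
print(first_slice)                    # b'j'

# An int can be turned back into a length-1 bytes object.
print(bytes([first]))                 # b'j'

# The other binary sequence types also yield ints when indexed.
print(type(bytearray(my_bytes)[0]))   # <class 'int'>
print(type(memoryview(my_bytes)[0]))  # <class 'int'>

# str is the exception: each element is itself a str of length 1,
# so indexing and slicing both produce a str, and indexing can repeat.
text = 'just a text string'
print(text[0], text[0:1], text[0][0][0])  # j j j
```

So my_bytes[0:1] is not a hack; it is the documented way to get a length-1 bytes object. One practical difference from indexing: a slice past the end returns b'' instead of raising IndexError.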

This echoes how other languages treat string data: in C, the unsigned char type is also effectively an integer in the range 0-255 (on platforms with 8-bit bytes). Whether an unqualified char is signed or unsigned is implementation-defined, and on some platforms (ARM, for example) it is unsigned. Text is modelled as a char[] array.
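The same relationship between characters and small integers is visible in Python through ord(), chr() and encoding. A minimal sketch, assuming ASCII text so that each character maps to exactly one byte:

```python
text = 'j'

# ord() gives the integer code point of a one-character str.
code = ord(text)
encoded = text.encode('ascii')

# The single byte of the ASCII encoding is the same integer.
print(code, encoded[0])  # 106 106

# chr() maps the integer back to a one-character str.
print(chr(encoded[0]))   # j
```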

Answered By: Martijn Pieters