bytes.decode() in Python2 and Python3

Question:

In the SQLAlchemy source code I see the following:

    val = cursor.fetchone()[0]
    if util.py3k and isinstance(val, bytes):
        val = val.decode()

Why is the value decoded only on Python 3 and not on Python 2?

Asked By: Rudziankoŭ


Answers:

You can find a detailed write-up of the string-encoding frustrations here.

In short, SQLAlchemy contains legacy APIs that return data as bytes, and this statement is a simple way to convert that bytes data to a Unicode string on Python 3.
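The guard from the question can be sketched as a small helper. This is only an illustration, not SQLAlchemy's code: PY3 stands in for util.py3k (assumed here to be a plain version check) and normalize is a hypothetical name.

```python
import sys

# Stand-in for util.py3k (assumption: a simple interpreter version check).
PY3 = sys.version_info >= (3, 0)


def normalize(val):
    """Return val as the native str type of the running interpreter."""
    # On Python 2, bytes is an alias for str, so the value is already native
    # and the PY3 guard skips the decode step entirely.
    if PY3 and isinstance(val, bytes):
        # With no arguments, bytes.decode() uses UTF-8 on Python 3.
        val = val.decode()
    return val
```

On Python 3, normalize(b"5.7.21") gives the str "5.7.21", while a value that is already a str passes through unchanged.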

Answered By: mootmoot

In Python 3, "normal" strings are Unicode, as opposed to Python 2, where they are byte strings (ASCII or an extended 8-bit encoding by convention). According to [Python 3.Docs]: Unicode HOWTO – The String Type:

Since Python 3.0, the language’s str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.

Example:

  • Python 3:

    >>> import sys
    >>> sys.version
    '3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]'
    >>>
    >>> b = b"abcd"
    >>> s = "abcd"
    >>> u = u"abcd"
    >>>
    >>> type(b), type(s), type(u)
    (<class 'bytes'>, <class 'str'>, <class 'str'>)
    >>>
    >>> b.decode()
    'abcd'
    >>> s.decode()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'decode'
    >>> u.decode()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'decode'
    
  • Python 2:

    >>> import sys
    >>> sys.version
    '2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
    >>>
    >>> b = b"abcd"
    >>> s = "abcd"
    >>> u = u"abcd"
    >>>
    >>> type(b), type(s), type(u)
    (<type 'str'>, <type 'str'>, <type 'unicode'>)
    >>>
    >>> b.decode()
    u'abcd'
    >>> s.decode()
    u'abcd'
    >>> u.decode()
    u'abcd'
    

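val is subsequently passed (to _parse_server_version) as a str. In Python 3, bytes and str are distinct types, so the conversion is required. In Python 2, bytes is simply an alias for str (as the type() output above shows), so the value is already a str and no conversion is needed.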
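To see why the distinction matters on Python 3, here is a minimal sketch. parse_version is a hypothetical stand-in, not the real _parse_server_version, and the raw value is an assumed example of what a driver might return. Python 3 refuses to apply a str regex pattern to a bytes object, so the undecoded value fails where the decoded one works.

```python
import re


def parse_version(val):
    """Hypothetical parser: pull the integer components out of a version string."""
    # The pattern is a str, so on Python 3 val must be a str as well.
    return tuple(int(part) for part in re.findall(r"\d+", val))


raw = b"8.0.31"  # assumed example of a bytes value returned by a driver

try:
    parse_version(raw)  # str pattern applied to bytes
except TypeError as exc:
    error = exc  # Python 3 rejects mixing a str pattern with bytes input
else:
    error = None

# After decoding, the same parser works as intended.
version = parse_version(raw.decode())
```

On Python 2 the first call would succeed, because b"8.0.31" is an ordinary str there.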

You could also check [SO]: Passing utf-16 string to a Windows function (@CristiFati’s answer).

Answered By: CristiFati