Comparing a string to bytes in a way that works in both Python 2 and 3

Question:

What is the best way to compare a string object to a bytes object that works in both Python 2 and Python 3? More generally, how does one write a Python 2 and Python 3 compatible comparison of two objects that may each be a string, bytes, or Unicode object? Assume that the data is encoded (in the case of bytes) or encodable (in the case of strings) with UTF-8.

The problem is that "asdf" == b"asdf" is True in Python 2 and False in Python 3.

Meanwhile, one cannot blindly encode or decode objects, since strings in Python 2 have both encode and decode methods, but strings in Python 3 have only an encode method.

Finally, isinstance(obj, bytes) returns True for any non-unicode string in Python 2 and returns True for only bytes objects in Python 3.
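
For reference, the three differences described above can be checked with a short script. This is only an illustrative sketch; the variable names (PY2, mixed_equal, and so on) are made up for the example and do not come from any library.

```python
import sys

# True on Python 2, False on Python 3.
PY2 = sys.version_info[0] < 3

# Mixed comparison: equal on Python 2 (where str is bytes), unequal on Python 3.
mixed_equal = ("asdf" == b"asdf")

# A native str has decode() only on Python 2; bytes has decode() on both.
str_has_decode = hasattr("asdf", "decode")
bytes_has_decode = hasattr(b"asdf", "decode")

# isinstance(..., bytes) accepts a native str only on Python 2.
str_is_bytes = isinstance("asdf", bytes)

print(PY2, mixed_equal, str_has_decode, bytes_has_decode, str_is_bytes)
```

Running it under each interpreter shows that every value except bytes_has_decode flips between the two versions.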


Note to moderators: There has been some confusion as to why this question is needed (i.e. what practical problem this is solving). The particular problem that motivated this question was how to interface with a library that changed the type of its return (from string to bytes) between Python 2 and Python 3. I needed a solution that was compatible with both to facilitate upgrading the codebase in question from one to the other, though this could also be relevant to downstream libraries that still want to have Python 2 compatibility. I didn’t need to do anything with the return other than test for equality against known payloads, hence the question just being about equality testing.

Asked By: Zags


Answers:

In both Python 2 and Python 3, anything that is an instance of bytes has a decode method. Thus, you can do the following:

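def compare(a, b, encoding="utf8"):
    # Decode bytes operands to text so both sides have the same type.
    # On Python 2 this also decodes plain str, since str is bytes there.
    if isinstance(a, bytes):
        a = a.decode(encoding)
    if isinstance(b, bytes):
        b = b.decode(encoding)
    return a == b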
Answered By: Zags
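
As a rough usage sketch for the answer above, the function is repeated here so the snippet runs on its own. The sample values (text, payload) are invented for illustration, and the u"" literals assume Python 2 or Python 3.3+.

```python
def compare(a, b, encoding="utf8"):
    # Same function as in the answer above, copied so this runs standalone.
    if isinstance(a, bytes):
        a = a.decode(encoding)
    if isinstance(b, bytes):
        b = b.decode(encoding)
    return a == b


text = u"caf\u00e9"            # text: unicode on Python 2, str on Python 3
payload = text.encode("utf8")  # the same value as UTF-8 encoded bytes

text_vs_bytes = compare(text, payload)
bytes_vs_text = compare(payload, text)
native_vs_bytes = compare("asdf", b"asdf")
different = compare(u"asdf", b"asdx")

print(text_vs_bytes, bytes_vs_text, native_vs_bytes, different)
```

The first three results should be True and the last False on either version. One caveat: if a bytes argument is not valid in the given encoding, decode raises UnicodeDecodeError instead of returning False.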

You can check whether you’re using Python 2 or 3 and act accordingly:

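import sys

# Pick the text (Unicode) type for the running interpreter.
if sys.version_info[0] < 3:
    text_type = unicode
else:
    text_type = str

# Encode text to UTF-8 bytes; leave anything else (already bytes) unchanged.
if isinstance(obj, text_type):
    result = obj.encode('utf-8')
else:
    result = obj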
Answered By: Simeon Visser
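
The snippet above normalizes a single object to bytes. One way to turn it into the equality test the question asks for is sketched below. The helper names to_bytes and bytes_equal are hypothetical, not part of the answer or of any library, and the sketch assumes every non-text argument is already bytes.

```python
import sys

# Pick the text (Unicode) type for the running interpreter.
if sys.version_info[0] < 3:
    text_type = unicode  # noqa: F821 (this name exists only on Python 2)
else:
    text_type = str


def to_bytes(obj, encoding="utf-8"):
    """Hypothetical helper: encode text, pass anything else through unchanged."""
    if isinstance(obj, text_type):
        return obj.encode(encoding)
    return obj


def bytes_equal(a, b, encoding="utf-8"):
    """Hypothetical helper: compare two values after normalizing both to bytes."""
    return to_bytes(a, encoding) == to_bytes(b, encoding)


print(bytes_equal(u"caf\u00e9", u"caf\u00e9".encode("utf-8")))
print(bytes_equal("asdf", b"asdf"))
print(bytes_equal(u"asdf", b"asdx"))
```

Normalizing to bytes rather than to text means the comparison never decodes, so malformed byte payloads compare as unequal instead of raising UnicodeDecodeError.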