Should I avoid converting to a string if a value is already a string?

Question:

Sometimes you have to use a list comprehension to convert everything to a string, including values that are already strings.

b = [str(a) for a in l]

But do I have to do:

b = [a if type(a)==str else str(a) for a in l]

I was wondering if str on a string is optimized enough to not create another copy of the string.

I have tried:

>>> x="aaaaaa"
>>> str(x) is x
True

but that may be because Python can cache strings and reuse them. Is that behaviour guaranteed for any string value?

Answers:

Testing if an object is already a string is slower than just always converting to a string.

That’s because the str() call also makes the exact same test (is the object already a string). With an explicit check you are a) doing double the work, and b) using a test that is slower to boot.
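To check this beyond short literals, here is a minimal sketch that builds strings at runtime (so they are not compile-time constants) and tests whether str() hands back the very same object. The helper name same_object_after_str is made up for illustration, and the identity result is what CPython is observed to do for exact str instances; treat it as an implementation detail rather than a documented promise.

```python
def same_object_after_str(value):
    """Return True if str(value) hands back the very same object."""
    return str(value) is value


# Build strings at runtime so they are not interned compile-time constants.
runtime_built = "".join(["spam", " ", "eggs"] * 1000)
from_number = "%d items" % 12345

# Exact str instances: CPython is observed to return the object itself.
print(same_object_after_str(runtime_built))
print(same_object_after_str(from_number))

# Non-string input: str() has to build a new string object.
print(same_object_after_str(42))
```

If the first two lines print True on your interpreter, str() is not copying strings there, so the explicit type check buys nothing.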

Note: for Python 2, using str() on unicode objects includes an implicit encode to ASCII, and this can fail. You may still have to special case handling of such objects. In Python 3, there is no need to worry about that edge-case.

As there is some discussion around this:

  • isinstance(s, str) has a different meaning when s can be a subclass of str. As subclasses are treated exactly like any other type of object by str() (either __str__ or __repr__ is called on the object), this difference matters here.
  • You should use type(s) is str for exact type checks. Types are singletons, so take advantage of this: the is operator is faster than ==:

    >>> import timeit
    >>> timeit.timeit("type(s) is str", "s = ''")
    0.10074466899823165
    >>> timeit.timeit("type(s) == str", "s = ''")
    0.1110201120027341
    
  • Using s if type(s) is str else str(s) is significantly slower for the non-string case:

    >>> import timeit
    >>> timeit.timeit("str(s)", "s = None")
    0.1823573520014179
    >>> timeit.timeit("s if type(s) is str else str(s)", "s = None")
    0.29589492800005246
    >>> timeit.timeit("str(s)", "s = ''")
    0.11716728399915155
    >>> timeit.timeit("s if type(s) is str else str(s)", "s = ''")
    0.12032335300318664
    

    (The timings for the s = '' cases are very close and keep swapping places).
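The subclass point from the first bullet can be sketched as follows. Label, convert_exact and convert_isinstance are hypothetical names used only for this illustration; the sketch assumes a str subclass that overrides __str__.

```python
class Label(str):
    """Hypothetical str subclass with its own string conversion."""

    def __str__(self):
        # Call the base implementation explicitly to avoid recursing into __str__.
        return "label:" + str.__str__(self)


def convert_exact(value):
    # Exact type check: subclasses still go through str().
    return value if type(value) is str else str(value)


def convert_isinstance(value):
    # isinstance check: subclasses are passed through untouched.
    return value if isinstance(value, str) else str(value)


item = Label("abc")
print(repr(str(item)))                 # str() calls Label.__str__
print(repr(convert_exact(item)))       # same result as plain str()
print(repr(convert_isinstance(item)))  # the original Label object, unconverted
```

So an isinstance() shortcut is not just a slower spelling of str(); it produces a different result whenever a subclass customizes __str__.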

All timings in this post were measured on Python 3.6.0 on a MacBook Pro 15″ (Mid 2015), OS X 10.12.3.
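Absolute numbers will differ on other hardware and Python versions, so here is a small sketch for rerunning the comparison yourself. It is not the script used for the measurements above; time_variants and the number value are choices made for this example.

```python
import timeit


def time_variants(value, number=200_000):
    """Time plain str() against the explicit type check for one input value."""
    namespace = {"s": value}
    statements = ("str(s)", "s if type(s) is str else str(s)")
    return {
        stmt: timeit.timeit(stmt, globals=namespace, number=number)
        for stmt in statements
    }


# One non-string input and one string input, as in the timings above.
for label, value in (("non-string (None)", None), ("string ('')", "")):
    for stmt, seconds in time_variants(value).items():
        print(f"{label:20} {stmt:35} {seconds:.4f}s")
```

Run it a few times; the string case in particular tends to be close enough that the two variants can swap places between runs.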

Answered By: Martijn Pieters