Python string comparison time complexity

Question:

I’m curious how Python performs string comparisons under the hood.
For example, is

if s1 == s2:
    print(True)
else:
    print(False)

the same as the following loop?

condition = True
for x, y in zip(s1, s2):
    if x != y:
        condition = False
print(condition)

Perhaps under the hood Python can use ord values to do better than an O(n) traversal?

Asked By: jbuddy_13


Answers:

A simple test:

s1 = "a"
s2 = "aa"
condition = True
for x, y in zip(s1, s2):
    if x != y:
        condition = False
print(condition) # True

shows that your assumption is incorrect: the loop prints True even though s1 != s2.
Otherwise, Python’s == is very efficient, so you can assume it’s at worst O(n).

Answered By: AdvMaple

Regardless of how it’s implemented, comparing two strings for equality takes O(n) time in the worst case. (There might exist pre-built side data structures that could help speed it up, but I’m assuming your input is just two strings and nothing else.)

Yes, the C implementation that == ends up calling is much faster, because it runs in C rather than as a Python loop, but its worst-case big-O complexity is still O(n).
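A rough benchmark sketch, assuming CPython and the standard-library timeit module; loop_equal and compare_timings are hypothetical names chosen for illustration. The inputs are a worst case for both approaches: equal length and identical except for the last character, so neither can stop early. Absolute numbers depend on the machine; for large n both timings should grow roughly in proportion to n, with == staying far faster.

```python
import timeit


def loop_equal(s1: str, s2: str) -> bool:
    """Pure-Python equality check: length first, then character by character."""
    if len(s1) != len(s2):
        return False
    for x, y in zip(s1, s2):
        if x != y:
            return False
    return True


def compare_timings(sizes=(10_000, 100_000, 1_000_000), number=5):
    """Time == and loop_equal on worst-case inputs (equal except the last character)."""
    results = {}
    for n in sizes:
        s1 = "a" * n
        s2 = "a" * (n - 1) + "b"  # differs only at the very end
        t_eq = timeit.timeit(lambda: s1 == s2, number=number)
        t_loop = timeit.timeit(lambda: loop_equal(s1, s2), number=number)
        results[n] = (t_eq, t_loop)
    return results


if __name__ == "__main__":
    for n, (t_eq, t_loop) in compare_timings().items():
        print(f"n={n:>9,}  ==: {t_eq:.6f}s  loop: {t_loop:.6f}s")
```

Comparing the rows for n and 10·n is a quick way to see the "same big-O, different constant" point: both columns scale, but the loop's constant is much larger.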

PS: as @AdvMaple pointed out, your alternative implementation is wrong, because zip stops as soon as one of its inputs runs out of elements, but that does not change the time-complexity question.
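A short demonstration of the zip pitfall, plus one way to keep a character-by-character loop while still catching a length mismatch, using itertools.zip_longest. zip_equal is a hypothetical helper name, not part of any library.

```python
from itertools import zip_longest

s1, s2 = "a", "aa"

# zip() stops at the shorter input, so the extra "a" in s2 is never examined.
print(list(zip(s1, s2)))  # [('a', 'a')]


def zip_equal(s1: str, s2: str) -> bool:
    """Character-by-character equality that also notices a length mismatch."""
    # zip_longest pads the shorter string with None, which never equals a character.
    for x, y in zip_longest(s1, s2):
        if x != y:
            return False
    return True


print(zip_equal(s1, s2))        # False
print(zip_equal("abc", "abc"))  # True
```

Checking len(s1) != len(s2) up front is cheaper still, since a str's length is stored and does not require a scan.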

Answered By: joanis

Python’s string comparison is implemented in unicodeobject.c. After a few checks, such as string length and "kind" (Python stores each string using 1, 2 or 4 bytes per character, i.e. UCS1, UCS2 or UCS4, depending on the largest code point it contains), it’s just a call to the C library function memcmp.
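A rough pure-Python model of the check order described above. It is not CPython’s code: storage_kind, str_eq_model and _raw_buffer are hypothetical helpers, the kind is recomputed here in O(n) (CPython reads it from the object header in O(1)), and the final bytes comparison only stands in for memcmp over the raw character buffer.

```python
def storage_kind(s: str) -> int:
    """Bytes per character a PEP 393 string would use: 1, 2 or 4."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4


def _raw_buffer(s: str, width: int) -> bytes:
    """Pack each code point into `width` little-endian bytes (a stand-in for the internal buffer)."""
    return b"".join(ord(c).to_bytes(width, "little") for c in s)


def str_eq_model(a: str, b: str) -> bool:
    """Model of the equality checks: identity, length, kind, then a memcmp-like compare."""
    if a is b:  # same object: trivially equal
        return True
    if len(a) != len(b):  # length is stored, so this check is cheap in C
        return False
    kind = storage_kind(a)
    if kind != storage_kind(b):  # equal strings always have the same kind
        return False
    # The O(n) step: compare len(a) * kind bytes, as memcmp would.
    return _raw_buffer(a, kind) == _raw_buffer(b, kind)
```

Only the last step touches every character, which is why the overall worst case is still O(n).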

With a quick change to your Python code

condition = True
if len(s1) != len(s2):
    condition = False
else:
    for x, y in zip(s1, s2):
        if x != y:
            condition = False
            break
print(condition)

the Python code has the same O(n) time complexity as memcmp; it’s just that Python has a much bigger constant factor hidden inside the O. Time complexity doesn’t say anything about how long an operation takes, only how its running time scales as the input size n grows.

memcmp is much faster than the Python version because of the interpreter’s overhead, but it scales the same way. And when you think about it, each of the if x != y: comparisons in the second example runs the exact same code as the single s1 == s2 comparison in the first.

Answered By: tdelaney