Why are simple comparisons slowing down my Python code?
Question:
I have made a Vector2i class with the __eq__ operator overloaded as shown below.
class Vector2i:
    def __init__(self, x: int = 0, y: int = 0):
        self.x: int = x
        self.y: int = y

    # (...)

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    # (...)
It seemed to work fine until, suddenly, my code slowed down a lot. After running the Python profiler, I saw the overloaded operator was taking most of the time and seemed to be the cause of the slowdown.
I tried to replace the overloaded operator with a simple Python comparison in the parts of the code that were slow.
# Inside a for loop.
"""
if c.position != checkpos:
    continue
"""
if c.position.x != checkpos.x or c.position.y != checkpos.y:
    continue
I got the exact same result: the code still slows down a lot at that comparison. What's wrong?
Solution
The profiler was not very accurate, and the comparison itself was not what was slowing the code down. The real cost was the loops running the comparison over and over again. I fixed the problem by using numpy for heavy array operations.
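One way to check whether a single comparison is expensive, or whether it is simply being executed a very large number of times, is to time it in isolation. The following is only a rough sketch using the standard timeit module; the values of a, b and n are made up for illustration, and the actual numbers will vary by machine and Python version.

```python
import timeit


class Vector2i:
    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y


# Hypothetical sample values, chosen only for this measurement.
a = Vector2i(1, 2)
b = Vector2i(1, 3)
n = 100_000  # number of repetitions per measurement

# Time the overloaded operator and the hand-written comparison separately.
t_operator = timeit.timeit(lambda: a != b, number=n)
t_inline = timeit.timeit(lambda: a.x != b.x or a.y != b.y, number=n)

print(f"operator: {t_operator / n * 1e9:.0f} ns per comparison")
print(f"inline:   {t_inline / n * 1e9:.0f} ns per comparison")
```

If both forms turn out to be cheap per call, the total time is more likely dominated by how many times the loop runs than by the comparison itself.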
Answers:
When you override __eq__, the equality check changes from simply checking whether the two variables point to the same object in memory to comparing the two objects' attributes (x and y), which is your desired behaviour. That is a slightly more expensive operation, which might be the cause of your slowdown.
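To make the difference concrete, here is a small sketch. PlainPoint is a hypothetical class added only for contrast; it has the same fields as Vector2i but does not define __eq__.

```python
class PlainPoint:
    """Hypothetical class without __eq__: == falls back to identity."""

    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y


class Vector2i:
    """Same fields, but == compares x and y."""

    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y


# Two distinct objects with equal fields are not equal by default.
plain_equal = PlainPoint(1, 2) == PlainPoint(1, 2)

# With __eq__ defined, equality is decided by the fields.
vector_equal = Vector2i(1, 2) == Vector2i(1, 2)

# In Python 3, != is derived from __eq__ unless __ne__ is defined.
vector_not_equal = Vector2i(1, 2) != Vector2i(1, 3)

print(plain_equal, vector_equal, vector_not_equal)  # False True True
```

Each call to the overridden __eq__ is an ordinary Python method call plus two attribute comparisons, so the cost per comparison is small but adds up inside a tight loop.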
If you need to compare a large number of objects like this, it might be beneficial to use some other data structure like numpy arrays. Here is an example of the speed up:
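import time
import numpy as np

# Two arrays of 10000 random 2D points each.
x = np.random.randn(10000, 2)
y = np.random.randn(10000, 2)

# Vectorized: compare all components at once, then count matches per row.
t0 = time.time()
np.sum(x == y, axis=1)
print("Numpy time: ", time.time() - t0)

# Pure Python loop: compare the components of each pair one at a time.
t0 = time.time()
for a, b in zip(x, y):
    _ = a[0] == b[0]
    _ = a[1] == b[1]
print("Loop time: ", time.time() - t0)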
Output:
Numpy time: 0.0002689361572265625
Loop time: 0.004904747009277344
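Applied to the situation in the question, the same idea might look like the sketch below. It assumes the positions can be kept in a single integer array with one row per object; the names positions and matching_indices and the sample values are hypothetical, not taken from the original code.

```python
import numpy as np

# Hypothetical data: one row per object, columns are (x, y).
positions = np.array([[0, 0], [3, 4], [1, 2], [3, 4]], dtype=np.int64)
checkpos = np.array([3, 4], dtype=np.int64)

# Broadcasting compares every row against checkpos in one operation.
# A row matches only if both its x and y components are equal.
matches = np.all(positions == checkpos, axis=1)

# Indices of the objects located at checkpos.
matching_indices = np.flatnonzero(matches)
print(matching_indices)  # [1 3]
```

Note that np.all(..., axis=1) gives one True/False per row, whereas np.sum(..., axis=1) as in the timing example counts how many components of each row match.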