Why is a.insert(0,0) much slower than a[0:0]=[0]?
Question:
Using a list’s insert
function is much slower than achieving the same effect using slice assignment:
> python -m timeit -n 100000 -s "a=[]" "a.insert(0,0)"
100000 loops, best of 5: 19.2 usec per loop
> python -m timeit -n 100000 -s "a=[]" "a[0:0]=[0]"
100000 loops, best of 5: 6.78 usec per loop
(Note that a=[]
is only the setup, so a
starts empty but then grows to 100,000 elements.)
At first I thought maybe it’s the attribute lookup or function call overhead or so, but inserting near the end shows that that’s negligible:
> python -m timeit -n 100000 -s "a=[]" "a.insert(-1,0)"
100000 loops, best of 5: 79.1 nsec per loop
Why is the presumably simpler dedicated “insert single element” function so much slower?
I can also reproduce it at repl.it:
from timeit import repeat
for _ in range(3):
for stmt in 'a.insert(0,0)', 'a[0:0]=[0]', 'a.insert(-1,0)':
t = min(repeat(stmt, 'a=[]', number=10**5))
print('%.6f' % t, stmt)
print()
# Example output:
#
# 4.803514 a.insert(0,0)
# 1.807832 a[0:0]=[0]
# 0.012533 a.insert(-1,0)
#
# 4.967313 a.insert(0,0)
# 1.821665 a[0:0]=[0]
# 0.012738 a.insert(-1,0)
#
# 5.694100 a.insert(0,0)
# 1.899940 a[0:0]=[0]
# 0.012664 a.insert(-1,0)
I use Python 3.8.1 32-bit on Windows 10 64-bit.
repl.it uses Python 3.8.1 64-bit on Linux 64-bit.
Answers:
I think it’s probably just that they forgot to use memmove
in list.insert
. If you take a look at the code list.insert
uses to shift elements, you can see it’s just a manual loop:
for (i = n; --i >= where; )
items[i+1] = items[i];
while list.__setitem__
on the slice assignment path uses memmove
:
memmove(&item[ihigh+d], &item[ihigh],
(k - ihigh)*sizeof(PyObject *));
memmove
typically has a lot of optimization put into it, such as taking advantage of SSE/AVX instructions.
Using a list’s insert
function is much slower than achieving the same effect using slice assignment:
> python -m timeit -n 100000 -s "a=[]" "a.insert(0,0)"
100000 loops, best of 5: 19.2 usec per loop
> python -m timeit -n 100000 -s "a=[]" "a[0:0]=[0]"
100000 loops, best of 5: 6.78 usec per loop
(Note that a=[]
is only the setup, so a
starts empty but then grows to 100,000 elements.)
At first I thought maybe it’s the attribute lookup or function call overhead or so, but inserting near the end shows that that’s negligible:
> python -m timeit -n 100000 -s "a=[]" "a.insert(-1,0)"
100000 loops, best of 5: 79.1 nsec per loop
Why is the presumably simpler dedicated “insert single element” function so much slower?
I can also reproduce it at repl.it:
from timeit import repeat
for _ in range(3):
for stmt in 'a.insert(0,0)', 'a[0:0]=[0]', 'a.insert(-1,0)':
t = min(repeat(stmt, 'a=[]', number=10**5))
print('%.6f' % t, stmt)
print()
# Example output:
#
# 4.803514 a.insert(0,0)
# 1.807832 a[0:0]=[0]
# 0.012533 a.insert(-1,0)
#
# 4.967313 a.insert(0,0)
# 1.821665 a[0:0]=[0]
# 0.012738 a.insert(-1,0)
#
# 5.694100 a.insert(0,0)
# 1.899940 a[0:0]=[0]
# 0.012664 a.insert(-1,0)
I use Python 3.8.1 32-bit on Windows 10 64-bit.
repl.it uses Python 3.8.1 64-bit on Linux 64-bit.
I think it’s probably just that they forgot to use memmove
in list.insert
. If you take a look at the code list.insert
uses to shift elements, you can see it’s just a manual loop:
for (i = n; --i >= where; )
items[i+1] = items[i];
while list.__setitem__
on the slice assignment path uses memmove
:
memmove(&item[ihigh+d], &item[ihigh],
(k - ihigh)*sizeof(PyObject *));
memmove
typically has a lot of optimization put into it, such as taking advantage of SSE/AVX instructions.