Runtime of merging two lists in Python
Question:
Suppose we have two lists A = [a1, a2, ..., an] (n elements) and B = [b1, b2, ..., bm] (m elements), and we use "+" in Python to merge the two lists into one:
C = A + B
What is the runtime of this operation? My first guess is O(n+m), but I'm not sure whether Python is smarter than that.
Answers:
When you concatenate the two lists with A + B, you create a completely new list in memory. This means your guess is correct: the complexity is O(n + m) (where n and m are the lengths of the lists), since Python has to walk both lists in turn to build the new list.
You can see this happening in the list_concat function in the source code for Python lists:
static PyObject *
list_concat(PyListObject *a, PyObject *bb)
{
/* ...code snipped... */
src = a->ob_item;
dest = np->ob_item;
for (i = 0; i < Py_SIZE(a); i++) { /* walking list a */
PyObject *v = src[i];
Py_INCREF(v);
dest[i] = v;
}
src = b->ob_item;
dest = np->ob_item + Py_SIZE(a);
for (i = 0; i < Py_SIZE(b); i++) { /* walking list b */
PyObject *v = src[i];
Py_INCREF(v);
dest[i] = v;
}
/* ...code snipped... */
}
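What that C code does is observable from Python: the result of A + B is a brand-new list, detached from both operands. A quick sketch:

```python
A = [1, 2, 3]
B = [4, 5]
C = A + B                      # allocates a new list and copies n + m elements
assert C == [1, 2, 3, 4, 5]
assert C is not A and C is not B   # a completely new object

A.append(99)                   # later changes to A ...
assert C == [1, 2, 3, 4, 5]    # ... do not affect C
```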
If you don't need a new list in memory, it's often a good idea to take advantage of the fact that lists are mutable (and this is where Python is smart). Using A.extend(B) is O(m), meaning that you avoid the overhead of copying list A.
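The difference is that extend mutates A in place, so only B's m elements are copied. A small sketch (note that A += B is equivalent to A.extend(B) for lists):

```python
A = [1, 2, 3]
B = [4, 5]
alias = A
A.extend(B)            # in-place: only B's m elements are appended
assert A == [1, 2, 3, 4, 5]
assert alias is A      # no new list was created

A2 = [1, 2, 3]
A2 += B                # augmented assignment on a list also extends in place
assert A2 == [1, 2, 3, 4, 5]
```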
The complexities of various list operations are listed on the Python wiki. Copying a list is O(n) (with n being the number of elements) and extending is O(k) (with k being the number of elements in the second list). Based on these two facts, I would think it couldn't be any less than O(n+k), since this is a copy-and-extend operation, and at the very least you would need to copy all the elements of both lists.
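You can check the linear growth empirically. The sketch below times A + B for doubling sizes; the absolute numbers are machine-dependent, but each timing should be roughly double the previous one:

```python
import timeit

# Time list concatenation for doubling input sizes. If the cost is
# O(n + m), each measurement should be roughly twice the previous one.
times = []
for n in (10_000, 20_000, 40_000):
    a = list(range(n))
    b = list(range(n))
    t = timeit.timeit(lambda: a + b, number=100)
    times.append(t)
    print(f"n + m = {2 * n:>6}: {t:.4f}s")
```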
Source: Python TimeComplexity
My first guess is O(n+m), not sure if Python is smarter than that.
Nothing can be smarter than that while returning a copy. Even if A and B were immutable sequences such as strings, CPython would still make a full copy instead of aliasing the same memory (this simplifies the implementation of garbage collection for such strings).
In some specific cases the operation can be O(1), depending on what you want to do with the result. For example, itertools.chain(A, B) lets you iterate over all items without making a copy (so changes to A or B affect the yielded items). If you need random access, you can emulate it with a Sequence subclass such as WeightedPopulation, but in the general case the copy, and therefore the O(n+m) runtime, is unavoidable.
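A short sketch of the itertools.chain behavior described above: constructing the chain is O(1) because no copy is made, and mutations to the underlying lists before iteration are visible in the yielded items:

```python
import itertools

A = [1, 2, 3]
B = [4, 5]
chained = itertools.chain(A, B)   # O(1): no elements are copied here

A.append(99)                      # mutate A before consuming the iterator
result = list(chained)            # iteration sees the updated A
assert result == [1, 2, 3, 99, 4, 5]
```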