Runtime of merging two lists in Python

Question:

Suppose we have two lists, A = [a1, a2, ..., an] (n elements) and B = [b1, b2, ..., bm] (m elements), and we use “+” in Python to merge them into one list:

C = A + B

My question is: what is the runtime of this operation? My first guess is O(n+m), but I'm not sure whether Python is smarter than that.

Asked By: Toby


Answers:

When you concatenate the two lists with A + B, you create a completely new list in memory. This means your guess is correct: the complexity is O(n + m) (where n and m are the lengths of the lists), since Python has to walk both lists in turn, copying each element reference into the new list.
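A small sketch illustrating this (the list contents are arbitrary examples): `A + B` returns a brand-new list object of length n + m, leaves `A` and `B` untouched, and holds references to the very same element objects, so it is a shallow copy of both inputs.

```python
# Arbitrary example inputs; any two lists behave the same way.
A = [[1], [2], [3]]   # n = 3
B = [[4], [5]]        # m = 2

C = A + B

# A new list object is created; neither operand is modified.
print(C is A, C is B)              # False False
print(len(C) == len(A) + len(B))   # True
print(A, B)                        # [[1], [2], [3]] [[4], [5]]

# Only the references are copied: the elements are the same objects.
print(C[0] is A[0], C[3] is B[0])  # True True
```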

You can see this happening in the list_concat function in CPython's source code for lists (Objects/listobject.c):

```c
static PyObject *
list_concat(PyListObject *a, PyObject *bb)
{
    /* ...code snipped... */
    src = a->ob_item;
    dest = np->ob_item;
    for (i = 0; i < Py_SIZE(a); i++) {     /* walking list a */
        PyObject *v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    src = b->ob_item;
    dest = np->ob_item + Py_SIZE(a);
    for (i = 0; i < Py_SIZE(b); i++) {     /* walking list b */
        PyObject *v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    /* ...code snipped... */
}
```
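The two loops in list_concat can be mirrored in pure Python as a rough model. The function name `concat_copy` and its copy counter are hypothetical, for illustration only (CPython does this in C, where `Py_INCREF` plays the role that plain assignment plays here): allocate a result with room for n + m items, copy each reference from `a`, then each reference from `b` at an offset of n, for exactly n + m element copies.

```python
def concat_copy(a, b):
    """Hypothetical pure-Python model of CPython's list_concat.

    Returns the concatenated list and the number of element copies made.
    """
    n, m = len(a), len(b)
    result = [None] * (n + m)   # like allocating np with room for n + m items
    copies = 0
    for i in range(n):          # walking list a
        result[i] = a[i]
        copies += 1
    for i in range(m):          # walking list b, written after a's items
        result[n + i] = b[i]
        copies += 1
    return result, copies


C, steps = concat_copy([1, 2, 3], [4, 5])
print(C, steps)  # [1, 2, 3, 4, 5] 5
```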

If you don’t need a new list in memory, it’s often a good idea to take advantage of the fact that lists are mutable (and this is where Python is smart). A.extend(B) is O(m) (amortized), which means you avoid the overhead of copying list A.
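A short sketch of the in-place alternative (example values are arbitrary): `A.extend(B)` appends B's m references to the existing list object, so `A` keeps its identity and its own n items are not copied into a new list. For lists, `A += B` also extends in place, whereas `A + B` always builds a new list.

```python
A = [1, 2, 3]
B = [4, 5]
original_id = id(A)

A.extend(B)                   # appends B's items to A in place
print(A)                      # [1, 2, 3, 4, 5]
print(id(A) == original_id)   # True: still the same list object

A += [6]                      # for lists, += also extends in place
print(id(A) == original_id)   # True

D = A + B                     # by contrast, + builds a new list
print(D is A)                 # False
```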

The time complexities of various list operations are listed on the Python wiki's TimeComplexity page.

Answered By: Alex Riley

Copying a list is O(n) (with n being the number of elements) and extending is O(k) (with k being the number of elements in the second list). Based on these two facts, I would think it couldn’t be any less than O(n+k), since this is a copy-and-extend operation, and at the very least you would need to copy all the elements of both lists.

Source: Python TimeComplexity
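The copy-and-extend decomposition in this answer can be checked directly: copying A (O(n)) and then extending the copy with B (O(k)) produces the same list as `A + B`. The helper name `copy_then_extend` is hypothetical; it simply spells out the two steps.

```python
def copy_then_extend(a, b):
    """Hypothetical helper: concatenation spelled as copy + extend."""
    result = a.copy()     # O(n): copy all n references from a
    result.extend(b)      # O(k): append all k references from b
    return result


A = ["x", "y", "z"]
B = ["p", "q"]
print(copy_then_extend(A, B) == A + B)  # True
print(A)                                # ['x', 'y', 'z'] (unchanged)
```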

Answered By: TheBlackCat

> My first guess is O(n+m), not sure if Python is smarter than that.

Nothing can be smarter than that while returning a copy. Even if A and B were immutable sequences such as strings, CPython would still make a full copy instead of aliasing the same memory (this simplifies the implementation of garbage collection for such strings).

In some specific cases, combining the lists could be O(1), depending on what you want to do with the result. For example, itertools.chain(A, B) lets you iterate over all the items without making a copy (so changes to A and B affect the yielded items). If you need random access, you could emulate it with a Sequence subclass (e.g., WeightedPopulation), but in the general case the copy, and therefore the O(n+m) runtime, is unavoidable.
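A minimal sketch of both ideas, assuming you only need to read the combined data. `itertools.chain` creates a lazy iterator without copying. The `ConcatView` class is hypothetical (not a standard library type, and not the WeightedPopulation recipe mentioned above): a `collections.abc.Sequence` subclass that answers `len()` and indexing in O(1) by delegating to the two underlying lists. Creating either object is O(1); iterating over all items is of course still O(n+m).

```python
from collections.abc import Sequence
from itertools import chain


class ConcatView(Sequence):
    """Hypothetical read-only view of a + b with no copying.

    Indexing and len() are O(1); changes to a or b show through.
    Slicing is not supported in this sketch.
    """

    def __init__(self, a, b):
        self._a = a
        self._b = b

    def __len__(self):
        return len(self._a) + len(self._b)

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is not supported in this sketch")
        n = len(self._a)
        if index < 0:               # support negative indices like a list
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("ConcatView index out of range")
        return self._a[index] if index < n else self._b[index - n]


A = [1, 2, 3]
B = [4, 5]

lazy = chain(A, B)       # O(1) to create; no copy is made
view = ConcatView(A, B)  # O(1) to create; no copy is made

A.append(99)             # mutate A after creating both
print(list(lazy))        # [1, 2, 3, 99, 4, 5]
print(view[3], len(view), view[-1])  # 99 6 5
```

Inheriting from `Sequence` supplies iteration, `in`, `index()` and `count()` for free on top of `__len__` and `__getitem__`.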

Answered By: jfs