How does the list comprehension to flatten a python list work?

Question:

I recently looked for a way to flatten a nested python list, like this: [[1,2,3],[4,5,6]], into this: [1,2,3,4,5,6].

Stackoverflow was helpful as ever and I found a post with this ingenious list comprehension:

l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]

I thought I understood how list comprehensions work, but apparently I haven’t got the faintest idea. What puzzles me most is that besides the comprehension above, this also runs (although it doesn’t give the same result):

exactly_the_same_as_l = [item for item in sublist for sublist in l]

Can someone explain how Python interprets these things? Based on the second comprehension, I would expect that Python interprets it back to front, but apparently that is not always the case. If it were, the first comprehension should throw an error, because ‘sublist’ does not exist. My mind is completely warped, help!

Asked By: jkokorian

||

Answers:

The for loops are evaluated from left to right. Any list comprehension can be re-written as a for loop, as follows:

l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)

The above is the correct code for flattening a list, whether you choose to write it concisely as a list comprehension, or in this extended version.

The second list comprehension you wrote will raise a NameError, as ‘sublist’ has not yet been defined. You can see this by writing the list comprehension as a for loop:

l = [[1,2,3],[4,5,6]]
flattened_l = []
for item in sublist:
    for sublist in l:
        flattened_l.append(item)

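The only reason you didn’t see the error when you ran your code is that sublist was already defined: running your first list comprehension left it bound to the last sublist. (This applies to Python 2, where a list comprehension’s loop variables leak into the enclosing scope. In Python 3 they do not, so the second comprehension raises the NameError unless sublist happens to be defined some other way.)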
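To see the error without any leftover variable in the way, the reversed comprehension can be run inside a fresh function. This is only a minimal sketch, written for Python 3; the function name `reversed_clauses_raise` is made up for illustration.

```python
def reversed_clauses_raise():
    """Return True if the reversed comprehension fails with NameError."""
    l = [[1, 2, 3], [4, 5, 6]]
    try:
        # The leftmost iterable, `sublist`, is looked up before any loop runs,
        # and nothing in this scope has defined it yet.
        [item for item in sublist for sublist in l]
    except NameError:
        return True
    return False


# Assumes no global variable named `sublist` exists when this runs.
print(reversed_clauses_raise())  # True
```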

For more information, you may want to check out Guido’s tutorial on list comprehensions.

Answered By: Eliezer

Let’s take a look at your list comprehension then, but first let’s start with list comprehensions at their simplest.

l = [1,2,3,4,5]
print([x for x in l])  # prints [1, 2, 3, 4, 5]

You can look at this the same as a for loop structured like so:

for x in l:
    print(x)

Now let’s look at another one:

l = [1,2,3,4,5]
a = [x for x in l if x % 2 == 0]
print(a)  # prints [2, 4]

That is the exact same as this:

a = []
l = [1,2,3,4,5]
for x in l:
    if x % 2 == 0:
        a.append(x)
print(a)  # prints [2, 4]

Now let’s take a look at the examples you provided.

l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
print(flattened_l)  # prints [1, 2, 3, 4, 5, 6]

For a list comprehension, start at the leftmost for clause and work your way inward: each later for is nested inside the one before it. The expression at the front, item in this case, is what gets added to the result. It produces this equivalent:

l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
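The same left-to-right reading also covers if clauses mixed in between the for clauses. As a rough sketch (the sample data and the names `evens` and `evens_loop` are invented for this example), each clause simply becomes one more level of nesting:

```python
l = [[1, 2, 3], [4, 5, 6], [7]]

# Clauses are read left to right; each one nests inside the previous one.
evens = [item
         for sublist in l        # outermost loop
         if len(sublist) > 1     # filter applied once per sublist
         for item in sublist     # inner loop
         if item % 2 == 0]       # filter applied once per item

# The equivalent explicit loops, in the same order.
evens_loop = []
for sublist in l:
    if len(sublist) > 1:
        for item in sublist:
            if item % 2 == 0:
                evens_loop.append(item)

print(evens)       # [2, 4, 6]
print(evens_loop)  # [2, 4, 6]
```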

Now for the last one:

exactly_the_same_as_l = [item for item in sublist for sublist in l]

Using the same knowledge we can create a for loop and see how it would behave:

exactly_the_same_as_l = []
for item in sublist:
    for sublist in l:
        exactly_the_same_as_l.append(item)

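Now the only reason the above one works is that building flattened_l first also left sublist defined. It is a scoping matter: in Python 2, a list comprehension’s loop variables leak into the enclosing scope, so sublist still holds the last sublist and no error is thrown. (In Python 3 they no longer leak, so this comprehension raises a NameError unless sublist is defined some other way.) If you ran it without building flattened_l first, you would get a NameError.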
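To make the "runs, but gives a different result" behaviour concrete, here is a small sketch that recreates the leftover variable by hand. It assumes sublist was left bound to the last sublist, as Python 2 would leave it; the function name is made up for illustration.

```python
def reversed_with_stale_sublist():
    l = [[1, 2, 3], [4, 5, 6]]
    # Assumption: mimic the binding that the first comprehension
    # would have left behind in Python 2.
    sublist = l[-1]
    # Outer loop walks the stale sublist [4, 5, 6]; the inner loop runs
    # once per element of l, so every item is emitted len(l) times.
    return [item for item in sublist for sublist in l]


print(reversed_with_stale_sublist())  # [4, 4, 5, 5, 6, 6]
```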

Answered By: Chrispresso

Note, of course, that this sort of comprehension will only “flatten” one level of a list of lists (or list of other iterables). Also, if you pass it a list of strings, you’ll “flatten” it into a list of characters.
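Both caveats are easy to check. A quick sketch, where `flatten_once` is just a hypothetical wrapper around the comprehension from the question:

```python
def flatten_once(l):
    """Flatten exactly one level of nesting."""
    return [item for sublist in l for item in sublist]


# Deeper nesting is left untouched.
print(flatten_once([[1, [2, 3]], [4]]))  # [1, [2, 3], 4]

# Strings are iterable, so they get split into characters.
print(flatten_once(["ab", "cd"]))  # ['a', 'b', 'c', 'd']
```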

To generalize this in a meaningful way you first want to be able to cleanly distinguish between strings (or bytearrays) and other types of sequences (or other Iterables). So let’s start with a simple function:

from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

def non_str_seq(p):
    '''p is putatively a sequence and not a string, bytes, nor bytearray'''
    return isinstance(p, Iterable) and not isinstance(p, (str, bytes, bytearray))

Using that, we can then build a recursive function to flatten any sequence of objects:

def flatten(s):
    '''Recursively flatten any sequence of objects
    '''
    results = list()
    if non_str_seq(s):
        for each in s:
            results.extend(flatten(each))
    else:
        results.append(s)
    return results

There are probably more elegant ways to do this, but this works for all the Python built-in types that I know of. Simple objects (numbers, strings, None, True, False) are all returned wrapped in a list. Dictionaries are returned as lists of keys (in the dictionary’s iteration order).
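One possibly tidier variant, offered only as a sketch for Python 3, is a recursive generator. The name `iter_flatten` is hypothetical, and treating bytes like strings is an assumption on my part.

```python
from collections.abc import Iterable


def iter_flatten(obj):
    """Lazily yield the leaves of an arbitrarily nested iterable."""
    # Strings and byte strings are treated as leaves, not as containers.
    if isinstance(obj, Iterable) and not isinstance(obj, (str, bytes, bytearray)):
        for each in obj:
            yield from iter_flatten(each)
    else:
        yield obj


print(list(iter_flatten([1, [2, [3, "ab"]], (4, 5)])))  # [1, 2, 3, 'ab', 4, 5]
print(list(iter_flatten(5)))                            # [5]
print(list(iter_flatten({"a": 1, "b": 2})))             # ['a', 'b']
```

Wrapping the call in list() gives the same eager result as the function above, while plain iteration avoids building the intermediate lists.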

Answered By: Jim Dennis

For the lazy dev that wants a quick answer:

>>> a = [[1,2], [3,4]]
>>> [i for g in a for i in g]
[1, 2, 3, 4]

Answered By: Tjorriemorrie

While this approach definitely works for flattening lists, I wouldn’t recommend it unless your sublists are known to be very small (1 or 2 elements each).

I’ve done a bit of profiling with timeit and found that this takes roughly 2-3 times longer than using a single loop and calling extend…

def flatten(l):
    flattened = []
    for sublist in l:
        flattened.extend(sublist)
    return flattened

While it’s not as pretty, the speedup is significant. I suppose this works so well because extend can more efficiently copy the whole sublist at once instead of copying each element, one at a time. I would recommend using extend if you know your sublists are medium-to-large in size. The larger the sublist, the bigger the speedup.
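If you want to check this on your own machine, a timing comparison along these lines should do. It is only a rough sketch: the data sizes and repeat count are arbitrary choices, the function names are made up, and the actual ratio will vary with Python version and sublist size.

```python
import timeit


def flatten_comprehension(l):
    return [item for sublist in l for item in sublist]


def flatten_extend(l):
    flattened = []
    for sublist in l:
        flattened.extend(sublist)
    return flattened


# Arbitrary test data: 200 sublists of 500 integers each.
data = [list(range(500)) for _ in range(200)]

# Both approaches must agree before their timings are worth comparing.
assert flatten_comprehension(data) == flatten_extend(data)

t_comp = timeit.timeit(lambda: flatten_comprehension(data), number=10)
t_ext = timeit.timeit(lambda: flatten_extend(data), number=10)
print("comprehension: {:.4f}s  extend: {:.4f}s".format(t_comp, t_ext))
```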

One final caveat: obviously, this only holds true if you need to eagerly form this flattened list. Perhaps you’ll be sorting it later, for example. If you’re ultimately going to just loop through the list as-is, this will not be any better than using the nested loops approach outlined by others. But for that use case, you want to return a generator instead of a list for the added benefit of laziness…

def flatten(l):
    return (item for sublist in l for item in sublist) # note the parens
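A short sketch of how the lazy version behaves when consumed. The name `flatten_lazy` is only for illustration; `itertools.chain.from_iterable` is a standard-library alternative that is likewise lazy.

```python
from itertools import chain


def flatten_lazy(l):
    # Generator expression: nothing is computed until it is iterated.
    return (item for sublist in l for item in sublist)


nested = [[1, 2, 3], [4, 5, 6]]

gen = flatten_lazy(nested)
print(next(gen))  # 1
print(list(gen))  # [2, 3, 4, 5, 6]  (the first item was already consumed)

# Standard-library equivalent, also lazy.
print(list(chain.from_iterable(nested)))  # [1, 2, 3, 4, 5, 6]
```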

Answered By: mklbtz