Understanding list comprehension for flattening list of lists in Python

Question:

I found this comprehension that works perfectly for flattening a list of lists:

>>> list_of_lists = [(1,2,3),(2,3,4),(3,4,5)]
>>> [item for sublist in list_of_lists for item in sublist]
[1, 2, 3, 2, 3, 4, 3, 4, 5]

I like this better than using itertools.chain(), but I just can’t understand it. I’ve tried surrounding parts with parentheses, to see if I could reduce the complexity, but now I’m just more confused:

>>> [(item for sublist in list_of_lists) for item in sublist]
[<generator object <genexpr> at 0x7ff919fdfd20>, <generator object <genexpr> at 0x7ff919fdfd70>, <generator object <genexpr> at 0x7ff919fdfdc0>]

>>> [item for sublist in (list_of_lists for item in sublist)]
[5, 5, 5]

I get the feeling that I’m having a hard time because I don’t quite understand how generators work… I mean, I thought I did, but now I’m seriously in doubt. Like I said, I love how compact this idiom is, and it’s exactly what I need, but I’m loath to use code that I don’t understand.

What exactly is happening here?

Asked By: gbromios


Answers:

Read the for loops as if they were nested, from left to right. The expression on the left is the one that produces each value in the final list:

for sublist in list_of_lists:
    for item in sublist:
        item  # added to the list

List comprehensions also support if tests to filter what elements are used; these can also be seen as nested statements, in the same way as the for loops.
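
As a rough, runnable sketch of that reading, the comprehension and its nested-loop spelling build the same list. The names flat, flat_loop, evens and evens_loop, and the evenness test, are only illustrative assumptions:

```python
list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]

# The comprehension from the question.
flat = [item for sublist in list_of_lists for item in sublist]

# The same thing spelled out as nested statements, left to right.
flat_loop = []
for sublist in list_of_lists:
    for item in sublist:
        flat_loop.append(item)

# An if test nests in the same position as it appears in the comprehension.
evens = [item for sublist in list_of_lists for item in sublist if item % 2 == 0]

evens_loop = []
for sublist in list_of_lists:
    for item in sublist:
        if item % 2 == 0:
            evens_loop.append(item)

print(flat == flat_loop)  # True
print(evens)              # [2, 2, 4, 4]
```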

By adding parentheses, you changed the expression; everything in parentheses is now the left-hand expression to add:

for item in sublist:
    (item for sublist in list_of_lists)  # added to the list

A for loop inside parentheses like that is a generator expression. It works like a list comprehension except that it doesn’t build a list; the elements are instead produced on demand. You can ask a generator expression for the next value, then the next value, and so on.
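
A small sketch of asking for values on demand, using the built-in next(). The names gen, first, second and rest are illustrative only:

```python
list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]

# Parentheses instead of square brackets: nothing is computed yet.
gen = (item for sublist in list_of_lists for item in sublist)

first = next(gen)    # 1
second = next(gen)   # 2

# Whatever has not been consumed yet can still be collected.
rest = list(gen)     # [3, 2, 3, 4, 3, 4, 5]

print(first, second, rest)
```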

In this case, there must be a preexisting sublist object for this to work at all; the outer loop is not over list_of_lists anymore, after all.

Your last attempt translates to:

for sublist in (list_of_lists for item in sublist):
    item  # added to the list

Here list_of_lists is the value produced by a generator expression that loops over sublist (for item in sublist), so it is produced once per element of sublist. Again, sublist must exist already for this to work. The outer loop then adds a preexisting item to the final list output, once per iteration.

In your case, sublist was apparently still bound to a sequence with 3 items in it, so your final list has 3 elements. item was bound to 5, so you got 5 three times in your output.
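
The output in the question suggests Python 2, where list comprehension loop variables leak into the enclosing scope, so sublist and item were still bound after the first, working comprehension. In Python 3 they do not leak. A sketch that recreates the leftover names by hand gives the same result (the function name last_attempt and the assumed leftover values are just for illustration):

```python
def last_attempt():
    list_of_lists = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]

    # Recreate, by hand, the names assumed to be left over from an earlier loop.
    sublist = (3, 4, 5)
    item = 5

    # The generator yields list_of_lists once per element of the
    # preexisting sublist; each time, the preexisting item is added.
    return [item for sublist in (list_of_lists for item in sublist)]


result = last_attempt()
print(result)  # [5, 5, 5]
```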

Answered By: Martijn Pieters

The list comprehension works like this:

[<what I want> <for loops in the order you'd write them naturally>]

In this case, <what I want> is every item in every sublist. To get those items, you just loop over the sublists in the original list, and save/yield each item in the sublist. Thus, the order of the for loops in the list comprehension is the same order you would have used if you did not use a list comprehension. The only confusing part is that the <what I want> comes first, and not inside the body of the last loop.
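
One way to check the ordering rule is to swap the two for clauses. In this hedged sketch (the helper names natural_order and swapped_order are made up for illustration), the swapped version fails in a fresh scope because sublist is read before any loop has defined it:

```python
def natural_order(list_of_lists):
    # Outer loop first, inner loop second, as in nested for statements.
    return [item for sublist in list_of_lists for item in sublist]


def swapped_order(list_of_lists):
    try:
        # sublist is read before the loop that would define it has run.
        return [item for item in sublist for sublist in list_of_lists]
    except NameError:
        return None


data = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
print(natural_order(data))  # [1, 2, 3, 2, 3, 4, 3, 4, 5]
print(swapped_order(data))  # None
```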

Answered By: timgeb

List Comprehension

When I first started with list comprehensions, I read them like English sentences and was able to understand them easily. For example,

[item for sublist in list_of_lists for item in sublist]

can be read like

for each sublist in list_of_lists and for each item in sublist add item

Also, the filtering part can be read as

for each sublist in list_of_lists and for each item in sublist add item only if it is valid

And the corresponding comprehension would be

[item for sublist in list_of_lists for item in sublist if valid(item)]
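
A small sketch of the filtered version. The answer leaves valid undefined, so the predicate used here (keep items greater than 2) and the wrapper flatten_valid are purely hypothetical stand-ins:

```python
def flatten_valid(list_of_lists, valid):
    # for each sublist, for each item, add item only if it is valid
    return [item for sublist in list_of_lists for item in sublist if valid(item)]


def greater_than_two(item):
    # Hypothetical predicate standing in for valid.
    return item > 2


filtered = flatten_valid([(1, 2, 3), (2, 3, 4), (3, 4, 5)], greater_than_two)
print(filtered)  # [3, 3, 4, 3, 4, 5]
```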

Generators

They are like land mines: they do nothing until triggered through the next protocol. They are similar to functions, but they are not exhausted until an exception is raised or the end of the function is reached, so they can be invoked again and again. The important thing is that they retain their state between the previous invocation and the current one.

The difference between a generator and a function is that generators use the yield keyword to give a value back to the invoker. Generator expressions are similar to list comprehensions: the first expression is the actual value being “yielded”.
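
To make the comparison concrete, here is a minimal sketch of a generator function next to the equivalent generator expression. The function name flatten and the sample data are illustrative choices, not something from the question:

```python
def flatten(list_of_lists):
    # Execution pauses at each yield and resumes there on the next call
    # to next(), with sublist and item still holding their values.
    for sublist in list_of_lists:
        for item in sublist:
            yield item


data = [(1, 2), (3,)]

gen = flatten(data)   # nothing runs yet
a = next(gen)         # runs up to the first yield
b = next(gen)         # resumes where it left off
c = next(gen)

try:
    next(gen)         # the function body has ended
    exhausted = False
except StopIteration:
    exhausted = True

# The generator expression form: the first expression is what gets yielded.
same = list(item for sublist in data for item in sublist)

print(a, b, c, exhausted, same)  # 1 2 3 True [1, 2, 3]
```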

With this basic understanding, if we look at your expressions in the question,

[(item for sublist in list_of_lists) for item in sublist]

You are mixing list comprehension with the generator expressions. This will be read like this

for each item in sublist add a generator expression which is defined as, for every sublist in list_of_lists yield item

which is not what you had in mind. And since the generator expression is not iterated, the generator expression object is added to the list as it is. Since the body of a generator expression is not evaluated until it is invoked with the next protocol, any errors in it (other than syntax errors) stay hidden until then. In this case, though, the list comprehension itself will produce a runtime error (a NameError), as sublist is not defined yet.
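
The deferred evaluation can be seen in a short sketch. The division by zero is a deliberately broken, made-up body used only to show when the error surfaces, and undefined_sublist is a stand-in for a name that has not been defined:

```python
import types

# Creating the generator expression does not run its body, so the
# division by zero goes unnoticed for now.
lazy = (1 / 0 for _ in range(3))
is_generator = isinstance(lazy, types.GeneratorType)

try:
    next(lazy)  # the body runs only now
    failed_on_next = False
except ZeroDivisionError:
    failed_on_next = True

# The list comprehension, by contrast, needs its own iterable right away,
# so an undefined name there fails immediately.
try:
    [(item for sublist in [(1, 2)]) for item in undefined_sublist]
    failed_immediately = False
except NameError:
    failed_immediately = True

print(is_generator, failed_on_next, failed_immediately)  # True True True
```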

Also, in the last case,

[item for sublist in (list_of_lists for item in sublist)]

This will be read like this

for each sublist in the generator expression add item, where the generator expression is defined as, for each item in sublist yield list_of_lists

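The for loop will iterate over any iterable using the next protocol. So the generator expression will be evaluated, yielding list_of_lists once per item in sublist. The item you add to the list is not the generator expression's loop variable, which stays local to it; it is whatever item was already bound to outside, here the last element left over from an earlier iteration. In a fresh session this will also produce a runtime error, since sublist is not defined yet.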
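
As a rough illustration of the next protocol that a for loop relies on, the loop below is unrolled by hand with the built-in iter() and next(). The function name unrolled_for and the value (3, 4, 5) passed as sublist are assumptions that mimic the leftover name in the question's session:

```python
def unrolled_for(list_of_lists, sublist):
    gen = (list_of_lists for item in sublist)

    # Roughly what "for value in gen:" does behind the scenes.
    iterator = iter(gen)
    yielded = []
    while True:
        try:
            value = next(iterator)
        except StopIteration:
            break  # the generator is exhausted
        yielded.append(value)
    return yielded


data = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
yielded = unrolled_for(data, (3, 4, 5))  # (3, 4, 5) mimics the leftover sublist

# The generator yields the whole list_of_lists once per item in sublist.
print(len(yielded))        # 3
print(yielded[0] is data)  # True
```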

Answered By: thefourtheye