Comprehension for flattening a sequence of sequences?

Question:

If I have a sequence of sequences (maybe a list of tuples), I can use itertools.chain() to flatten it. But sometimes I feel I would rather write it as a comprehension. I just can’t figure out how to do it. Here’s a very contrived case:

Let’s say I want to swap the elements of every pair in a sequence. I use a string as a sequence here:

>>> from itertools import chain
>>> seq = '012345'
>>> swapped_pairs = zip(seq[1::2], seq[::2])
>>> swapped_pairs
[('1', '0'), ('3', '2'), ('5', '4')]
>>> "".join(chain(*swapped_pairs))
'103254'

I use zip on the even and odd slices of the sequence to swap the pairs. But I end up with a list of tuples that now need to be flattened. So I use chain(). Is there a way I could express it with a comprehension instead?

If you want to post your own solution to the basic problem of swapping the elements of the pairs, go ahead; I’ll up-vote anything that teaches me something new. But I will only mark as accepted an answer that is targeted at my question, even if the answer is “No, you can’t.”

Asked By: PEZ

Answers:

You could use reduce to achieve your goal:

In [6]: import operator
In [7]: a = [(1, 2), (2,3), (4,5)]
In [8]: reduce(operator.add, a, ())
Out[8]: (1, 2, 2, 3, 4, 5)

This returns a tuple instead of a list because the elements of your original list are tuples, and those get concatenated. But you can easily build a list from the result, and the join method accepts tuples, too.

A list comprehension is, by the way, not the right tool for that. Basically, a list comprehension builds a new list by describing what each element of that list should look like. You want to reduce a list of elements to a single value.
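
For readers on Python 3, where reduce is no longer a builtin, here is a minimal sketch of the same idea. It assumes only the standard library; the variable names (pairs, flat_tuple, flat_list, joined) are mine, not from the session above.

```python
from functools import reduce  # builtin on Python 2, lives in functools on Python 3
import operator

pairs = [(1, 2), (2, 3), (4, 5)]

# Concatenate the tuples one after another, starting from an empty tuple.
flat_tuple = reduce(operator.add, pairs, ())

# Build a list from the result if a list is what you need.
flat_list = list(flat_tuple)

# str.join accepts any iterable of strings, so a tuple works directly.
swapped_pairs = [('1', '0'), ('3', '2'), ('5', '4')]
joined = "".join(reduce(operator.add, swapped_pairs, ()))

print(flat_tuple)
print(joined)
```

Note that every operator.add call here creates a new tuple, so this approach copies the accumulated items repeatedly and is probably best kept for short inputs.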

Answered By: unbeknown

With a comprehension? Well…

>>> seq = '012345'
>>> swapped_pairs = zip(seq[1::2], seq[::2])
>>> ''.join(item for pair in swapped_pairs for item in pair)
'103254'
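
To spell out how the clauses in that generator expression are ordered, here is a small sketch of the same flattening written as a list comprehension, next to itertools.chain.from_iterable for comparison. The list() around zip is only there so the pairs can be iterated more than once on Python 3; the names flat and flat_chain are just for illustration.

```python
from itertools import chain

seq = '012345'
# list() so the pairs can be reused below (zip is a one-shot iterator on Python 3).
swapped_pairs = list(zip(seq[1::2], seq[::2]))

# The for clauses read left to right, outermost loop first:
# "for pair in swapped_pairs" is the outer loop, "for item in pair" the inner one.
flat = [item for pair in swapped_pairs for item in pair]

# The same flattening with chain, without unpacking the argument list.
flat_chain = list(chain.from_iterable(swapped_pairs))

print(''.join(flat))
```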

Answered By: nosklo

>>> a = [(1, 2), (3, 4), (5, 6)]
>>> reduce(tuple.__add__, a)
(1, 2, 3, 4, 5, 6)

Or, to be agnostic about the type of inner sequences (as long as they are all the same):

>>> reduce(a[0].__class__.__add__, a)

Answered By: Arkady

The quickest way I’ve found is to start with an empty list and extend it:

In [1]: a = [['abc', 'def'], ['ghi'],['xzy']]

In [2]: result = []

In [3]: extend = result.extend

In [4]: for l in a:
   ...:     extend(l)
   ...: 

In [5]: result
Out[5]: ['abc', 'def', 'ghi', 'xzy']

This is over twice as fast on the example from Alex Martelli’s answer to “Making a flat list out of list of lists in Python”:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 86.3 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99'  'b = []' 'extend = b.extend' 'for sub in l:' '    extend(sub)'
10000 loops, best of 3: 36.6 usec per loop

I came up with this because I had a hunch that, behind the scenes, extend would allocate the right amount of memory for the list and probably use some low-level code to move the items in. I have no idea if this is true, but who cares, it is faster.
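
If you would rather reproduce the comparison from inside Python than from the shell, something like the following rough sketch should do. The function names flatten_comprehension and flatten_extend and the repeat count of 200 are my own choices, and the absolute numbers will differ by machine and Python version.

```python
import timeit

# Same test data as the command-line runs above.
data = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99


def flatten_comprehension(lists):
    """Flatten with a nested list comprehension."""
    return [item for sublist in lists for item in sublist]


def flatten_extend(lists):
    """Flatten by extending one result list with each sublist."""
    result = []
    extend = result.extend  # look the bound method up once, outside the loop
    for sub in lists:
        extend(sub)
    return result


# Time each function over the same data; 200 calls keeps the run short.
timings = {}
for fn in (flatten_comprehension, flatten_extend):
    timings[fn.__name__] = timeit.timeit(lambda: fn(data), number=200)

for name, seconds in timings.items():
    print("%s: %.1f usec per call" % (name, seconds / 200 * 1e6))
```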

By the way, it’s only a constant-factor speedup; the ratio is similar on a much shorter list:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]'  'b = []' 'extend = b.extend' 'for sub in l:' '    extend(sub)'
1000000 loops, best of 3: 0.844 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]' '[item for sublist in l for item in sublist]'
1000000 loops, best of 3: 1.56 usec per loop

You can also use map(result.extend, a), but this is slower because it builds its own list of Nones.

The explicit loop also gives you some of the benefits of not using functional programming, i.e.:

  • you can extend an existing list instead of creating an empty one,
  • you can still understand the code at a glance, minutes, days or even months later.
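
As an illustration of the first point, here is a small sketch that flattens into a list that already has content. The helper name extend_flat is hypothetical, not something from a library.

```python
def extend_flat(target, nested):
    """Append every item of every sub-sequence in nested to target, in place."""
    extend = target.extend  # bind the method once, as in the loop above
    for sub in nested:
        extend(sub)
    return target  # returned for convenience; target itself is modified


existing = ['start']
extend_flat(existing, [['abc', 'def'], ['ghi'], ['xzy']])
print(existing)
```

A comprehension would always build a fresh list, so you would need an extra existing.extend(...) or += step to get the same effect.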

By the way, it is probably best to avoid list comprehensions. Small ones aren’t too bad, but in general list comprehensions don’t save you much typing, and they are often harder to understand and very hard to change or refactor (ever seen a three-level list comprehension?). Google’s coding guidelines advise against them except in simple cases. My opinion is that they are only useful in ‘throw-away’ code, i.e. code where the author doesn’t care about readability, or code that is known never to require future maintenance.

Compare these two ways of writing the same thing:

result = [item for sublist in l for item in sublist]

with this:

result = []
for sublist in l:
    for item in sublist:
        result.append(item)

YMMV, but the first one stopped me in my tracks and I had to think about it. In the second the nesting is made obvious from the indentation.

Answered By: Mike A