Join list of lists in Python

Question:

Is there a short syntax for joining a list of lists into a single list (or iterator) in Python?

For example, I have a list as follows and I want to iterate over a, b and c.

``````x = [["a","b"], ["c"]]
``````

The best I can come up with is as follows.

``````result = []
[ result.extend(el) for el in x]

for el in result:
    print el
``````

This is known as flattening, and there are a LOT of implementations out there.

``````>>> x = [["a","b"], ["c"]]
>>> for el in sum(x, []):
...     print el
...
a
b
c
``````

From those links, apparently the most complete-fast-elegant-etc implementation is the following:

``````def flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)
``````

What you’re describing is known as flattening a list, and with this new knowledge you’ll be able to find many solutions to this on Google (there is no built-in flatten method). Here is one of them, from http://www.daniel-lemire.com/blog/archives/2006/05/10/flattening-lists-in-python/:

``````def flatten(x):
    ans = []
    for i in x:
        if ( i.__class__ is list):
            # extend rather than assign, so items gathered so far are kept
            ans.extend(flatten(i))
        else:
            ans.append(i)
    return ans
``````
``````import itertools
a = [['a','b'], ['c']]
print(list(itertools.chain.from_iterable(a)))
``````
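
Since the question also allows for an iterator, here is a minimal sketch of the lazy variant: `chain.from_iterable` already returns an iterator, so the `list()` call can be dropped when the items are only looped over once. The names `flat_iter` and `seen` are just for illustration.

```python
import itertools

a = [['a', 'b'], ['c']]

# chain.from_iterable returns a lazy iterator; no intermediate list is built.
flat_iter = itertools.chain.from_iterable(a)

seen = []  # illustrative: collect what the loop visits
for el in flat_iter:
    seen.append(el)

print(seen)
```

Keep in mind that the iterator is exhausted after one pass, so wrap it in `list()` if the result is needed more than once.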

Sadly, Python doesn’t have a simple way to flatten lists. Try this:

``````def flatten(some_list):
    for element in some_list:
        if type(element) in (tuple, list):
            for item in flatten(element):
                yield item
        else:
            yield element
``````

Which will recursively flatten a list; you can then do

``````result = []
[ result.extend(el) for el in x]

for el in flatten(result):
    print el
``````
``````x = [["a","b"], ["c"]]

result = sum(x, [])
``````

There’s always reduce (being moved to functools in Python 3):

``````>>> x = [ [ 'a', 'b'], ['c'] ]
>>> for el in reduce(lambda a,b: a+b, x, []):
...  print el
...
__main__:1: DeprecationWarning: reduce() not supported in 3.x; use functools.reduce()
a
b
c
>>> import functools
>>> for el in functools.reduce(lambda a,b: a+b, x, []):
...   print el
...
a
b
c
>>>
``````

Unfortunately, the plus operator for list concatenation can’t be used as a function directly. Or fortunately, if you prefer lambdas to be ugly for improved visibility.
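
For what it's worth, the standard library's `operator` module does expose concatenation as a plain function, so the lambda can be avoided. A small sketch, assuming a Python version where `reduce` is available from `functools`:

```python
import functools
import operator

x = [['a', 'b'], ['c']]

# operator.concat(a, b) is the function form of a + b for sequences.
flat = functools.reduce(operator.concat, x, [])
print(flat)
```

Like the lambda version, this copies the accumulated list at every step, so it is probably best kept for short inputs.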

If you’re only going one level deep, a nested comprehension will also work:

``````>>> x = [["a","b"], ["c"]]
>>> [inner
...     for outer in x
...         for inner in outer]
['a', 'b', 'c']
``````

On one line, that becomes:

``````>>> [j for i in x for j in i]
['a', 'b', 'c']
``````

This works recursively for arbitrarily nested elements (up to Python’s recursion limit):

``````def iterFlatten(root):
    if isinstance(root, (list, tuple)):
        for element in root:
            for e in iterFlatten(element):
                yield e
    else:
        yield root
``````

Result:

```>>> b = [["a", ("b", "c")], "d"]
>>> list(iterFlatten(b))
['a', 'b', 'c', 'd']
```
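
On Python 3.3 and later, the inner `for ... yield` loop can be written with `yield from`. The following is a sketch of the same idea under that assumption; the name `iter_flatten` is mine, not from the answer above.

```python
def iter_flatten(root):
    """Yield the leaves of arbitrarily nested lists/tuples, depth first."""
    if isinstance(root, (list, tuple)):
        for element in root:
            # Delegate to the recursive generator instead of looping over it.
            yield from iter_flatten(element)
    else:
        # Anything that is not a list or tuple (including strings) is a leaf.
        yield root


b = [["a", ("b", "c")], "d"]
print(list(iter_flatten(b)))
```

Because only lists and tuples are descended into, strings are yielded whole rather than split into characters.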

Or a recursive operation:

``````def flatten(input):
    ret = []
    if not isinstance(input, (list, tuple)):
        return [input]
    for i in input:
        if isinstance(i, (list, tuple)):
            ret.extend(flatten(i))
        else:
            ret.append(i)
    return ret
``````
``````flat_list = []
map(flat_list.extend, list_of_lists)
``````

shortest!
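
One caveat worth noting: on Python 3, `map` returns a lazy iterator, so the `map` line above does nothing until its result is consumed. A plain loop is a sketch of the same idea that behaves identically on both versions (`list_of_lists` here is sample data for illustration):

```python
list_of_lists = [["a", "b"], ["c"]]  # sample data for illustration

flat_list = []
for sublist in list_of_lists:
    # extend() appends every item of sublist to flat_list in place
    flat_list.extend(sublist)

print(flat_list)
```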

Late to the party but …

I’m new to Python and come from a Lisp background. This is what I came up with (check out the var names for lulz):

``````def flatten(lst):
    if lst:
        car,*cdr=lst
        if isinstance(car,(list,tuple)):
            if cdr: return flatten(car) + flatten(cdr)
            return flatten(car)
        if cdr: return [car] + flatten(cdr)
        return [car]
    return []  # empty input: return a list rather than None
``````

Seems to work. Test:

``````flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))
``````

returns:

``````[1, 2, 3, 4, 5, 6, 7, 8, 1, 2]
``````

For one-level flatten, if you care about speed, this is faster than any of the previous answers under all conditions I tried. (That is, if you need the result as a list. If you only need to iterate through it on the fly then the chain example is probably better.) It works by pre-allocating a list of the final size and copying the parts in by slice (which is a lower-level block copy than any of the iterator methods):

``````def join(a):
    """Joins a sequence of sequences into a single sequence.  (One-level flattening.)
    E.g., join([(1,2,3), [4, 5], [6, (7, 8, 9), 10]]) = [1,2,3,4,5,6,(7,8,9),10]
    This is very efficient, especially when the subsequences are long.
    """
    n = sum([len(b) for b in a])
    l = [None]*n
    i = 0
    for b in a:
        j = i+len(b)
        l[i:j] = b
        i = j
    return l
``````

``````[(0.5391559600830078, 'flatten4b'), # join() above.
(0.5400412082672119, 'flatten4c'), # Same, with sum(len(b) for b in a)
(0.5419249534606934, 'flatten4a'), # Similar, using zip()
(0.7351131439208984, 'flatten1b'), # list(itertools.chain.from_iterable(a))
(0.7472689151763916, 'flatten1'), # list(itertools.chain(*a))
(1.5468521118164062, 'flatten3'), # [i for j in a for i in j]
(26.696547985076904, 'flatten2')] # sum(a, [])
``````

If you need a list, not a generator, use `list()`:

``````from itertools import chain
x = [["a","b"], ["c"]]
y = list(chain(*x))
``````

I had a similar problem when I had to create a dictionary that contained the elements of an array and their counts. The answer is relevant because I flatten a list of lists, get the elements I need, and then do a group and count. I used Python’s map function to produce a tuple of each element and its count, and groupby over the array. Note that the groupby takes the array element itself as the keyfunc. As a relatively new Python coder, I find it easier to comprehend, while being Pythonic as well.

Before I discuss the code, here is a sample of data I had to flatten first:

``````{ "_id" : ObjectId("4fe3a90783157d765d000011"), "status" : [ "opencalais" ],
"content_length" : 688, "open_calais_extract" : { "entities" : [
{"type" :"Person","name" : "Iman Samdura","rel_score" : 0.223 },
{"type" : "Company",  "name" : "Associated Press",    "rel_score" : 0.321 },
{"type" : "Country",  "name" : "Indonesia",   "rel_score" : 0.321 }, ... ]},
"title" : "Indonesia Police Arrest Bali Bomb Planner", "time" : "06:42  ET",
"filename" : "021121bn.01", "month" : "November", "utctime" : 1037836800,
"date" : "November 21, 2002", "news_type" : "bn", "day" : "21" }
``````

It is a query result from Mongo. The code below flattens a collection of such lists.

``````def flatten_list(items):
    return sorted([entity['name'] for entity in [entities for sublist in
        [item['open_calais_extract']['entities'] for item in items]
        for entities in sublist]])
``````

First, I extract every “entities” collection, and then, for each one, iterate over its dictionaries and extract the name attribute.
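
The grouping and counting step described earlier is not shown, so here is a rough sketch of how it might look. The sample `items` are made up to mirror the document shape above, and `count_names` is a hypothetical helper name. `itertools.groupby` only groups adjacent equal items, which is why the names are sorted first.

```python
from itertools import groupby

# Made-up records shaped like the Mongo result above (illustration only).
items = [
    {"open_calais_extract": {"entities": [
        {"type": "Company", "name": "Associated Press"},
        {"type": "Country", "name": "Indonesia"},
    ]}},
    {"open_calais_extract": {"entities": [
        {"type": "Country", "name": "Indonesia"},
    ]}},
]


def count_names(items):
    """Flatten the entity lists, then count how often each name occurs."""
    names = sorted(
        entity["name"]
        for item in items
        for entity in item["open_calais_extract"]["entities"]
    )
    # groupby needs sorted input; the element itself serves as the key.
    return {name: len(list(group)) for name, group in groupby(names)}


print(count_names(items))
```

If the grouping itself is not needed, `collections.Counter` applied to the flattened names would likely give the same counts with less code.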

A performance comparison:

``````import itertools
import timeit
big_list = [[0]*1000 for i in range(1000)]
timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
[100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]
``````

Producing:

``````>>> import itertools
>>> import timeit
>>> big_list = [[0]*1000 for i in range(1000)]
>>> timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
[3.016212113769325, 3.0148865239060227, 3.0126415732791028]
>>> timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
[3.019953987082083, 3.528754223385439, 3.02181439266457]
>>> timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
[1.812084445152557, 1.7702404451095965, 1.7722977998725362]
>>> timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
[5.409658160700605, 5.477502077679354, 5.444318360412744]
>>> [100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]
[399.27587954973444, 400.9240571138051, 403.7521153804846]
``````

This is with Python 2.7.1 on Windows XP 32-bit, but @temoto in the comments above got `from_iterable` to be faster than `map+extend`, so it’s quite platform and input dependent.

Stay away from `sum(big_list, [])`
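
The reason, as far as I understand it, is that `sum` adds the lists one at a time with `+`, and each `+` builds a brand-new list containing everything accumulated so far. The sketch below spells out that behaviour next to an in-place alternative; `flatten_by_sum` and `flatten_in_place` are illustrative names, not library functions.

```python
def flatten_by_sum(lists):
    """Roughly what sum(lists, []) does: copy the accumulator on every step."""
    acc = []
    for sub in lists:
        acc = acc + sub  # allocates a new list of len(acc) + len(sub)
    return acc


def flatten_in_place(lists):
    """Same result, but each item is copied only once."""
    acc = []
    for sub in lists:
        acc.extend(sub)  # grows the existing list in place
    return acc


data = [[0] * 3 for _ in range(4)]
print(flatten_by_sum(data) == flatten_in_place(data) == sum(data, []))
```

Both functions return the same list; only the amount of copying differs, which is what shows up in the timings above.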
