Transpose/Unzip Function (inverse of zip)?

Question:

I have a list of 2-item tuples and I’d like to convert them to 2 lists where the first contains the first item in each tuple and the second list holds the second item.

For example:

original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Is there a builtin function that does that?

Asked By: Cristian

||

Answers:

In 2.x, zip is its own inverse! Provided you use the special * operator.

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

This is equivalent to calling zip with each element of the list as a separate argument:

zip(('a', 1), ('b', 2), ('c', 3), ('d', 4))

except the arguments are passed to zip directly (after being converted to a tuple), so there’s no need to worry about the number of arguments getting too big.

In 3.x, zip returns a lazy iterator, but this is trivially converted:

>>> list(zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)]))
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
Answered By: Patrick

You could also do

result = ([ a for a,b in original ], [ b for a,b in original ])

It should scale better. Especially if Python makes good on not expanding the list comprehensions unless needed.

(Incidentally, it makes a 2-tuple (pair) of lists, rather than a list of tuples, like zip does.)

If generators instead of actual lists are ok, this would do that:

result = (( a for a,b in original ), ( b for a,b in original ))

The generators don’t munch through the list until you ask for each element, but on the other hand, they do keep references to the original list.

Answered By: Anders Eurenius

If you have lists that are not the same length, you may not want to use zip as per Patricks answer. This works:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

But with different length lists, zip truncates each item to the length of the shortest list:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]

You can use map with no function to fill empty results with None:

>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]

zip() is marginally faster though.

Answered By: Chris

I like to use zip(*iterable) (which is the piece of code you’re looking for) in my programs as so:

def unzip(iterable):
    return zip(*iterable)

I find unzip more readable.

Answered By: wassimans

It’s only another way to do it but it helped me a lot so I write it here:

Having this data structure:

X=[1,2,3,4]
Y=['a','b','c','d']
XY=zip(X,Y)

Resulting in:

In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

The more pythonic way to unzip it and go back to the original is this one in my opinion:

x,y=zip(*XY)

But this return a tuple so if you need a list you can use:

x,y=(list(x),list(y))
Answered By: G M

To get a tuple of lists, as in the question:

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

To unpack the two lists into separate variables:

list1, list2 = [list(tup) for tup in zip(*original)]
Answered By: Noyer282

Since it returns tuples (and can use tons of memory), the zip(*zipped) trick seems more clever than useful, to me.

Here’s a function that will actually give you the inverse of zip.

def unzip(zipped):
    """Inverse of built-in zip function.
    Args:
        zipped: a list of tuples

    Returns:
        a tuple of lists

    Example:
        a = [1, 2, 3]
        b = [4, 5, 6]
        zipped = list(zip(a, b))

        assert zipped == [(1, 4), (2, 5), (3, 6)]

        unzipped = unzip(zipped)

        assert unzipped == ([1, 2, 3], [4, 5, 6])

    """

    unzipped = ()
    if len(zipped) == 0:
        return unzipped

    dim = len(zipped[0])

    for i in range(dim):
        unzipped = unzipped + ([tup[i] for tup in zipped], )

    return unzipped
Answered By: Waylon Flinn

None of the previous answers efficiently provide the required output, which is a tuple of lists, rather than a list of tuples. For the former, you can use tuple with map. Here’s the difference:

res1 = list(zip(*original))              # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original)))  # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

In addition, most of the previous solutions assume Python 2.7, where zip returns a list rather than an iterator.

For Python 3.x, you will need to pass the result to a function such as list or tuple to exhaust the iterator. For memory-efficient iterators, you can omit the outer list and tuple calls for the respective solutions.

Answered By: jpp

While zip(*seq) is very useful, it may be unsuitable for very long sequences as it will create a tuple of values to be passed in. For example, I’ve been working with a coordinate system with over a million entries and find it signifcantly faster to create the sequences directly.

A generic approach would be something like this:

from collections import deque
seq = ((a1, b1, …), (a2, b2, …), …)
width = len(seq[0])
output = [deque(len(seq))] * width # preallocate memory
for element in seq:
    for s, item in zip(output, element):
        s.append(item)

But, depending on what you want to do with the result, the choice of collection can make a big difference. In my actual use case, using sets and no internal loop, is noticeably faster than all other approaches.

And, as others have noted, if you are doing this with datasets, it might make sense to use Numpy or Pandas collections instead.

Answered By: Charlie Clark

Naive approach

def transpose_finite_iterable(iterable):
    return zip(*iterable)  # `itertools.izip` for Python 2 users

works fine for finite iterable (e.g. sequences like list/tuple/str) of (potentially infinite) iterables which can be illustrated like

| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |

where

  • n in ℕ,
  • a_ij corresponds to j-th element of i-th iterable,

and after applying transpose_finite_iterable we get

| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |

Python example of such case where a_ij == j, n == 2

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)

But we can’t use transpose_finite_iterable again to return to structure of original iterable because result is an infinite iterable of finite iterables (tuples in our case):

>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
  File "...", line 1, in ...
  File "...", line 2, in transpose_finite_iterable
MemoryError

So how can we deal with this case?

… and here comes the deque

After we take a look at docs of itertools.tee function, there is Python recipe that with some modification can help in our case

def transpose_finite_iterables(iterable):
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))

let’s check

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1

Synthesis

Now we can define general function for working with iterables of iterables ones of which are finite and another ones are potentially infinite using functools.singledispatch decorator like

from collections import (abc,
                         deque)
from functools import singledispatch


@singledispatch
def transpose(object_):
    """
    Transposes given object.
    """
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type))


@transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(object_)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))


def transpose_finite_iterable(object_):
    """
    Transposes given finite iterable of iterables.
    """
    yield from zip(*object_)

try:
    transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
    # Python3.5-
    transpose.register(abc.Mapping, transpose_finite_iterable)
    transpose.register(abc.Sequence, transpose_finite_iterable)
    transpose.register(abc.Set, transpose_finite_iterable)

which can be considered as its own inverse (mathematicians call this kind of functions “involutions”) in class of binary operators over finite non-empty iterables.


As a bonus of singledispatching we can handle numpy arrays like

import numpy as np
...
transpose.register(np.ndarray, np.transpose)

and then use it like

>>> array = np.arange(4).reshape((2,2))
>>> array
array([[0, 1],
       [2, 3]])
>>> transpose(array)
array([[0, 2],
       [1, 3]])

Note

Since transpose returns iterators and if someone wants to have a tuple of lists like in OP — this can be made additionally with map built-in function like

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Advertisement

I’ve added generalized solution to lz package from 0.5.0 version which can be used like

>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]

P.S.

There is no solution (at least obvious) for handling potentially infinite iterable of potentially infinite iterables, but this case is less common though.

Answered By: Azat Ibrakov

Consider using more_itertools.unzip:

>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]     
Answered By: Neil G

While numpy arrays and pandas may be preferrable, this function imitates the behavior of zip(*args) when called as unzip(args).

Allows for generators, like the result from zip in Python 3, to be passed as args as it iterates through values.

def unzip(items, cls=list, ocls=tuple):
    """Zip function in reverse.

    :param items: Zipped-like iterable.
    :type  items: iterable

    :param cls: Container factory. Callable that returns iterable containers,
        with a callable append attribute, to store the unzipped items. Defaults
        to ``list``.
    :type  cls: callable, optional

    :param ocls: Outer container factory. Callable that returns iterable
        containers. with a callable append attribute, to store the inner
        containers (see ``cls``). Defaults to ``tuple``.
    :type  ocls: callable, optional

    :returns: Unzipped items in instances returned from ``cls``, in an instance
        returned from ``ocls``.
    """
    # iter() will return the same iterator passed to it whenever possible.
    items = iter(items)

    try:
        i = next(items)
    except StopIteration:
        return ocls()

    unzipped = ocls(cls([v]) for v in i)

    for i in items:
        for c, v in zip(unzipped, i):
            c.append(v)

    return unzipped

To use list cointainers, simply run unzip(zipped), as

unzip(zip(["a","b","c"],[1,2,3])) == (["a","b","c"],[1,2,3])

To use deques, or other any container sporting append, pass a factory function.

from collections import deque

unzip([("a",1),("b",2)], deque, list) == [deque(["a","b"]),deque([1,2])]

(Decorate cls and/or main_cls to micro manage container initialization, as briefly shown in the final assert statement above.)

Answered By: Trasp

Here’s a simple one-line answer that produces the desired output:

original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
list(zip(*original))
# [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
Answered By: mkearney

Just to summarize:

# data
a = ('a', 'b', 'c', 'd')
b = (1, 2, 3, 4)

# forward
zipped = zip(a, b)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]

# reverse
a_, b_ = zip(*zipped)

# verify
assert a == a_
assert b == b_
Answered By: djvg
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.