Why does x,y = zip(*zip(a,b)) work in Python?

Question:

OK I love Python’s zip() function. Use it all the time, it’s brilliant. Every now and again I want to do the opposite of zip(), think “I used to know how to do that”, then google python unzip, then remember that one uses this magical * to unzip a zipped list of tuples. Like this:

x = [1,2,3]
y = [4,5,6]
zipped = zip(x,y)
unzipped_x, unzipped_y = zip(*zipped)
unzipped_x
    Out[30]: (1, 2, 3)
unzipped_y
    Out[31]: (4, 5, 6)

What on earth is going on? What is that magical asterisk doing? Where else can it be applied and what other amazing awesome things in Python are so mysterious and hard to google?

Asked By: Mike Dewar


Answers:

The asterisk performs apply (as it’s known in Lisp and Scheme). Basically, it takes your list, and calls the function with that list’s contents as arguments.
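
To make the "apply" idea concrete, here is a small sketch (the variable names are only illustrative) showing that the starred call gives the same result as writing each pair out as a separate argument:

```python
pairs = [(1, 4), (2, 5), (3, 6)]  # what list(zip([1, 2, 3], [4, 5, 6])) gives

# The star spreads the list so each tuple becomes its own positional argument.
starred = list(zip(*pairs))

# Equivalent call with the arguments written out by hand.
spelled_out = list(zip((1, 4), (2, 5), (3, 6)))

print(starred)                 # [(1, 2, 3), (4, 5, 6)]
print(starred == spelled_out)  # True
```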

Answered By: Chris Jester-Young

The asterisk in Python is documented in the Python tutorial, under Unpacking Argument Lists.
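
As a quick illustration in the spirit of that tutorial section, the same star works with any callable, for example range:

```python
args = [3, 6]

# Same as range(3, 6): the two arguments are taken from the list.
numbers = list(range(*args))
print(numbers)  # [3, 4, 5]
```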

Answered By: Daniel Stutzbach

It’s also useful for multiple args:

def foo(*args):
    print(args)

foo(1, 2, 3) # (1, 2, 3)

# also legal
t = (1, 2, 3)
foo(*t) # (1, 2, 3)

And, you can use double asterisk for keyword arguments and dictionaries:

def foo(**kwargs):
    print(kwargs)

foo(a=1, b=2) # {'a': 1, 'b': 2}

# also legal
d = {"a": 1, "b": 2}
foo(**d) # {'a': 1, 'b': 2}

And of course, you can combine these:

def foo(*args, **kwargs):
    print(args, kwargs)

foo(1, 2, a=3, b=4) # (1, 2) {'a': 3, 'b': 4}

Pretty neat and useful stuff.
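
One place where the combination tends to show up is forwarding arguments to another function without knowing its signature. A minimal sketch, where call_and_report and add are hypothetical names made up for this example:

```python
def call_and_report(func, *args, **kwargs):
    """Hypothetical helper: forward any arguments to func unchanged."""
    print("calling", func.__name__, "with", args, kwargs)
    # Unpack again on the way out so func sees the original call shape.
    return func(*args, **kwargs)

def add(a, b, scale=1):
    return (a + b) * scale

result = call_and_report(add, 1, 2, scale=10)
print(result)  # 30
```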

Answered By: bcherry

It doesn’t always work:

>>> x = []
>>> y = []
>>> zipped = zip(x, y)
>>> unzipped_x, unzipped_y = zip(*zipped)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 0 values to unpack

Oops! I think it needs a skull to scare it into working:

>>> unzipped_x, unzipped_y = zip(*zipped) or ([], [])
>>> unzipped_x
[]
>>> unzipped_y
[]

In Python 3 I think you need

>>> unzipped_x, unzipped_y = tuple(zip(*zipped)) or ([], [])

since zip now returns an iterator, and an iterator object is always truthy, even when it is empty.
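
If the one-liner feels too clever, the same idea can be wrapped in a small function. This is only a sketch: unzip_pairs is a hypothetical helper, not something from the standard library, and it returns empty tuples rather than empty lists so that the result type is the same in both cases.

```python
def unzip_pairs(pairs):
    """Hypothetical helper: split an iterable of 2-tuples into two tuples.

    Returns two empty tuples when there are no pairs at all.
    """
    columns = tuple(zip(*pairs))
    if not columns:
        # zip(*[]) yields nothing, so there is nothing to unpack.
        return (), ()
    first, second = columns
    return first, second

print(unzip_pairs(zip([1, 2, 3], [4, 5, 6])))  # ((1, 2, 3), (4, 5, 6))
print(unzip_pairs(zip([], [])))                # ((), ())
```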

Answered By: BenAnhalt

Addendum to @bcherry’s answer:

>>> def f(a2, a1):
...     print(a2, a1)
... 
>>> d = {'a1': 111, 'a2': 222}
>>> f(**d)
222 111

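So ** unpacking works not just with functions that declare **kwargs, but also with ordinary named parameters (the ones that can be passed either positionally or by keyword). The dictionary keys are matched to parameters by name, not by order.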
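
One caveat worth sketching: every key in the dictionary has to match a parameter name, otherwise the call fails with a TypeError. The extra key 'a3' below is made up for the demonstration.

```python
def f(a2, a1):
    return a2, a1

print(f(**{'a1': 111, 'a2': 222}))  # (222, 111)

error = None
try:
    # 'a3' matches no parameter of f, so the call is rejected.
    f(**{'a1': 111, 'a2': 222, 'a3': 333})
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```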

Answered By: Evgeni Sergeev

I’m extremely new to Python so this just recently tripped me up, but it had to do more with how the example was presented and what was emphasized.

What gave me problems with understanding the zip example was the asymmetry in how the return values of the two zip calls are handled. In the first call, the return value is assigned to a single variable, which then refers to the whole list of tuples. In the second call, the example relies on Python’s ability to unpack a sequence (or any other iterable) into multiple variables, each one receiving an individual tuple. If someone isn’t familiar with how that works in Python, it is easy to get lost as to what’s actually happening.

>>> x = [1, 2, 3]
>>> y = "abc"
>>> zipped = list(zip(x, y))
>>> zipped
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> z1, z2, z3 = zip(x, y)
>>> z1
(1, 'a')
>>> z2
(2, 'b')
>>> z3
(3, 'c')
>>> rezipped = list(zip(*zipped))
>>> rezipped
[(1, 2, 3), ('a', 'b', 'c')]
>>> rezipped2 = list(zip(z1, z2, z3))
>>> rezipped == rezipped2
True

Answered By: user3447701

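(x, y) == tuple(zip(*zip(x,y))) is true if and only if all of the following statements are true:

  • x and y have the same length
  • x and y are tuples
  • x and y are not empty (with no pairs, the second zip has nothing to yield and the result is an empty tuple)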

One good way to understand what’s going on is to print at each step:

x = [1, 2, 3]
y = ["a", "b", "c", "d"]

print("1) x, y = ", x, y)
print("2) zip(x, y) = ", list(zip(x, y)))
print("3) *zip(x, y) = ", *zip(x, y))
print("4) zip(*zip(x,y)) = ", list(zip(*zip(x,y))))

Which outputs:

1) x, y =            [1, 2, 3] ['a', 'b', 'c', 'd']
2) zip(x, y) =       [(1, 'a'), (2, 'b'), (3, 'c')]
3) *zip(x, y) =       (1, 'a')  (2, 'b')  (3, 'c')
4) zip(*zip(x,y)) =  [(1, 2, 3), ('a', 'b', 'c')]

Basically this is what happens:

  1. Items from x and y are paired according to their respective indexes.
  2. Pairs are unpacked to 3 different objects (tuples)
  3. Those tuples are passed to zip, which again pairs items by index:
    • first items from all inputs are paired: (1, 2, 3)
    • second items from all inputs are paired: ('a', 'b', 'c')
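
The last step can also be written out by hand, which may make the "pairing by index" more visible. This is just an illustrative sketch for the three pairs above, not how zip is implemented:

```python
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]

# Build each output tuple by taking the i-th item of every pair.
by_hand = [tuple(pair[i] for pair in pairs) for i in range(2)]

print(by_hand)                        # [(1, 2, 3), ('a', 'b', 'c')]
print(by_hand == list(zip(*pairs)))   # True
```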

Now you can understand why (x, y) == tuple(zip(*zip(x,y))) is false in this case:

  • since y is longer than x, the first zip operation dropped the extra item from y (as it couldn’t be paired), and that loss carries over to the second zip operation
  • the types differ: we started with two lists and now have two tuples, because zip groups items into tuples, not lists
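
A quick way to check these conditions is to run the comparison on a few inputs. The function name round_trips is made up for this sketch:

```python
def round_trips(x, y):
    """Hypothetical helper: does zipping and unzipping give back (x, y)?"""
    return (x, y) == tuple(zip(*zip(x, y)))

print(round_trips((1, 2, 3), ('a', 'b', 'c')))       # True
print(round_trips([1, 2, 3], ['a', 'b', 'c']))       # False: lists come back as tuples
print(round_trips((1, 2, 3), ('a', 'b', 'c', 'd')))  # False: 'd' was dropped
print(round_trips((), ()))                           # False: nothing left to unpack
```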

If you’re not 100% sure you understand how zip works, I wrote an answer to this question here: Unzipping and the * operator

Answered By: cglacet