Unzipping and the * operator

Question:

The Python docs give this code as the reverse operation of zip:

>>> x2, y2 = zip(*zipped)

In particular

zip() in conjunction with the * operator can be used to unzip a list.

Can someone explain to me how the * operator works in this case? As far as I understand, * is a binary operator and can be used for multiplication or shallow copy…neither of which seems to be the case here.

Asked By: Leah Xue


Answers:

zip(*zipped) means “feed each element of zipped as an argument to zip”. Applying zip this way is like transposing a matrix: doing it twice brings you back to where you started.

>>> a = [(1, 2, 3), (4, 5, 6)]
>>> b = zip(*a)
>>> b
[(1, 4), (2, 5), (3, 6)]
>>> zip(*b)
[(1, 2, 3), (4, 5, 6)]
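
The transcript above is from Python 2, where zip returns a list. In Python 3, zip returns a lazy iterator, so a minimal sketch of the same round trip (assuming Python 3) wraps each call in list() to display the result:

```python
# Python 3: zip() returns an iterator, so materialize it with list() to inspect it.
a = [(1, 2, 3), (4, 5, 6)]

b = list(zip(*a))            # same as zip((1, 2, 3), (4, 5, 6))
print(b)                     # [(1, 4), (2, 5), (3, 6)]

round_trip = list(zip(*b))   # "transpose" again to get back the original rows
print(round_trip)            # [(1, 2, 3), (4, 5, 6)]
```
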
Answered By: hammar

When used like this, the * (asterisk, also known in some circles as the “splat” operator) is a signal to unpack arguments from a list. See http://docs.python.org/tutorial/controlflow.html#unpacking-argument-lists for a more complete definition with examples.

Answered By: Philip Southam

Although hammar’s answer explains how the reversing works in the case of the zip() function, it may be useful to look at argument unpacking in a more general sense. Let’s say we have a simple function which takes some arguments:

>>> def do_something(arg1, arg2, arg3):
...     print 'arg1: %s' % arg1
...     print 'arg2: %s' % arg2
...     print 'arg3: %s' % arg3
... 
>>> do_something(1, 2, 3)
arg1: 1
arg2: 2
arg3: 3

Instead of directly specifying the arguments, we can create a list (or tuple for that matter) to hold them, and then tell Python to unpack that list and use its contents as the arguments to the function:

>>> arguments = [42, 'insert value here', 3.14]
>>> do_something(*arguments)
arg1: 42
arg2: insert value here
arg3: 3.14
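
The same call syntax works with any iterable, not only lists and tuples. A small Python 3 sketch (describe is a hypothetical helper that returns a string instead of printing, used only for illustration):

```python
def describe(arg1, arg2, arg3):
    """Return the three arguments formatted as one string."""
    return 'arg1: %s, arg2: %s, arg3: %s' % (arg1, arg2, arg3)

from_list = describe(*[42, 'insert value here', 3.14])
from_tuple = describe(*(1, 2, 3))
from_range = describe(*range(3))   # range(3) yields 0, 1, 2
from_string = describe(*'abc')     # a string unpacks into its characters

print(from_list)     # arg1: 42, arg2: insert value here, arg3: 3.14
print(from_range)    # arg1: 0, arg2: 1, arg3: 2
print(from_string)   # arg1: a, arg2: b, arg3: c
```
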

The usual argument-count check still applies if the list holds too few (or too many) values:

>>> arguments = [42, 'insert value here']
>>> do_something(*arguments)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/blair/<ipython console> in <module>()

TypeError: do_something() takes exactly 3 arguments (2 given)
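
In Python 3 the same mismatch still raises TypeError, although the wording of the message differs from the Python 2 traceback above. A hedged sketch that catches the error instead of letting it propagate (try_call is a hypothetical helper, and do_something is redefined here to return its arguments):

```python
def do_something(arg1, arg2, arg3):
    return (arg1, arg2, arg3)

def try_call(func, arguments):
    """Call func(*arguments); return its result, or a message if a TypeError is raised."""
    try:
        return func(*arguments)
    except TypeError as error:
        # Note: this also catches a TypeError raised inside func itself.
        return 'TypeError: %s' % error

print(try_call(do_something, [42, 'insert value here']))   # too few arguments
print(try_call(do_something, [1, 2, 3, 4]))                # too many arguments
print(try_call(do_something, [1, 2, 3]))                   # (1, 2, 3)
```
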

You can use the same construct when defining a function to accept any number of positional arguments. They are given to your function as a tuple:

>>> def show_args(*args):
...     for index, value in enumerate(args):
...         print 'Argument %d: %s' % (index, value)
...
>>> show_args(1, 2, 3)
Argument 0: 1
Argument 1: 2
Argument 2: 3

And of course you can combine the two techniques:

>>> show_args(*arguments)
Argument 0: 42
Argument 1: insert value here

You can do a similar thing with keyword arguments, using a double asterisk (**) and a dictionary:

>>> def show_kwargs(**kwargs):
...     for arg, value in kwargs.items():
...         print '%s = %s' % (arg, value)
...
>>> show_kwargs(age=24, name='Blair')
age = 24
name = Blair

And, of course, you can pass keyword arguments through a dictionary:

>>> values = {'name': 'John', 'age': 17}
>>> show_kwargs(**values)
age = 17
name = John
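
The ** form is not limited to functions that declare **kwargs: it can also unpack a dictionary into the named parameters of an ordinary function, as long as the keys are strings that match the parameter names. A Python 3 sketch (format_kwargs and greet are hypothetical names; the keys are sorted so the output order is deterministic):

```python
def format_kwargs(**kwargs):
    """Return 'key = value' strings, sorted by key."""
    return ['%s = %s' % (key, kwargs[key]) for key in sorted(kwargs)]

values = {'name': 'John', 'age': 17}

print(format_kwargs(**values))               # ['age = 17', 'name = John']
print(format_kwargs(age=24, name='Blair'))   # ['age = 24', 'name = Blair']

# ** also unpacks a dictionary into named parameters of an ordinary function.
def greet(name, age):
    return '%s is %d' % (name, age)

print(greet(**values))                       # John is 17
```
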

It is perfectly acceptable to mix the two, and you can always have required arguments and optional extra arguments to a function:

>>> def mixed(required_arg, *args, **kwargs):
...     print 'Required: %s' % required_arg
...     if args:
...         print 'Extra positional arguments: %s' % str(args)
...     if kwargs:
...         print 'Extra keyword arguments: %s' % kwargs
...
>>> mixed(1)
Required: 1
>>> mixed(1, 2, 3)
Required: 1
Extra positional arguments: (2, 3)
>>> mixed(1, 2, 3, test=True)
Required: 1
Extra positional arguments: (2, 3)
Extra keyword arguments: {'test': True}
>>> args = (2, 3, 4)
>>> kwargs = {'test': True, 'func': min}
>>> mixed(*args, **kwargs)
Required: 2
Extra positional arguments: (3, 4)
Extra keyword arguments: {'test': True, 'func': <built-in function min>}
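
A common use of the combined form is forwarding: a wrapper accepts whatever it is given and passes it on unchanged. A minimal Python 3 sketch (logged_call is a hypothetical wrapper, and mixed is rewritten here to return its parts rather than print them):

```python
def mixed(required_arg, *args, **kwargs):
    """Return how Python split the call: required, extra positional, extra keyword."""
    return required_arg, args, kwargs

def logged_call(func, *args, **kwargs):
    """Print the call, then forward every argument to func unchanged."""
    print('calling %s with %r and %r' % (func.__name__, args, kwargs))
    return func(*args, **kwargs)

result = logged_call(mixed, 1, 2, 3, test=True)
print(result)   # (1, (2, 3), {'test': True})
```
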

If you are taking optional keyword arguments and you want to have default values, remember you are dealing with a dictionary and hence you can use its get() method with a default value to use if the key does not exist:

>>> def take_keywords(**kwargs):
...     print 'Test mode: %s' % kwargs.get('test', False)
...     print 'Combining function: %s' % kwargs.get('func', all)
... 
>>> take_keywords()
Test mode: False
Combining function: <built-in function all>
>>> take_keywords(func=any)
Test mode: False
Combining function: <built-in function any>
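
If the set of optional keywords is known in advance, one alternative is to declare them as ordinary parameters with default values, so that Python rejects misspelled names instead of silently ignoring them. A Python 3 sketch contrasting the two approaches (take_declared is a hypothetical name, and both functions return their values rather than print them):

```python
def take_keywords(**kwargs):
    """Look up optional keywords by hand; unknown keywords are silently accepted."""
    return kwargs.get('test', False), kwargs.get('func', all)

def take_declared(test=False, func=all):
    """Declare the optional keywords; unknown keywords raise TypeError."""
    return test, func

print(take_keywords())            # (False, <built-in function all>)
print(take_keywords(func=any))    # (False, <built-in function any>)
print(take_keywords(tset=True))   # the typo goes unnoticed and the default is used

try:
    take_declared(tset=True)
except TypeError as error:
    print('rejected:', error)
```
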
Answered By: Blair

Here is a way to unzip a list of lists that was zipped with izip_longest:

>>> import itertools
>>> a = [2, 3, 4, 5, 6]
>>> b = [5, 4, 3, 2]
>>> c = [1, 0]

>>> [[val for val in k if val is not None]
...  for k in zip(*itertools.izip_longest(a, b, c))]

Since izip_longest pads the shorter lists with None, I filter the None values out after unzipping, which brings me back to the original a, b, c:

[[2, 3, 4, 5, 6], [5, 4, 3, 2], [1, 0]]
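
In Python 3 the function is named itertools.zip_longest. Filtering on None would also drop any None that was genuinely part of the input, so a hedged variant uses a private sentinel object as the fill value instead (unzip_longest and _MISSING are hypothetical names introduced for this sketch):

```python
from itertools import zip_longest

_MISSING = object()   # unique placeholder that cannot appear in the input lists

def unzip_longest(zipped):
    """Undo zip_longest(..., fillvalue=_MISSING): transpose, then drop the padding."""
    return [[val for val in column if val is not _MISSING]
            for column in zip(*zipped)]

a = [2, 3, 4, 5, 6]
b = [5, None, 3, 2]   # contains a real None that must survive
c = [1, 0]

zipped = list(zip_longest(a, b, c, fillvalue=_MISSING))
print(unzip_longest(zipped))   # [[2, 3, 4, 5, 6], [5, None, 3, 2], [1, 0]]
```
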
Answered By: kiriloff

That’s actually pretty simple once you really understand what zip() does.

The zip function takes several arguments (all iterables) and pairs items from these iterables according to their respective positions.

For example, say we pass two arguments, ranked_athletes and rewards, to zip. The call zip(ranked_athletes, rewards) will:

  • pair the athlete who ranked first (position i=0) with the first/best reward (position i=0)
  • move to the next element, i=1
  • pair the 2nd athlete with their reward, the 2nd one from rewards.

This is repeated until there are no more athletes or no more rewards left. For example, if we take the 100m at the 2016 Olympics and zip the rewards, we have:

ranked_athletes = ["Usain Bolt", "Justin Gatlin", "Andre De Grasse", "Yohan Blake"]
rewards = ["Gold medal", "Silver medal", "Bronze medal"]
zip(ranked_athletes, rewards)

Will return an iterator over the following tuples (pairs):

('Usain Bolt', 'Gold medal')
('Justin Gatlin', 'Silver medal')
('Andre De Grasse', 'Bronze medal')

Notice how Yohan Blake has no reward (because there are no rewards left in the rewards list).
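
The example above can be run as follows in Python 3. This sketch wraps zip in list() to see the pairs, then unzips them again with zip(*...); the names podium, athletes and medals are introduced only for illustration:

```python
ranked_athletes = ["Usain Bolt", "Justin Gatlin", "Andre De Grasse", "Yohan Blake"]
rewards = ["Gold medal", "Silver medal", "Bronze medal"]

podium = list(zip(ranked_athletes, rewards))   # list() consumes the iterator
for athlete, reward in podium:
    print(athlete, '->', reward)

# zip stops at the shortest input, so the fourth athlete is dropped.
print(len(podium))   # 3

# Unzipping gives back two tuples, minus the athlete who was never paired.
athletes, medals = zip(*podium)
print(athletes)      # ('Usain Bolt', 'Justin Gatlin', 'Andre De Grasse')
print(medals)        # ('Gold medal', 'Silver medal', 'Bronze medal')
```
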

The * operator unpacks a list: for example, the list [1, 2] unpacks to 1, 2. It basically transforms one object into many (as many as the size of the list). You can read more about this operator here.

So if we combine these two, zip(*x) actually means: take this list of objects, unpack it into many objects, and pair items from all these objects according to their indexes. It only makes sense if the objects are iterable (like lists, for example); otherwise the notion of index doesn’t really make sense.

Here is what it looks like if you do it step by step:

>>> print(x)              # x is a list of lists 
[[1, 2, 3], ['a', 'b', 'c', 'd']]

>>> print(*x)             # unpack x
[1, 2, 3] ['a', 'b', 'c', 'd']

>>> print(list(zip(*x)))  # And pair items from the resulting lists
[(1, 'a'), (2, 'b'), (3, 'c')]

Note that in this case, if we call print(list(zip(x))) we just pair the items from x (which are 2 lists) with nothing, as there is no other iterable to pair them with:

[  ([1, 2, 3],    ),  (['a', 'b', 'c', 'd'],    )]
               ^                              ^
    [1, 2, 3] is paired with nothing          |
                                              |
                        same for the 2nd item from x: ['a', 'b', 'c', 'd']
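
The difference between the two calls can be checked directly. A short Python 3 sketch (without_star and with_star are names introduced here):

```python
x = [[1, 2, 3], ['a', 'b', 'c', 'd']]

without_star = list(zip(x))   # one iterable: each item becomes a 1-tuple
with_star = list(zip(*x))     # two iterables: items are paired by position

print(without_star)   # [([1, 2, 3],), (['a', 'b', 'c', 'd'],)]
print(with_star)      # [(1, 'a'), (2, 'b'), (3, 'c')]
```
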

Another good way to understand how zip works is to implement your own version. Here is something that does more or less the same job as zip, but limited to the case of two lists (instead of many iterables):

def zip_two_lists(A, B):
    shortest_list_size = min(len(A), len(B))
    # We create empty pairs
    pairs = [tuple() for _ in range(shortest_list_size)]
    # And fill them with items from each list,
    # according to the items' index:
    for index in range(shortest_list_size):
        pairs[index] = (A[index], B[index])
    return pairs

print(zip_two_lists(*x))
# Outputs: [(1, 'a'), (2, 'b'), (3, 'c')]

Notice that I didn’t call print(list(zip_two_lists(*x))). That’s because, unlike the real zip, this function doesn’t return a lazy iterator; it builds the whole list in memory. In that respect it is not as good as the real thing, and you can find a better approximation of the real zip in Python’s documentation. It’s often a good idea to read the equivalent code that appears throughout that documentation: it’s a good way to understand what a function does without any ambiguity.
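
For comparison, a lazy variant can be sketched as a generator that accepts any number of iterables. This is an illustration assuming Python 3, not the implementation used by the built-in (lazy_zip is a hypothetical name):

```python
def lazy_zip(*iterables):
    """Yield tuples of items taken position by position, stopping at the shortest input."""
    iterators = [iter(iterable) for iterable in iterables]
    if not iterators:
        return                  # with no inputs there is nothing to yield
    while True:
        row = []
        for iterator in iterators:
            try:
                row.append(next(iterator))
            except StopIteration:
                return          # one input is exhausted: stop the whole generator
        yield tuple(row)

x = [[1, 2, 3], ['a', 'b', 'c', 'd']]
pairs = lazy_zip(*x)
print(next(pairs))   # (1, 'a'), computed only when requested
print(list(pairs))   # [(2, 'b'), (3, 'c')]
```

Because nothing is computed until a value is requested, this version also works with inputs that are not lists, such as other generators.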

Answered By: cglacet