TypeError: 'zip' object is not subscriptable

Question:

I have a tagged file in the format token/tag, and I am using a function that extracts the words from a list of (word, tag) pairs.

def text_from_tagged_ngram(ngram): 
    if type(ngram) == tuple:
        return ngram[0]
    return " ".join(zip(*ngram)[0]) # zip(*ngram)[0] returns a tuple with words from a (word,tag) list

In Python 2.7 it worked well, but in Python 3.4 the last line raises TypeError: 'zip' object is not subscriptable.
Why did it stop working? How can I fix this?

Asked By: sss


Answers:

In Python 2, zip returned a list. In Python 3, zip returns an iterator, which does not support indexing. You can turn it into a list by calling list on it:

list(zip(...))

In this case, that would be:

list(zip(*ngram))

With a list, you can use indexing:

items = list(zip(*ngram))
...
items[0]

etc.

But if you only need the first element, then you don’t strictly need a list. You could just use next.

In this case, that would be:

next(zip(*ngram))
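
Applied to the function from the question, this might look like the sketch below. It assumes ngram is either a single (word, tag) tuple or a non-empty list of such tuples; the sample inputs are made up for illustration.

```python
def text_from_tagged_ngram(ngram):
    # A single (word, tag) pair: return the word directly.
    if isinstance(ngram, tuple):
        return ngram[0]
    # zip(*ngram) yields the tuple of words first, then the tuple of tags.
    # next() takes only the first of those, so no list is built.
    return " ".join(next(zip(*ngram)))


# Hypothetical sample data, not from the original question.
print(text_from_tagged_ngram(("dog", "NN")))
print(text_from_tagged_ngram([("the", "DT"), ("dog", "NN")]))
```

Note that next raises StopIteration if ngram is an empty list. If that can happen, " ".join(word for word, tag in ngram) avoids zip altogether and returns an empty string instead.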
Answered By: khelwood

In 3.x, zip returns a special sort of iterator, not a list. The documentation explains:

zip() is lazy: The elements won’t be processed until the iterable is iterated on, e.g. by a for loop or by wrapping in a list.

This entails that it can’t be indexed, so old code that attempts to index or slice the result of a zip will fail with a TypeError. Simply passing the result to list produces a list, which can be used as it was in 2.x.
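
As a rough illustration of the paragraph above, the following sketch shows the failure and two ways around it. The variable names and sample data are assumptions chosen for the example; itertools.islice is a standard-library alternative to slicing that the text above does not mention.

```python
from itertools import islice

pairs = zip("abc", [1, 2, 3])
try:
    pairs[0]  # indexing a zip object is not supported in 3.x
except TypeError as exc:
    message = str(exc)

# Wrapping the zip in list() restores 2.x-style indexing and slicing.
as_list = list(zip("abc", [1, 2, 3]))
first = as_list[0]

# islice takes a "slice" of an iterator without building the full list.
first_two = list(islice(zip("abc", [1, 2, 3]), 2))

print(message)
print(first)
print(first_two)
```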

It also entails that iterating over the zip result a second time will not find any elements, because the first pass exhausts the iterator. Thus, if the data needs to be reused, create a list once and reuse that list; calling list on the same zip object again produces an empty list:

>>> example = zip('flying', 'circus')
>>> list(example)
[('f', 'c'), ('l', 'i'), ('y', 'r'), ('i', 'c'), ('n', 'u'), ('g', 's')]
>>> list(example)
[]

This iterator is implemented as an instance of a class…

>>> example = zip('flying', 'circus')
>>> example
<zip object at 0x7f76d8365540>
>>> type(example)
<class 'zip'>
>>> type(zip)
<class 'type'>

… which is built-in:

>>> class example(int, zip): pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict
>>> # and that isn't caused by __slots__ either:
>>> zip.__slots__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'zip' has no attribute '__slots__'

(See also: TypeError: multiple bases have instance lay-out conflict, Cannot inherit from multiple classes defining __slots__?)

The key advantage of this is that it saves memory, and allows for short-circuiting when the inputs are also lazy. For example, corresponding lines of two large input text files can be zipped together and iterated, without reading the entire files into memory:

with open('foo.txt') as f, open('bar.txt') as g:
    for foo_line, bar_line in zip(f, g):
        print(f'{foo_line:.38} {bar_line:.38}')
        if foo_line == bar_line:
            print('^ found the first match ^'.center(78))
            break
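
The same laziness can be seen without any files. The sketch below is only an illustration with made-up data: it zips against itertools.count, which never ends, so it could not work if zip had to build a complete list first.

```python
from itertools import count

# zip stops as soon as the shortest input runs out, so pairing an
# endless counter with a short list terminates normally.
labelled = list(zip(count(1), ["alpha", "beta", "gamma"]))

# Both inputs are endless here; the loop ends only because of the break.
for n, square in zip(count(), (i * i for i in count())):
    if square > 50:
        first_over_50 = n
        break

print(labelled)
print(first_over_50)
```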
Answered By: Karl Knechtel