TypeError: 'zip' object is not subscriptable
Question:
I have a tagged file in the format token/tag and I try a function that returns a tuple with words from a (word,tag) list.
def text_from_tagged_ngram(ngram):
    if type(ngram) == tuple:
        return ngram[0]
    return " ".join(zip(*ngram)[0])  # zip(*ngram)[0] returns a tuple with words from a (word,tag) list
In Python 2.7 it worked well, but in Python 3.4 it gives an error on the last line which says TypeError: 'zip' object is not subscriptable.
Why did it stop working? How can I fix this?
Answers:
In Python 2, zip returned a list. In Python 3, zip returns an iterator, which does not support indexing. But you can make it into a list just by calling list, as in:
list(zip(...))
In this case, that would be:
list(zip(*ngram))
With a list, you can use indexing:
items = list(zip(*ngram))
...
items[0]
etc.
But if you only need the first element, then you don’t strictly need a list. You could just use next.
In this case, that would be:
next(zip(*ngram))
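Applied to the function from the question, the fix might look like the sketch below. The isinstance check and the empty-tuple default passed to next are my own additions (the default avoids a StopIteration when the n-gram is empty); they are not part of the original code.

```python
def text_from_tagged_ngram(ngram):
    """Return the words of a tagged n-gram joined by single spaces."""
    # A bare (word, tag) pair: the word is the first element.
    if isinstance(ngram, tuple):
        return ngram[0]
    # zip(*ngram) yields the tuple of words first, then the tuple of tags.
    # next() takes only the words; the default () covers an empty n-gram.
    words = next(zip(*ngram), ())
    return " ".join(words)


single = text_from_tagged_ngram(("dog", "NN"))
phrase = text_from_tagged_ngram([("the", "DT"), ("dog", "NN")])
empty = text_from_tagged_ngram([])
print(single)
print(phrase)
```

If the zip trick feels too clever, " ".join(word for word, tag in ngram) gives the same words without transposing anything.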
In 3.x, zip returns a special sort of iterator, not a list. The documentation explains:
zip() is lazy: The elements won’t be processed until the iterable is iterated on, e.g. by a for loop or by wrapping in a list.
This entails that it can’t be indexed, so old code that attempts to index or slice the result of a zip will fail with a TypeError. Simply passing the result to list produces a list, which can be used as it was in 2.x.
It also entails that iterating over the zip result a second time will not find any elements. Thus, if the data needs to be reused, create a list once and reuse that list; passing the same, already exhausted zip object to list again produces an empty list:
>>> example = zip('flying', 'circus')
>>> list(example)
[('f', 'c'), ('l', 'i'), ('y', 'r'), ('i', 'c'), ('n', 'u'), ('g', 's')]
>>> list(example)
[]
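As a small sketch of the "create a list once and reuse it" advice, the snippet below first shows the indexing failure and then materializes the zip a single time. The variable names are illustrative only.

```python
pairs = zip("flying", "circus")

# Indexing the iterator directly raises the error from the question.
try:
    pairs[0]
except TypeError as exc:
    error_message = str(exc)

# The failed indexing consumed nothing, so the list is still complete.
materialized = list(pairs)

# A list supports indexing, slicing, and repeated iteration.
first = materialized[0]
last_two = materialized[-2:]
second_pass = [a + b for a, b in materialized]
third_pass = [a + b for a, b in materialized]

# The original iterator, however, is now exhausted.
leftover = list(pairs)
```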
This iterator is implemented as an instance of a class…
>>> example = zip('flying', 'circus')
>>> example
<zip object at 0x7f76d8365540>
>>> type(example)
<class 'zip'>
>>> type(zip)
<class 'type'>
… which is built-in:
>>> class example(int, zip): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict
>>> # and that isn't caused by __slots__ either:
>>> zip.__slots__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'zip' has no attribute '__slots__'
(See also: TypeError: multiple bases have instance lay-out conflict, Cannot inherit from multiple classes defining __slots__?)
The key advantage of this is that it saves memory, and allows for short-circuiting when the inputs are also lazy. For example, corresponding lines of two large input text files can be zipped together and iterated, without reading the entire files into memory:
with open('foo.txt') as f, open('bar.txt') as g:
    for foo_line, bar_line in zip(f, g):
        print(f'{foo_line:.38} {bar_line:.38}')
        if foo_line == bar_line:
            print('^ found the first match ^'.center(78))
            break
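The same laziness means that a single element at a known position can be fetched without building a list at all, even from inputs that never end. The following is only a sketch using itertools.islice; the infinite counters stand in for any large or unbounded inputs.

```python
from itertools import count, islice

# Two infinite inputs: list(zip(...)) would never finish here.
pairs = zip(count(), count(10))

# Skip the first three pairs and take the next one (index 3).
# Only four pairs are ever produced.
fourth = next(islice(pairs, 3, None))
print(fourth)

# The iterator keeps its position, so the next pair is index 4.
fifth = next(pairs)
print(fifth)
```

Note that islice consumes the skipped elements, so this suits one-off lookups; for repeated random access over finite data, a list is still the simpler choice.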