Why am I getting "RuntimeError: generator raised StopIteration", and how do I solve it?

Question:

I am building bigrams from the tokens stored in the list docToken.

print(docToken[520])

Output: ['sleepy', 'account', 'just', 'man', 'tired', 'twitter', 'case',
'romney', 'candidate', 'looks']

list(nltk.bigrams(docToken[520]))

Output: [('sleepy', 'account'), ('account', 'just'), ('just', 'man'),
('man', 'tired'), ('tired', 'twitter'), ('twitter', 'case'),
('case', 'romney'), ('romney', 'candidate'), ('candidate', 'looks')]

But when I call nltk.bigrams(docToken[i]) in a loop, I get an error once the range reaches 1000 or more:

bigram=[]
for i in range(5000):
    ls=list(nltk.bigrams(docToken[i]))
    for j in ls:
        bigram.append(list(j))

It works fine with range(500) in the outer loop, but with a range of 1000 or more it gives me the following error:

StopIteration                             Traceback (most recent call last) 
~\Anaconda3\lib\site-packages\nltk\util.py in ngrams(sequence, n, pad_left, 
  pad_right, left_pad_symbol, right_pad_symbol)
        467     while n > 1:
    --> 468         history.append(next(sequence))
        469         n -= 1

StopIteration: 

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-76-8982951528bd> in <module>()
      1 bigram=[]
      2 for i in range(5000):
----> 3     ls=list(nltk.bigrams(docToken[i]))
      4     for j in ls:
      5         bigram.append(list(j))

~\Anaconda3\lib\site-packages\nltk\util.py in bigrams(sequence, **kwargs)
    489     """
    490 
--> 491     for item in ngrams(sequence, 2, **kwargs):
    492         yield item
    493 

RuntimeError: generator raised StopIteration
Asked By: Hassaan Saleem


Answers:

I was not able to resolve this error, and I am not sure why nltk.bigrams(docToken[i]) raises it, but I was able to create bigrams with the following code.

bigram = {}
for i in range(len(docToken)):
    tokens = docToken[i]
    ls = []
    # Pair each token with its immediate successor only;
    # empty or one-token lists simply produce no bigrams.
    for j in range(len(tokens) - 1):
        ls.append([tokens[j], tokens[j + 1]])

    bigram[i] = ls
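
For comparison, a dependency-free way to build adjacent-pair bigrams is to zip each token list with a copy of itself shifted by one. This is only a minimal sketch, not NLTK's implementation; the helper name simple_bigrams and the sample data are made up for illustration. It returns an empty list for empty or one-token documents instead of raising.

```python
def simple_bigrams(tokens):
    """Return adjacent token pairs; hypothetical helper, not part of NLTK."""
    # zip stops at the shorter input, so empty and one-token lists give [].
    return [list(pair) for pair in zip(tokens, tokens[1:])]


# Illustrative data only (not the asker's corpus).
docToken = [["sleepy", "account", "just"], [], ["romney"]]

bigram = {i: simple_bigrams(doc) for i, doc in enumerate(docToken)}
print(bigram)
```

Because nothing here calls next() by hand, there is no StopIteration to leak out of a generator.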
Answered By: Hassaan Saleem

I fixed this by upgrading nltk from 3.3 to 3.4.

Simply run:

pip install nltk==3.4
Answered By: Nishtha

I faced the same error too. One possible cause is that one of the elements in docToken is an empty list.

For example, the following code throws the same error when i=1, because the second element is an empty list.

import nltk  # needed for the nltk.bigrams call below
from nltk import bigrams

docToken = [['the', 'wildlings', 'are', 'dead'], [], ['do', 'the', 'dead', 'frighten', 'you', 'ser', 'waymar']]
for i in range(3):
    print(i)
    print(list(nltk.bigrams(docToken[i])))

Output:

0
[('the', 'wildlings'), ('wildlings', 'are'), ('are', 'dead')]
1
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\util.py in ngrams(sequence, n, pad_left, pad_right, left_pad_symbol, right_pad_symbol)
    467     while n > 1:
--> 468         history.append(next(sequence))
    469         n -= 1

StopIteration: 

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-58-91f35cae32ed> in <module>
      2 for i in range(3):
      3     print (i)
----> 4     list(nltk.bigrams(docToken[i]))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\util.py in bigrams(sequence, **kwargs)
    489     """
    490 
--> 491     for item in ngrams(sequence, 2, **kwargs):
    492         yield item
    493 

RuntimeError: generator raised StopIteration

You can filter out the empty lists from docToken and then create bigrams:

docToken= [['the', 'wildlings', 'are', 'dead'], [], ['do', 'the', 'dead', 'frighten', 'you', 'ser', 'waymar']]
docToken = [x for x in docToken if x]
bigram = []
for i in range(len(docToken)):
    bigram.append(["_".join(w) for w in  bigrams(docToken[i])])
bigram

Output:

[['the_wildlings', 'wildlings_are', 'are_dead'],
 ['do_the',
  'the_dead',
  'dead_frighten',
  'frighten_you',
  'you_ser',
  'ser_waymar']]
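
To confirm that empty lists are what breaks the loop at larger ranges, it can help to list their positions before filtering them out. A small sketch; empty_doc_indices is a made-up helper name, and the data is the same sample used above.

```python
def empty_doc_indices(docs):
    """Return the positions of empty token lists (hypothetical helper)."""
    return [i for i, doc in enumerate(docs) if not doc]


# Same sample data as in the example above; index 1 is the empty list.
docToken = [
    ["the", "wildlings", "are", "dead"],
    [],
    ["do", "the", "dead", "frighten", "you", "ser", "waymar"],
]

print(empty_doc_indices(docToken))  # prints [1]
```

If the first empty document in your own data sits somewhere between index 500 and 999, that would explain why range(500) works and range(1000) fails.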

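Another possible cause is that you are using nltk 3.3 with Python 3.7. From Python 3.7 onward, a StopIteration raised inside a generator is turned into a RuntimeError (PEP 479), which is exactly the message shown in the traceback.

Please use nltk 3.4. It is the first version with Python 3.7 support, so your issue should be resolved in that version.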
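
The conversion described in PEP 479 can be reproduced without NLTK. Assuming that nltk 3.3's ngrams calls next() on an exhausted iterator, as the traceback above suggests, the following stdlib-only sketch triggers the same error message on Python 3.7 or newer. The function first_two is made up for illustration and is not NLTK code.

```python
def first_two(iterable):
    """Yield the first two items, calling next() by hand inside a generator."""
    it = iter(iterable)
    yield next(it)  # raises StopIteration here if the input is empty
    yield next(it)


try:
    list(first_two([]))
except RuntimeError as err:
    # On Python 3.7+ the leaked StopIteration arrives as RuntimeError.
    message = str(err)
    print(message)
```

On Python 3.6 and earlier the same generator would just end silently (with at most a deprecation warning), which is why the problem only shows up after moving to 3.7.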

Answered By: Nutan

First uninstall the current version of NLTK

pip uninstall nltk==3.2.5

Then install the latest version of NLTK

pip install nltk==3.6.2

Then check the NLTK version; it should be 3.6.2:

import nltk
print('The nltk version is {}.'.format(nltk.__version__))

This should fix the problem.

Answered By: Anish Nama

To complete @Leonard's answer: I solved it by simply uninstalling and reinstalling:

pip uninstall nltk
pip install nltk

Don't give version numbers; by default pip uninstalls the version you have and installs the latest one.

Answered By: xtof54