Different behaviors of a generator

Question:

I created a generator as follows:

from textacy.extract.kwic import keyword_in_context
test = keyword_in_context('this is a test. another test to see how', keyword='test', window_width=5)
print(test)

# Out: <generator object keyword_in_context at 0x000001C21D033F20>

But when I tried to iterate over the generator test, it didn’t work as expected: it only printed the last item in test:

for i in test:
    print(next(test))

# Out: ('ther ', 'test', ' to s')

However, iterating over the generator directly, without assigning it to a variable first, worked properly:

for i in keyword_in_context('this is a test. another test see how', keyword='test', window_width=5):
    print(i)

# Out:
# ('is a ', 'test', '. ano')
# ('ther ', 'test', ' see ')

My questions are:

  1. Why didn’t the first for loop work?
  2. Why couldn’t I use items() or iteritems() (which resulted in AttributeError: 'generator' object has no attribute 'items'/'iteritems')?
  3. What is the best way to extract all items from this generator?
Asked By: Nemo


Answers:

You have this:

for i in test:
    print(next(test))

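Here, you’re advancing the iterator twice on every pass: once implicitly with for i in test, which assigns the next item to i, and once explicitly with next(test), which fetches the item after that. Only the value from next(test) is printed, so with two results you only ever see the second one.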

You want:

for result in test:
    print(result)

The version that works does basically the same thing in a single statement:

for i in keyword_in_context('this is a test. another test see how', keyword='test', window_width=5):
    print(i)

vs:

test = keyword_in_context('this is a test. another test to see how', keyword='test', window_width=5)
for i in test:
    print(i)
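
The skipping effect of calling next() inside the for loop can be reproduced without textacy. The following is a minimal sketch that uses a hypothetical stand-in generator, letters(), instead of keyword_in_context; it is only meant to show which items each mechanism consumes.

```python
def letters():
    # Hypothetical stand-in for keyword_in_context: yields four items.
    yield "a"
    yield "b"
    yield "c"
    yield "d"


gen = letters()
seen_by_for = []   # items the for statement pulled from the generator
seen_by_next = []  # items the explicit next() call pulled

for i in gen:
    seen_by_for.append(i)
    # This consumes one more item, so the for loop never sees it.
    seen_by_next.append(next(gen))

print(seen_by_for)   # ['a', 'c']
print(seen_by_next)  # ['b', 'd']

# Iterating without the extra next() visits every item once.
print(list(letters()))  # ['a', 'b', 'c', 'd']
```

Note that with an odd number of items, the next() call inside the loop body would raise StopIteration instead of ending the loop quietly.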

You further ask:

  • Why couldn’t I use items() or iteritems() (which resulted in AttributeError: 'generator' object has no attribute 'items'/'iteritems')?
  • What is the best way to extract all items from this generator?

The answers respectively:

  • Generators don’t have .items() or .iteritems(); those are dictionary methods (and .iteritems() only exists in Python 2). .items() would not make sense for a generator anyway, since it has yet to generate its contents; if you need all of them at once, you could call list(the_generator), for example. And something like .iteritems() would amount to iter(list(the_generator)), giving you almost the same thing you already had (there are some technical differences that probably don’t matter to you at this point).
  • The best way to extract all items from the generator depends on what you need them for. If you need all the items in one place and want to manipulate the collection, list(the_generator) works. But for a generator that has a large or unknown number of results, for item in the_generator: ... is the right approach.
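
As a rough illustration of both points, here is a small sketch using a hypothetical generator function, squares(), which is not part of textacy. It shows that a generator has no items attribute, that list() collects everything in one go, and that a generator can only be consumed once.

```python
def squares(n):
    # Hypothetical example generator: yields 0, 1, 4, ... for n values.
    for k in range(n):
        yield k * k


gen = squares(4)

# Generators do not provide the dict-style items() method.
has_items = hasattr(gen, "items")
print(has_items)  # False

# list() drains the generator and keeps every result in memory.
all_items = list(gen)
print(all_items)  # [0, 1, 4, 9]

# The generator is now exhausted; a second pass yields nothing.
leftover = list(gen)
print(leftover)  # []

# When only an aggregate is needed, consume the items one at a time
# instead of building a list first.
total = sum(squares(4))
print(total)  # 14
```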

It’s important to keep in mind that some generators are infinite, or depend on external resources:

def natural_numbers():
    n = 0
    while True:
        n += 1
        yield n


def pages(some_url):
    code = 200
    while code == 200:
        content, code = some_html_request(some_url)
        yield content

The latter example uses a fictional function, some_html_request, which is assumed to return both the content of the response and the status code of the request. You might not know how many responses it would keep producing.

Something like list(natural_numbers()) would never finish normally: it would hang your program or crash it by running out of memory. And something like list(pages('https://my.web/service')) might run for a very long time, collecting many results, while your computer could have been processing them one at a time as each one is retrieved.
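
If only a bounded number of results from such a generator is needed, one common option is itertools.islice from the standard library, or a plain loop with a break. The sketch below reuses natural_numbers() from above; the cut-off values 5 and 3 are arbitrary choices for illustration.

```python
from itertools import islice


def natural_numbers():
    n = 0
    while True:
        n += 1
        yield n


# Take only the first five values; the generator is never run to the end.
first_five = list(islice(natural_numbers(), 5))
print(first_five)  # [1, 2, 3, 4, 5]

# Alternatively, stop as soon as some condition is met.
small = []
for n in natural_numbers():
    if n > 3:
        break
    small.append(n)
print(small)  # [1, 2, 3]
```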

Answered By: Grismar