Python `yield from`, or return a generator?

Question:

I wrote this simple piece of code:

def mymap(func, *seq):
    return (func(*args) for args in zip(*seq))

Should I use the ‘return’ statement as above to return a generator, or use a ‘yield from’ instruction like this:

def mymap(func, *seq):
    yield from (func(*args) for args in zip(*seq))

And beyond the technical difference between ‘return’ and ‘yield from’, which is the better approach in the general case?

Asked By: AleMal


Answers:

Really, it depends on the situation. yield is mainly suited to cases where you just want to iterate over the values and manipulate them as they are produced. return is mainly suited to cases where you want to store all of the values your function has generated in memory rather than iterate over them once. Note that you can only iterate over a generator (which is what a function containing yield returns) once; there are algorithms for which this is definitely not suited.
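For example, a generator is exhausted after a single pass (a minimal illustration):

gen = (x * x for x in range(3))
list(gen)  # [0, 1, 4]
list(gen)  # [] -- the generator is already exhausted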

Answered By: Jamie C

Generators use yield, functions use return.

Generators are generally used in for loops to iterate repeatedly over the values they provide, but they may also be used in other contexts, e.g. passed to the list() function to create a list, again from the values the generator provides.

Functions are called to provide a return value: only one value per call.
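A minimal illustration of both uses (squares is just an illustrative name):

def squares(n):
    for i in range(n):
        yield i * i

for value in squares(3):  # iterate in a for loop
    print(value)          # 0, 1, 4

list(squares(3))          # or collect into a list: [0, 1, 4]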

Answered By: MarianD

The difference is that your first mymap is just a usual function,
in this case a factory which returns a generator. Everything
inside the body gets executed as soon as you call the function.

def gen_factory(func, seq):
    """Generator factory returning a generator."""
    # do stuff ... immediately when factory gets called
    print("build generator & return")
    return (func(*args) for args in seq)

The second mymap is also a factory, but it’s also a generator
itself, yielding from a self-built sub-generator inside.
Because it is a generator itself, execution of the body does
not start until the first invocation of next(generator).

def gen_generator(func, seq):
    """Generator yielding from sub-generator inside."""
    # do stuff ... first time when 'next' gets called
    print("build generator & yield")
    yield from (func(*args) for args in seq)

I think the following example will make it clearer.
We define data packages which shall be processed with functions,
bundled up in jobs we pass to the generators.

def add(a, b):
    return a + b

def sqrt(a):
    return a ** 0.5

data1 = [*zip(range(1, 5))]  # [(1,), (2,), (3,), (4,)]
data2 = [(2, 1), (3, 1), (4, 1), (5, 1)]

job1 = (sqrt, data1)
job2 = (add, data2)

Now we run the following code inside an interactive shell like IPython to
see the different behavior: gen_factory prints immediately, while
gen_generator does so only after next() is called.

gen_fac = gen_factory(*job1)
# build generator & return <-- printed immediately
next(gen_fac)  # start
# Out: 1.0
[*gen_fac]  # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]

gen_gen = gen_generator(*job1)
next(gen_gen)  # start
# build generator & yield <-- printed with first next()
# Out: 1.0
[*gen_gen]  # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]

To give you a more reasonable use-case example for a construct
like gen_generator, we’ll extend it a little and make a coroutine
out of it by assigning the result of yield to variables, so we can
inject jobs into the running generator with send().

Additionally we create a helper function which will run all tasks
inside a job and ask us for a new one upon completion.

def gen_coroutine():
    """Generator coroutine yielding from sub-generator inside."""
    # do stuff... first time when 'next' gets called
    print("receive job, build generator & yield, loop")
    while True:
        try:
            func, seq = yield "send me work ... or I quit with next next()"
        except TypeError:
            return "no job left"
        else:
            yield from (func(*args) for args in seq)


def do_job(gen, job):
    """Run all tasks in job."""
    print(gen.send(job))
    while True:
        result = next(gen)
        print(result)
        if result == "send me work ... or I quit with next next()":
            break

Now we run gen_coroutinewith our helper function do_joband two jobs.

gen_co = gen_coroutine()
next(gen_co)  # start
# receive job, build generator & yield, loop  <-- printed with first next()
# Out: 'send me work ... or I quit with next next()'
do_job(gen_co, job1)  # prints out all results from job
# 1.0
# 1.4142135623730951
# 1.7320508075688772
# 2.0
# send me work ... or I quit with next next()
do_job(gen_co, job2)  # send another job into generator
# 3
# 4
# 5
# 6
# send me work ... or I quit with next next()
next(gen_co)
# Traceback ...
# StopIteration: no job left

To come back to your question of which version is the better approach in general:
IMO something like gen_factory only makes sense if you need the same thing done for multiple generators you are going to create, or in cases where your construction process for generators is complicated enough to justify the use of a factory, instead of building individual generators in place with a generator comprehension (see the sketch below).
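For instance, the one-off job1 pipeline from above could be built in place without any factory (a minimal sketch reusing the names defined earlier):

results = (sqrt(*args) for args in data1)  # built in place, no factory
[*results]  # [1.0, 1.4142135623730951, 1.7320508075688772, 2.0]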

Note:

The description above for the gen_generator function (second mymap) states
“it is a generator itself”. That is a bit vague and technically not
really correct, but it facilitates reasoning about the differences between the
functions in this tricky setup, where gen_factory also returns a generator,
namely the one built by the generator comprehension inside.

In fact any function with a yield inside (not only those from this question
with generator comprehensions inside!) just returns, upon invocation, a
generator object which gets constructed out of the function body.

type(gen_coroutine) # function
gen_co = gen_coroutine(); type(gen_co) # generator

So the whole action we observed above for gen_generator and gen_coroutine
takes place within these generator objects, which functions with yield inside
have spit out beforehand.

Answered By: Darkonaut

The most important difference (I don’t know whether yield from generator is optimized) is that the exception-traceback context is different for return and yield from: with yield from, the delegating generator appears in the traceback.


[ins] In [1]: def generator():
         ...:     yield 1
         ...:     raise Exception
         ...:

[ins] In [2]: def use_generator():
         ...:     return generator()
         ...:

[ins] In [3]: def yield_generator():
         ...:     yield from generator()
         ...:

[ins] In [4]: g = use_generator()

[ins] In [5]: next(g); next(g)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-5-3d9500a8db9f> in <module>
----> 1 next(g); next(g)

<ipython-input-1-b4cc4538f589> in generator()
      1 def generator():
      2     yield 1
----> 3     raise Exception
      4

Exception:

[ins] In [6]: g = yield_generator()

[ins] In [7]: next(g); next(g)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-3d9500a8db9f> in <module>
----> 1 next(g); next(g)

<ipython-input-3-3ab40ecc32f5> in yield_generator()
      1 def yield_generator():
----> 2     yield from generator()
      3

<ipython-input-1-b4cc4538f589> in generator()
      1 def generator():
      2     yield 1
----> 3     raise Exception
      4

Exception:
Answered By: Apalala

The answer is: return a generator. It’s faster:

marco@buzz:~$ python3.9 -m pyperf timeit --rigorous --affinity 3 --value 6 --loops=4096 -s '
a = range(1000)

def f1():
    for x in a:
        yield x

def f2():
    return f1()

' 'tuple(f2())'
........................................
Mean +- std dev: 72.8 us +- 5.8 us
marco@buzz:~$ python3.9 -m pyperf timeit --rigorous --affinity 3 --value 6 --loops=4096 -s '
a = range(1000)

def f1():
    for x in a:
        yield x

def f2():
    yield from f1()

' 'tuple(f2())'
........................................
WARNING: the benchmark result may be unstable
* the standard deviation (12.6 us) is 10% of the mean (121 us)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3.9 -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.

Mean +- std dev: 121 us +- 13 us

If you read PEP 380, the main reason for the introduction of yield from is to reuse part of the code of one generator in another generator, without having to duplicate the code or change the API:

The rationale behind most of the semantics presented above stems from
the desire to be able to refactor generator code. It should be
possible to take a section of code containing one or more yield
expressions, move it into a separate function (using the usual
techniques to deal with references to variables in the surrounding
scope, etc.), and call the new function using a yield from expression.

(Source: PEP 380)
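A minimal sketch of that refactoring idea (the names chunk and pipeline are illustrative, not from the question):

def chunk(seq):
    """A section of yield code moved into its own generator."""
    for x in seq:
        yield x * 2

def pipeline(seq):
    yield 0                 # code before the refactored section
    yield from chunk(seq)   # delegate to the extracted generator
    yield -1                # code after it

[*pipeline([1, 2])]  # [0, 2, 4, -1]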

Answered By: Marco Sulla

I prefer the version with yield from because it makes it easier to handle exceptions and context managers.

Take the example of a generator expression for the lines of a file:

def with_return(some_file):
    with open(some_file, 'rt') as f:
        return (line.strip() for line in f)

for line in with_return('/tmp/some_file.txt'):
    print(line)

The return version raises ValueError: I/O operation on closed file, since the file is not open anymore by the time the loop iterates over the generator: the with block closes it as soon as the function returns.

On the other hand, the yield from version works as expected:

def with_yield_from(some_file):
    with open(some_file, 'rt') as f:
        yield from (line.strip() for line in f)


for line in with_yield_from('/tmp/some_file.txt'):
    print(line)
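The same holds for exception handling: with yield from, a try/except inside the function also covers errors raised while the lines are being read, which the return version cannot do since it has already exited by then. A minimal sketch (the OSError handler is just an illustrative choice):

def with_guard(some_file):
    try:
        with open(some_file, 'rt') as f:
            yield from (line.strip() for line in f)
    except OSError as e:
        # reached even for errors raised mid-iteration, not only at open()
        print(f"failed reading {some_file}: {e}")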
Answered By: oLen