Convert a normal function to a generator while keeping the code structure

Question:

Generators are elegant and memory-efficient, so I’d like to convert normal functions that produce sequences into generators. But I haven’t found an easy conversion method. By easy I mean that it could ideally be done automatically by some tool, perhaps as simply as a string or regex replacement (return to yield, for example). If the conversion cannot always be that easy, refactoring may introduce bugs. Of course unit testing can catch some bugs, but if the normal function is already fully tested, I doubt it should be necessary to write extra tests for the generator.

Let me give an example. The normal function normalize accepts a src parameter: if src is a list, it returns the list; otherwise it returns a list with src as its single item. For simplicity, suppose the caller always provides src as either a list or a simple scalar.

test

f = normalize # replace with other conversions
assert list(f(2)) == [2]
assert list(f([1, 2])) == [1, 2]

normal function

def normalize(src):
    if isinstance(src, list):
        return src
    return [src]

Now I convert it to a generator by adding an else block, and the test passes:

generator which introduces new code blocks

def normalize_generator(src):
    if isinstance(src, list):
        for x in src:
            yield x
    else:
        yield src

What I want is to keep the code structure of normalize and only replace some keywords, as in the following. Of course, the test fails.

generator keeping code structure, test fails:

def normalize_generator_no_else(src):
    if isinstance(src, list):
        for x in src:
            yield x
    yield src

test result fail: left is [1, 2, [1, 2]], right is [1, 2]

I basically understand this behaviour: it seems that after a yield, execution continues with the following lines. I’ve searched and found a similar question, "Does python yield imply continue?", but I haven’t found a solution to my question.

Asked By: Lei Yang


Answers:

The conversion that "maintains" the code structure fails because yield is not equivalent to return: yield hands a value to the caller, and when the generator is resumed, execution continues on the next line. The yield keyword alone cannot give you an early return from within a block. For that you need to insert an additional return after the yield, so the corrected generator should look like:

def normalize_generator_no_else(src):
    if isinstance(src, list):
        for x in src:
            yield x
        return  # ensure function returns early to not execute the rest of it
    yield src

This ensures that the early return inside the if block of the original function is preserved here. Note that the return inside the generator should be a bare return (returning nothing, i.e. None), since a value passed to return in a generator is not yielded to the caller.
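As a side note, the inner for loop can be shortened with yield from (available since Python 3.3), which leaves the generator looking even closer to the original function. This is only a sketch; the name normalize_yield_from is made up for illustration and is not part of the question.

```python
def normalize_yield_from(src):
    # Hypothetical variant of the corrected generator above.
    if isinstance(src, list):
        yield from src  # yields each item of the list in turn
        return          # bare return ends the generator early
    yield src


# Same checks as the test in the question.
assert list(normalize_yield_from(2)) == [2]
assert list(normalize_yield_from([1, 2])) == [1, 2]
```

With this form, each "return some_list" in the original maps to "yield from some_list" followed by a bare return, which is about as mechanical as the conversion gets.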

Now a word of warning: a general conversion that works in all cases and achieves your goal of being "memory-efficient" may be difficult, because there are arbitrary ways a function can return a list, whereas the effect of a yield is immediate. A function may have multiple returns in various if blocks, returning arbitrary local list variables that may be modified at various places, and simply yielding the items from such a list will not achieve the "elegant and memory-efficient" trait you desire.
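To illustrate the point, here is a minimal sketch of a fully automatic conversion: a decorator that turns any list-returning function into a generator function without touching its body. The helper name as_generator is hypothetical. It keeps the code structure intact, but the whole list is still built before the first item is yielded, so nothing is gained in memory.

```python
import functools


def as_generator(func):
    """Hypothetical helper: expose a list-returning function as a generator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # func runs to completion here; the full list exists in memory
        # before the first item is handed to the caller.
        yield from func(*args, **kwargs)
    return wrapper


@as_generator
def normalize(src):
    # Body is unchanged from the normal function in the question.
    if isinstance(src, list):
        return src
    return [src]


assert list(normalize(2)) == [2]
assert list(normalize([1, 2])) == [1, 2]
```

This passes the original test with no new tests needed for the body, but it only changes the interface, not the memory behaviour.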

Generally, the function needs a proper rewrite so that it yields the appropriate values at the appropriate level, and in some cases this may not be possible, for example if the function removes some values from the list before returning it. The function corrected above (like the original) simply keeps a reference to the list of all the values in memory anyway, which defeats the goal of being "memory-efficient". If it is meant as an intermediate function for processing an existing list of values, it is neither worse nor better. For a function that produces new values, however, there is no general solution; the rewrite must be tailored to the problem the function solves.
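The following sketch contrasts the two situations. All three functions are invented for illustration. squares_list produces new values and can be rewritten by hand to yield them one at a time. drop_before_reset removes values it has already collected, so it cannot yield items as it goes: a value that has been yielded cannot be taken back.

```python
def squares_list(n):
    # Eager version: builds the whole list before returning.
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_gen(n):
    # Hand-written lazy version: each value is yielded where it was appended.
    for i in range(n):
        yield i * i


def drop_before_reset(items):
    # A None marker discards everything collected so far, so no item
    # can safely be handed out until the whole input has been seen.
    result = []
    for item in items:
        if item is None:
            result.clear()
        else:
            result.append(item)
    return result


assert list(squares_gen(4)) == squares_list(4)
assert drop_before_reset([1, 2, None, 3]) == [3]
```

In the first pair, the rewrite moved the yield to the point where a value is produced rather than replacing the return, which is why a keyword substitution alone is not enough.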

Answered By: metatoaster