Pythonic way of reusing boilerplate loop structures without evaluating an if-clause n times

Question:

I inherited a legacy codebase with lots of nested for loops looking something like:

def func(infile, some_other_data, outfile, status_variable):
    with open(infile, 'r') as f:
        with open(outfile, 'w') as outf:
            for line in f:
                # parse line
                for element in some_other_data:
                    standard_function(line, element)
                    if status_variable == 'status_A':
                        function_A(line, element)
                    elif status_variable == 'status_B':
                        function_B(line, element)
                    # handle other possible status variables
                    outf.write(new_line)

This code is performance-critical. To speed it up (in addition to other changes), I want to get rid of the if-clauses that are evaluated n*m times; testing showed that this does give roughly a 10% improvement.

To do this, I simply copied and modified the main loop function for every possible status variable and called the different functions accordingly. This effectively moved the if-clauses outside of the loops. However, it is very ugly and made the library 4x as large.

Is there a (fairly) simple pythonic way of handling such cases where I want to reuse boilerplate loops and just change what is done with each iteration WITHOUT handling conditionals every time?

I have been playing around with decorators that dynamically return the loop function calling different subfunctions depending on the status variable, but the end results looked horrible from a readability perspective. I am by no means a Python expert, so I might be overlooking some handy higher-level features that could be helpful here.

Any advice is highly appreciated.

Asked By: zeawoas


Answers:

If each status_variable value maps directly to one function, and every such function is called the same way, as function(line, element), you can pass the function in:

def func(infile, some_other_data, outfile, function_from_status_variable):
    with open(infile, 'r') as f:
        with open(outfile, 'w') as outf:
            for line in f:
                # parse line
                for element in some_other_data:
                    standard_function(line, element)

                    function_from_status_variable(line, element)

                    outf.write(new_line)

function_from_status_variable is determined once, before the loops run:

def calc_function(status_variable):
    if status_variable == 'status_A':
        return function_A
    elif status_variable == 'status_B':
        return function_B
    # other tests follow, plus handle an unknown value

Finally, call the functions like this:

function = calc_function(status_variable)
func(infile, some_other_data, outfile, function)
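
To try the pattern without touching real files, here is a minimal runnable sketch. Everything in it is a stand-in: function_A and function_B are hypothetical handlers that just build a string, process and run are made-up names, and io.StringIO buffers replace infile and outfile.

```python
import io

def function_A(line, element):
    # Hypothetical stand-in for the real status_A handler.
    return f"{line}:{element}:A"

def function_B(line, element):
    # Hypothetical stand-in for the real status_B handler.
    return f"{line}:{element}:B"

def process(lines, elements, out, handler):
    # Shared loop body: no status check happens in here.
    for line in lines:
        line = line.strip()  # stands in for "parse line"
        for element in elements:
            out.write(handler(line, element) + "\n")

def run(handler):
    # In-memory buffers replace the real input and output files.
    source = io.StringIO("x\ny\n")
    sink = io.StringIO()
    process(source, [1, 2], sink, handler)
    return sink.getvalue()

result_a = run(function_A)
result_b = run(function_B)
```

The same process loop serves both statuses; only the handler passed in differs, so nothing is duplicated and nothing is tested per iteration.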

Answered By: quamrana

Ideally you would pass the function itself instead of the status variable, but as this is legacy code, one solution that does not change the interface is to set up a dictionary of functions like so:

def func(infile, some_other_data, outfile, status_variable,
         status_functions={
             'status_A': function_A,
             'status_B': function_B,
         }
        ):

    try:
        status_function = status_functions[status_variable]
    except KeyError:
        status_function = lambda line, element: None

    with open(infile, 'r') as f, open(outfile, 'w') as outf:
        for line in f:
            # parse line
            for element in some_other_data:
                standard_function(line, element)

                status_function(line, element)
                outf.write(new_line)
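
The lookup step can also be pulled out into a small helper, which makes it easy to test on its own. This is only a sketch under assumptions: select_handler, no_op, STATUS_FUNCTIONS and the strict flag are hypothetical names, and the two handlers are placeholders for the real ones.

```python
def function_A(line, element):
    # Hypothetical stand-in for the real status_A handler.
    return f"A({line},{element})"

def function_B(line, element):
    # Hypothetical stand-in for the real status_B handler.
    return f"B({line},{element})"

def no_op(line, element):
    # Default used when the status has no dedicated handler.
    return None

STATUS_FUNCTIONS = {
    "status_A": function_A,
    "status_B": function_B,
}

def select_handler(status_variable, strict=False):
    # Resolve the handler once, before any loop runs.
    if strict:
        try:
            return STATUS_FUNCTIONS[status_variable]
        except KeyError:
            raise ValueError(f"unknown status: {status_variable!r}") from None
    # dict.get returns the no-op when the key is missing.
    return STATUS_FUNCTIONS.get(status_variable, no_op)
```

Whether an unknown status should silently do nothing or raise depends on the codebase. The no-op default is closest to the original if/elif chain, which simply falls through when nothing matches.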

Answered By: n-Holmes