Pythonic way of reusing boilerplate loop structures without evaluating an if-clause n times
Question:
I inherited a legacy codebase with lots of nested for loops looking something like:
def func(infile, some_other_data, outfile, status_variable):
    with open(infile, 'r') as f:
        with open(outfile, 'w') as outf:
            for line in f:
                # parse line
                for element in some_other_data:
                    standard_function(line, element)
                    if status_variable == 'status_A':
                        function_A(line, element)
                    elif status_variable == 'status_B':
                        function_B(line, element)
                    # handle other possible status variables
                outf.write(new_line)
This code is performance-critical. To speed it up (in addition to other changes), I want to get rid of the if-clauses that are evaluated n*m times; testing showed that this does give an improvement of roughly 10%.
To do this, I simply copied and modified the main loop function for every possible status variable and called the different functions accordingly. This effectively moved the if-clauses outside of the loops. However, it is very ugly and made the library 4x as large.
Is there a (fairly) simple pythonic way of handling such cases where I want to reuse boilerplate loops and just change what is done with each iteration WITHOUT handling conditionals every time?
I have been playing around with decorators that dynamically return the loop function, calling different subfunctions depending on the status variable, but the end results looked horrible from a readability perspective. I am by no means a Python expert, so I might be overlooking some handy higher-level features that could help here.
Any advice is highly appreciated.
Answers:
If each status_variable value maps directly to one function, and every call has the same form, function(line, element), you can pass the function in:
def func(infile, some_other_data, outfile, function_from_status_variable):
    with open(infile, 'r') as f:
        with open(outfile, 'w') as outf:
            for line in f:
                # parse line
                for element in some_other_data:
                    standard_function(line, element)
                    function_from_status_variable(line, element)
                outf.write(new_line)
The function is looked up once, outside the loops:
def calc_function(status_variable):
    if status_variable == 'status_A':
        return function_A
    elif status_variable == 'status_B':
        return function_B
    # other tests follow, plus handle an unknown value
Finally call the functions like this:
function = calc_function(status_variable)
func(infile, some_other_data, outfile, function)
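To make the idea concrete, here is a minimal runnable sketch of the same pattern. It is only an illustration under assumptions: function_A and function_B are hypothetical stand-ins that return strings, the loop (called process here, a made-up name) works on in-memory lists instead of files, and an unknown status raises ValueError, which is just one possible way to handle it.

```python
def function_A(line, element):
    # Hypothetical stand-in for the real function_A.
    return f"A:{line}:{element}"


def function_B(line, element):
    # Hypothetical stand-in for the real function_B.
    return f"B:{line}:{element}"


def calc_function(status_variable):
    """Resolve the per-iteration function once, before any loop runs."""
    if status_variable == 'status_A':
        return function_A
    elif status_variable == 'status_B':
        return function_B
    # Assumption: an unknown status is treated as an error.
    raise ValueError(f"unknown status: {status_variable!r}")


def process(lines, some_other_data, per_item):
    """Shared loop body; no status check happens inside it."""
    results = []
    for line in lines:
        for element in some_other_data:
            results.append(per_item(line, element))
    return results


# The conditional runs exactly once, here, not n*m times.
results = process(["x", "y"], [1, 2], calc_function('status_A'))
```

The nested loops are written once and stay identical for every status; only the callable passed in changes.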
Ideally you would pass the function itself instead of the status variable, but as this is legacy code, one solution that does not change the interface is to set up a dictionary of functions like so:
def func(infile, some_other_data, outfile, status_variable,
         status_functions={
             'status_A': function_A,
             'status_B': function_B,
         }
         ):
    try:
        status_function = status_functions[status_variable]
    except KeyError:
        status_function = lambda line, element: None
    with open(infile, 'r') as f, open(outfile, 'w') as outf:
        for line in f:
            # parse line
            for element in some_other_data:
                standard_function(line, element)
                status_function(line, element)
                # handle other possible status variables
            outf.write(new_line)
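A variation on the same dictionary idea, as a rough sketch rather than a drop-in replacement: the lookup can be written with dict.get and a named no-op fallback instead of try/except. Everything specific here is an assumption for illustration: the bodies of function_A and function_B, the _no_op helper, the module-level STATUS_FUNCTIONS table, treating "parse line" as stripping the newline, and passing file-like objects (io.StringIO) instead of file names so the example runs without touching disk.

```python
import io


def function_A(line, element):
    # Hypothetical behaviour: append the upper-cased element.
    return line + element.upper()


def function_B(line, element):
    # Hypothetical behaviour: append the element twice.
    return line + element * 2


def _no_op(line, element):
    # Fallback for unknown status values: leave the line unchanged.
    return line


# Dispatch table built once at import time.
STATUS_FUNCTIONS = {
    'status_A': function_A,
    'status_B': function_B,
}


def func(inf, some_other_data, outf, status_variable):
    # Single dictionary lookup before the loops start.
    status_function = STATUS_FUNCTIONS.get(status_variable, _no_op)
    for line in inf:
        new_line = line.rstrip('\n')  # stand-in for "parse line"
        for element in some_other_data:
            new_line = status_function(new_line, element)
        outf.write(new_line + '\n')


out = io.StringIO()
func(io.StringIO("x\ny\n"), ["a", "b"], out, 'status_A')
# out.getvalue() is now "xAB\nyAB\n"
```

Keeping the table at module level avoids a mutable default argument, and adding a new status becomes a one-line change to the dictionary.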