Is it pythonic to import inside functions?
Question:
PEP 8 says:
- Imports are always put at the top of the file, just after any module
comments and docstrings, and before module globals and constants.
On occasion, I violate PEP 8. Sometimes I import stuff inside functions. As a general rule, I do this if there is an import that is only used within a single function.
Any opinions?
EDIT (the reason I feel importing in functions can be a good idea):
Main reason: It can make the code clearer.
- When looking at the code of a function I might ask myself: “What is function/class xxx?” (xxx being used inside the function). If I have all my imports at the top of the module, I have to go look there to determine what xxx is. This is more of an issue when using from m import xxx. Seeing m.xxx in the function probably tells me more. Depending on what m is: Is it a well-known top-level module/package (import m)? Or is it a sub-module/package (from a.b.c import m)?
- In some cases having that extra information (“What is xxx?”) close to where xxx is used can make the function easier to understand.
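To make the readability point concrete, here is a minimal sketch contrasting the two spellings. The function names qualified and bare are made up for illustration; os.path.join merely stands in for m.xxx.

```python
import os.path
from os.path import join


def qualified(a, b):
    # The origin of join is visible right at the call site.
    return os.path.join(a, b)


def bare(a, b):
    # The reader has to look at the imports to learn where join comes from.
    return join(a, b)
```

Both functions do the same thing; the only difference is how much the call site tells the reader.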
Answers:
As long as it’s import and not from x import *, you should put them at the top. It adds just one name to the global namespace, and you stick to PEP 8. Plus, if you later need it somewhere else, you don’t have to move anything around.
It’s no big deal, but since there’s almost no difference I’d suggest doing what PEP 8 says.
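The “just one name” claim is easy to check. The helper below, names_added, is a hypothetical name used only for this sketch; it runs an import statement in an empty namespace and reports which names appeared.

```python
def names_added(statement):
    """Run an import statement in a fresh namespace and list the names it bound."""
    namespace = {}
    exec(statement, namespace)
    # exec() inserts __builtins__ on its own; it is not something the import added.
    namespace.pop("__builtins__", None)
    return sorted(namespace)


plain = names_added("import string")
star = names_added("from string import *")
```

Here plain contains only the module name, while star contains every public name the module exports.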
There are two occasions where I violate PEP 8 in this regard:
- Circular imports: module A imports module B, but something in module B needs module A (though this is often a sign that I need to refactor the modules to eliminate the circular dependency)
- Inserting a pdb breakpoint:
import pdb; pdb.set_trace()
This is handy b/c I don’t want to put import pdb at the top of every module I might want to debug, and it is easy to remember to remove the import when I remove the breakpoint.
Outside of these two cases, it’s a good idea to put everything at the top. It makes the dependencies clearer.
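As a rough illustration of the circular-import case, the sketch below writes two throwaway modules to a temporary directory. The module names demo_mod_a and demo_mod_b and all function names are invented for this example. demo_mod_b needs demo_mod_a, but only imports it inside the function that uses it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# demo_mod_a imports demo_mod_b at the top, as usual.
MOD_A = '''
import demo_mod_b

def greet():
    return "hello from a"

def call_b():
    return demo_mod_b.shout()
'''

# demo_mod_b needs demo_mod_a too, but defers the import until shout() runs,
# by which time demo_mod_a has finished initialising.
MOD_B = '''
def shout():
    from demo_mod_a import greet
    return greet().upper()
'''


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_mod_a.py").write_text(MOD_A)
        Path(tmp, "demo_mod_b.py").write_text(MOD_B)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import demo_mod_a
            return demo_mod_a.call_b()
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_mod_a", None)
            sys.modules.pop("demo_mod_b", None)
```

If the from demo_mod_a import greet line sat at the top of demo_mod_b instead, importing demo_mod_a would fail, because greet would not exist yet when demo_mod_b ran. Refactoring the shared code into a third module remains the cleaner fix.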
One thing to bear in mind: needless imports can cause performance problems. So if this is a function that will be called frequently, you’re better off just putting the import at the top. Of course this is an optimization, so if there’s a valid case to be made that importing inside a function is more clear than importing at the top of a file, that trumps performance in most cases.
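If you want to see the cost on your own machine rather than take it on faith, something like the following can be used. It is only a sketch: the function names are made up, math is an arbitrary stand-in module, and the numbers will vary by interpreter and hardware.

```python
import math
import timeit


def with_local_import():
    # The import statement runs on every call (a cache lookup after the first).
    import math
    return math.sqrt(2.0)


def with_top_level_import():
    # Uses the module-level import above.
    return math.sqrt(2.0)


def compare(number=100_000):
    """Return (seconds with local import, seconds with top-level import)."""
    local = timeit.timeit(with_local_import, number=number)
    top = timeit.timeit(with_top_level_import, number=number)
    return local, top
```

Calling compare() returns the two timings, so you can judge whether the difference matters for a function that is called as often as yours.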
If you’re doing IronPython, I’m told that it’s better to import inside functions (since compiling code in IronPython can be slow). Thus, you may be able to get away with importing inside functions then. But other than that, I’d argue that it’s just not worth it to fight convention.
As a general rule, I do this if there is an import that is only used within a single function.
Another point I’d like to make is that this may be a potential maintenance problem. What happens if you add a function that uses a module that was previously used by only one function? Are you going to remember to add the import to the top of the file? Or are you going to scan each and every function for imports?
FWIW, there are cases where it makes sense to import inside a function. For example, if you want to set the language in cx_Oracle, you need to set an NLS_LANG environment variable before it is imported. Thus, you may see code like this:
import os

oracle = None

def InitializeOracle(lang):
    global oracle
    # NLS_LANG has to be set before cx_Oracle is imported.
    os.environ['NLS_LANG'] = lang
    import cx_Oracle
    oracle = cx_Oracle
I’ve broken this rule before for modules that are self-testing. That is, they are normally just used for support, but I define a main for them so that if you run them by themselves you can test their functionality. In that case I sometimes import getopt and cmd just in main, because I want it to be clear to someone reading the code that these modules have nothing to do with the normal operation of the module and are only being included for testing.
Here are the four import use cases that we use:

- import (and from x import y and import x as y) at the top.

- Choices for Import. At the top.

    import settings
    if settings.something:
        import this as foo
    else:
        import that as foo

- Conditional Import. Used with JSON, XML libraries and the like. At the top.

    try:
        import this as foo
    except ImportError:
        import that as foo

- Dynamic Import. So far, we only have one example of this.

    import importlib
    import settings
    # The globals argument of __import__ only supplies package context; it is
    # not filled with the module's names, so read x from the module object.
    module = importlib.import_module(settings.some_module)
    x = module.x

  Note that this dynamic import doesn’t bring in code, but brings in complex data structures written in Python. It’s kind of like a pickled piece of data, except we pickled it by hand.

  This is also, more-or-less, at the top of a module.
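For readers who want something they can run, here is a small sketch of the conditional and dynamic cases using only names that exist in a standard install. It assumes simplejson as the optional third-party package (it may or may not be installed); load_by_name and json_impl are names invented for this example.

```python
import importlib

# Conditional import: prefer an optional package, fall back to the stdlib.
try:
    import simplejson as json_impl  # optional third-party package, may be absent
except ImportError:
    import json as json_impl        # standard-library fallback


def load_by_name(module_name):
    """Dynamic import: the module name is only known at run time."""
    return importlib.import_module(module_name)


math_module = load_by_name("math")
```

Either way, the rest of the module only ever refers to json_impl, so it does not care which branch was taken.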
Here’s what we do to make the code clearer:
- Keep the modules short.
- If I have all my imports at the top of the module, I have to go look there to determine what a name is. If the module is short, that’s easy to do.
- In some cases having that extra information close to where a name is used can make the function easier to understand. If the module is short, that’s easy to do.
In the long run I think you’ll appreciate having most of your imports at the top of the file; that way you can tell at a glance how complicated your module is by what it needs to import.
If I’m adding new code to an existing file I’ll usually do the import where it’s needed and then if the code stays I’ll make things more permanent by moving the import line to the top of the file.
One other point: I prefer to get an ImportError exception before any code is run, as a sanity check, so that’s another reason to import at the top.
I use pyChecker to check for unused modules.
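If you do keep some imports inside functions but still want the early sanity check, one possible approach is to verify at startup that the modules can be found without actually importing them. This is a sketch only: missing_modules is a hypothetical helper, and it is meant for top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "surely_not_a_real_module_xyz" is a deliberately bogus name for the demo.
problems = missing_modules(["json", "surely_not_a_real_module_xyz"])
```

A non-empty result can be reported up front, instead of surfacing as an ImportError deep inside a rarely used function.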
Coming from the question about loading the module twice – Why not both?
An import at the top of the script will indicate the dependencies, and another import in the function will make this function more atomic, while seemingly not causing any performance disadvantage, since a consecutive import is cheap.
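The reason a repeated import is cheap is that Python keeps already loaded modules in sys.modules and hands back the same object. A minimal check, with json as an arbitrary example module and a made-up function name:

```python
import sys


def second_import_is_cached():
    import json                    # may load the module the first time
    first = sys.modules["json"]
    import json                    # now only a lookup in sys.modules
    return first is json
```

The function returns True: the second import statement binds the very same module object rather than loading it again.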
Have a look at the alternative approach that’s used in sqlalchemy: dependency injection:
@util.dependencies("sqlalchemy.orm.query")
def merge_result(query, *args):
    #...
    query.Query(...)
Notice how the imported library is declared in a decorator, and passed as an argument to the function!
This approach makes the code cleaner, and also works 4.5 times faster than an import statement!
Benchmark: https://gist.github.com/kolypto/589e84fbcfb6312532658df2fabdb796
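To show the general idea, here is a minimal sketch of such a decorator. It is not SQLAlchemy’s implementation; the dependencies decorator and dump_sorted function below are written from scratch for illustration, with json standing in for the injected module.

```python
import functools
import importlib


def dependencies(*module_names):
    """Resolve the named modules on first call and pass them as leading arguments."""
    def decorator(func):
        resolved = []  # filled once, on the first call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not resolved:
                resolved.extend(importlib.import_module(name) for name in module_names)
            return func(*resolved, *args, **kwargs)

        return wrapper
    return decorator


@dependencies("json")
def dump_sorted(json, data):
    # json is the injected module, not a global name.
    return json.dumps(data, sort_keys=True)
```

Callers simply write dump_sorted({"b": 1, "a": 2}); the module is looked up once and reused on later calls, which is where any speed-up over a per-call import statement would come from.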
In modules that are both ‘normal’ modules and can be executed (i.e. have an if __name__ == '__main__': section), I usually import modules that are only used when executing the module inside the main section.
Example:
def really_useful_function(data):
    ...

def main():
    # These are only needed when the module is run as a script.
    from pathlib import Path
    from argparse import ArgumentParser
    from dataloader import load_data_from_directory

    parser = ArgumentParser()
    parser.add_argument('directory')
    args = parser.parse_args()
    data = load_data_from_directory(Path(args.directory))
    print(really_useful_function(data))

if __name__ == '__main__':
    main()
There’s another (probably "corner") case where it may be beneficial to import inside rarely used functions: shortening startup time.
I hit that wall once with a rather complex program running on a small IoT server that accepted commands from a serial line and performed operations, possibly very complex ones.
Placing import statements at the top of files meant that all imports were processed before the server started; since the import list included jinja2, lxml, signxml and other "heavyweights" (and the SoC was not very powerful), this meant minutes before the first instruction was actually executed.
OTOH, by placing most imports in functions I was able to have the server "alive" on the serial line in seconds. Of course, when the modules were actually needed I had to pay the price (note: this can also be mitigated by spawning a background task that does the imports in idle time).
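A rough sketch of that combination, deferred imports plus a background warm-up, might look like this. The names preload_in_background and summarise are invented for the example, and statistics is just a lightweight stand-in for a genuinely heavy module such as lxml.

```python
import importlib
import sys
import threading


def preload_in_background(module_names):
    """Import the given modules on a daemon thread so startup is not blocked."""
    def worker():
        for name in module_names:
            importlib.import_module(name)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def summarise(values):
    # Deferred import: paid on first use, or already paid by the preloader.
    import statistics
    return statistics.mean(values)


warmup = preload_in_background(["statistics"])
```

The server can start answering immediately; if summarise is called before the warm-up finishes, the import machinery simply makes the caller wait for the module to finish loading.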