How to store functions in a module that depend on other modules?
Question:
I have the following function stored in a separate module, funcs.py, and I want to import it into main.py.

```python
def parse_date(date):
    # if value is null, return 'UNDEFINED'
    if pd.isnull(date):
        return 'UNDEFINED'
    # otherwise try each format: 'mm/dd/yyyy', 'mm/dd/yy', 'mm-dd-yyyy', 'mm-dd-yy', 'yyyy/mm/dd', 'yyyy-mm-dd', 'yyyymmdd'
    for fmt in ['%m/%d/%Y', '%m/%d/%y', '%m-%d-%Y', '%m-%d-%y', '%Y/%m/%d', '%Y-%m-%d', '%Y%m%d']:
        try:
            # return the first date that parses
            return datetime.strptime(date, fmt)
        except ValueError:
            # this format didn't match; move on to the next one
            pass
    raise ValueError('no valid date format found')
```
The function is dependent on pandas and datetime.
In my main.py I have separate code that also uses pandas and datetime. My imports are at the top: pandas and datetime are imported there again, along with funcs.py.

```python
import os
import pandas as pd
import re
import glob
import time
from datetime import datetime
from seqfuncs import *
```
What is the correct way to import funcs.py as well as pandas and datetime without repeating myself?
I've tried putting the import statements for pandas and datetime at the top of funcs.py, and I've also tried putting them inside the parse_date() function. Both approaches work. But what is best practice? Should I still keep the import statements at the top of main.py as well, even though it is redundant?
From what I was able to find, it sounds like the import statement should go inside parse_date() in funcs.py if it isn't used elsewhere, but pandas and datetime are used throughout main.py, hence my confusion.
Answers:
Strictly speaking, the function is dependent on global variables named pd and datetime. Each module has its own global variables: no matter where you call parse_date, it will be looking for funcs.pd and funcs.datetime. The values of main.pd and main.datetime (defined by your import statements) are irrelevant to parse_date.
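To make the "each module has its own globals" point concrete, here is a minimal, self-contained sketch. Instead of real files, it builds two throwaway module objects with types.ModuleType and runs source text inside each one's namespace; the module names and the parse function are hypothetical stand-ins for funcs.py and parse_date. The script itself plays the role of main.py and imports datetime, yet the module that lacks its own import still cannot see it.

```python
import types
from datetime import datetime  # bound in THIS script's globals (playing the role of main.py)

# Source text standing in for funcs.py; the function name `parse` is hypothetical.
FUNCS_BODY = '''
def parse(text):
    return datetime.strptime(text, "%Y-%m-%d")
'''


def load_module(name, source):
    """Create a fresh module object and execute `source` in that module's own globals."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# "funcs.py" WITHOUT its own import: this script's `datetime` is not visible inside it.
funcs_missing_import = load_module("funcs_missing_import", FUNCS_BODY)
missing_import_failed = False
try:
    funcs_missing_import.parse("2021-03-15")
except NameError as exc:
    missing_import_failed = True
    print("NameError:", exc)  # the lookup happened in funcs_missing_import's globals

# "funcs.py" WITH its own import: the name resolves in that module's globals.
funcs_with_import = load_module(
    "funcs_with_import", "from datetime import datetime\n" + FUNCS_BODY
)
print(funcs_with_import.parse("2021-03-15"))  # 2021-03-15 00:00:00
```

The only difference between the two modules is the import line inside the module's own source, which is exactly why the import belongs in funcs.py.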
Just add import pandas as pd and from datetime import datetime to funcs.py. There is no significant cost to importing a module multiple times. The module is executed only once, on its first import, and cached in sys.modules; every later import statement just binds that already-loaded module to a name in the importing module's global namespace.
Import only where you use the names. If the imports live only in main.py, a reader looking at funcs.py will reasonably conclude that pd and datetime are never imported; most IDEs and linters will also highlight those names as undefined. In general, you would organize a project like this:

```
README.md
LICENSE
requirements.txt
main.py
helpers/__init__.py
helpers/your_helper.py
docs/
tests/
```
your_helper.py would contain:
```python
import pandas as pd
from datetime import datetime


def parse_date(date):
    # if value is null, return 'UNDEFINED'
    if pd.isnull(date):
        return 'UNDEFINED'
    ...
```
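For reference, a complete helpers/your_helper.py might look like the sketch below. It is a cleaned-up version of the question's function, not code from either answer: the DATE_FORMATS constant is a name introduced here, and catching only ValueError (instead of a bare except) is a design choice, so non-string input such as a number raises TypeError rather than being silently skipped.

```python
# helpers/your_helper.py (sketch): all imports this module needs live here.
from datetime import datetime

import pandas as pd

# Formats from the question, tried in order (hypothetical constant name).
DATE_FORMATS = ["%m/%d/%Y", "%m/%d/%y", "%m-%d-%Y", "%m-%d-%y", "%Y/%m/%d", "%Y-%m-%d", "%Y%m%d"]


def parse_date(date):
    """Return 'UNDEFINED' for null values, else the first successful strptime parse."""
    if pd.isnull(date):
        return "UNDEFINED"
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(date, fmt)
        except ValueError:
            continue  # this format didn't match; try the next one
    raise ValueError(f"no valid date format found for {date!r}")
```

With this file in place, main.py only needs from helpers.your_helper import parse_date; it imports pandas or datetime itself only if its own code uses them.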
Assuming the helper functions are defined in the helpers package, main.py would then contain:
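```python
from helpers.your_helper import parse_date

if __name__ == "__main__":
    # parse_date requires an argument; the date string here is just an example
    parsed_date = parse_date("03/15/2021")
```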
A good reference is The Hitchhiker's Guide to Python!