How to store functions in a module that depend on other modules?

Question:

I have the following function stored in a separate module, funcs.py, and I want to import it into main.py.

def parse_date(date):
    # if value is null
    if pd.isnull(date):
        # return 'UNDEFINED'
        return 'UNDEFINED'
    # else
    else:
        # for each format: 'mm/dd/yyyy', 'mm/dd/yy', 'mm-dd-yyyy', 'mm-dd-yy', 'yyyy/mm/dd', 'yyyy-mm-dd', 'yyyymmdd'
        for fmt in ['%m/%d/%Y', '%m/%d/%y', '%m-%d-%Y', '%m-%d-%y', '%Y/%m/%d', '%Y-%m-%d', '%Y%m%d']:
            # try
            try:
                # return a date
                return datetime.strptime(date, fmt)
            # when error
            except (ValueError, TypeError):
                # move on to next date format
                pass
        raise ValueError('no valid date format found')

The function is dependent on pandas and datetime.

In my main.py I have separate code that also uses pandas and datetime. My import statements are at the top; pandas and datetime are included there again, but I also import funcs.py.

import os
import pandas as pd
import re
import glob
import time
from datetime import datetime

from seqfuncs import *

What is the correct way to import funcs.py as well as pandas and datetime without repeating myself?

I’ve tried putting the import statements for pandas and datetime at the top of funcs.py. I’ve also tried putting them inside the parse_date() function. Both of these solutions work. But what is best practice? Should I still have the import statements at the top of main.py as well, even though it is redundant?

From what I was able to find, it sounds like the import statements should go inside parse_date() within funcs.py if the modules aren’t used elsewhere, but pandas and datetime are used throughout main.py, hence my confusion.

Asked By: Cory

||

Answers:

Strictly speaking, the function is dependent on global variables named pd and datetime. Each module has its own global variables: no matter where you call parse_date, it will be looking for funcs.pd and funcs.datetime. The values of main.pd and main.datetime (defined by your import statements) are irrelevant to parse_date.
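
To see that per-module lookup in isolation, here is a minimal sketch. It builds a throwaway in-memory module standing in for funcs.py; the function now_year is hypothetical and only exists for this illustration. The caller imports datetime, yet the function cannot see it until the name is bound in the function's own module.

```python
import types
from datetime import datetime  # bound in the caller's namespace only

# Hypothetical stand-in for funcs.py: it uses datetime but never imports it.
funcs = types.ModuleType("funcs")
exec(
    "def now_year():\n"
    "    return datetime.now().year\n",
    funcs.__dict__,
)

# The function resolves global names in the module where it was defined.
try:
    funcs.now_year()
    outcome = "ok"
except NameError as exc:
    outcome = f"NameError: {exc}"
print(outcome)

# Binding the name inside that module (what an import in funcs.py would do)
# makes the lookup succeed.
funcs.datetime = datetime
year = funcs.now_year()
print(year)
```

The first call fails with a NameError even though the caller has datetime available, which is why the imports belong in funcs.py itself.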

Just add import pandas as pd and from datetime import datetime to funcs.py. There is no significant cost to importing a module multiple times. The module itself is only loaded and executed once; later import statements just bind the already-loaded module to a name in the importing module’s global namespace.
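
A small sketch of why repeated imports are cheap: Python keeps loaded modules in sys.modules, so a second import of the same module returns the same object rather than running the module again. The aliases first and second are just illustrative names.

```python
import importlib
import sys

import datetime as first
import datetime as second  # no reload: this only binds another name

# Both names refer to the single cached module object.
same_object = first is second
cached = sys.modules["datetime"] is first
also_same = importlib.import_module("datetime") is first

print(same_object, cached, also_same)
```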

Answered By: chepner

Import modules in the file where you use them. If pandas and datetime were imported only in main.py, someone reading funcs.py would reasonably assume the names are undefined there. Most IDEs would also highlight those names in funcs.py with a warning that the imports are missing and need to be added. In general, you would organize a project as below:

README.md
LICENSE
requirements.txt
main.py
helpers/__init__.py
helpers/your_helper.py
docs/
tests/

your_helper.py would contain:

import pandas as pd
from datetime import datetime

def parse_date(date):
    # if value is null
    if pd.isnull(date):
        # return 'UNDEFINED'
        return 'UNDEFINED'
...

Assuming the helper functions are defined in the helpers package, main.py would then contain:

from helpers.your_helper import parse_date

if __name__ == "__main__":
    parsed_date = parse_date("2020-01-31")  # example input; parse_date requires a date argument
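
As a rough end-to-end sketch of this layout, the snippet below writes a temporary helpers package to disk and imports parse_date from it. To stay standard-library only, it assumes a simplified helper: a plain None check stands in for pd.isnull, and only three of the formats are tried. The point it illustrates is that the caller never imports datetime itself, because the helper module imports what it needs.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Simplified, hypothetical contents for helpers/your_helper.py.
HELPER_SOURCE = '''\
from datetime import datetime  # the helper imports what it uses

FORMATS = ["%m/%d/%Y", "%Y-%m-%d", "%Y%m%d"]

def parse_date(date):
    if date is None:  # stand-in for pd.isnull in this sketch
        return "UNDEFINED"
    for fmt in FORMATS:
        try:
            return datetime.strptime(date, fmt)
        except ValueError:
            continue
    raise ValueError("no valid date format found")
'''

with tempfile.TemporaryDirectory() as root:
    # Recreate the helpers/ package from the layout above.
    package = Path(root) / "helpers"
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "your_helper.py").write_text(HELPER_SOURCE)

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        from helpers.your_helper import parse_date
    finally:
        sys.path.remove(root)

    # This "main" code has no datetime import of its own.
    parsed = parse_date("2021-03-04")
    missing = parse_date(None)

print(parsed, missing)
```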

A good reference is The Hitchhiker’s Guide to Python.

Answered By: coldy