How to refer to relative paths of resources when working with a code repository
Question:
We are working with a code repository which is deployed to both Windows and Linux – sometimes in different directories. How should one of the modules inside the project refer to one of the non-Python resources in the project (CSV files, etc.)?
If we do something like:
thefile = open('test.csv')
or:
thefile = open('../somedirectory/test.csv')
It will work only when the script is run from one specific directory, or a subset of the directories.
What I would like to do is something like:
path = getBasePathOfProject() + '/somedirectory/test.csv'
thefile = open(path)
Is it possible?
Answers:
Try to use a filename relative to the current file’s path. Example for ‘./my_file’:
import os
fn = os.path.join(os.path.dirname(__file__), 'my_file')
In Python 3.4+ you can also use pathlib:
import pathlib
fn = pathlib.Path(__file__).parent / 'my_file'
You can use the built-in __file__ variable. It contains the path of the current file. I would implement getBasePathOfProject in a module in the root of your project. There I would take the directory part of __file__ and return that. This function can then be used everywhere in your project.
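A minimal sketch of that idea, assuming a helper module placed in the project root. The name project_path and its module_file parameter are illustrative and do not come from the question; passing __file__ in explicitly keeps the helper easy to test.

```python
from pathlib import Path


def project_path(module_file, *parts):
    """Build an absolute path starting from the directory that contains module_file.

    module_file is expected to be the caller's __file__ (assumption: the calling
    module lives in the project root, so its directory is the project base).
    """
    base_dir = Path(module_file).resolve().parent
    return base_dir.joinpath(*parts)


# Typical call from a module in the project root (hypothetical layout):
#   csv_path = project_path(__file__, "somedirectory", "test.csv")
#   with open(csv_path) as thefile:
#       ...
```

Because the base is derived from a file location rather than from the working directory, the result stays the same no matter where the program is started.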
import os
cwd = os.getcwd()
path = os.path.join(cwd, "my_file")
f = open(path)
You can also try to normalize your cwd using os.path.abspath(os.getcwd()).
I often use something similar to this:
import os
DATA_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), 'datadir'))
# if you have more paths to set, you might want to shorten this as
here = lambda x: os.path.abspath(os.path.join(os.path.dirname(__file__), x))
DATA_DIR = here('datadir')
pathjoin = os.path.join
# ...
# later in script
for fn in os.listdir(DATA_DIR):
    f = open(pathjoin(DATA_DIR, fn))
    # ...
The variable __file__ holds the file name of the script in which that code is written, so you can build paths relative to the script while still working with absolute paths. It works quite well for several reasons:
- the path is absolute, yet it is computed relative to the script’s location
- the project can still be deployed to any directory
But you need to watch for platform compatibility: the path separator on Windows (os.sep) differs from the one on UNIX, so join components with os.path.join instead of hard-coding a separator.
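To make the separator point concrete, the sketch below uses pathlib's pure path classes, which model either platform's path rules regardless of where the code runs. The directory names are made up for illustration.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

# os.sep separates path components ("\\" on Windows, "/" on POSIX).
# os.pathsep separates entries in PATH-style lists (";" on Windows, ":" on POSIX).
print("component separator:", repr(os.sep))
print("PATH list separator:", repr(os.pathsep))

# Joining components through the library picks the right separator;
# the pure path classes show what each platform would produce.
windows_path = PureWindowsPath("C:/project") / "datadir" / "test.csv"
posix_path = PurePosixPath("/project") / "datadir" / "test.csv"

print(windows_path)  # C:\project\datadir\test.csv
print(posix_path)    # /project/datadir/test.csv
```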
I spent a long time figuring out the answer to this, but I finally got it (and it’s actually really simple):
import sys
import os
sys.path.append(os.path.join(os.getcwd(), 'your', 'subfolder', 'of', 'choice'))
# now import whatever other modules you want, both the standard ones
# and the ones supplied in your subfolders
This appends your subfolder, resolved against the current working directory, to the list of directories Python searches for modules.
It’s pretty quick and dirty, but it works like a charm 🙂
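Because os.getcwd() depends on where the program is launched, this approach inherits the problem described in the question. One possible variation, sketched here with a hypothetical helper name, anchors the subfolder to a file's location instead.

```python
import sys
from pathlib import Path


def add_subfolder_to_sys_path(module_file, *parts):
    """Append <directory of module_file>/<parts...> to sys.path if not already present.

    Returns the absolute directory as a string.
    """
    folder = str(Path(module_file).resolve().parent.joinpath(*parts))
    if folder not in sys.path:
        sys.path.append(folder)
    return folder


# Typical call (hypothetical folder names):
#   add_subfolder_to_sys_path(__file__, "your", "subfolder", "of", "choice")
```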
If you are using setuptools or distribute (a setup.py install), then the “right” way to access these packaged resources seems to be pkg_resources.
In your case the example would be
import pkg_resources
my_data = pkg_resources.resource_string(__name__, "foo.dat")
This reads the resource, and my_data holds its contents as binary data.
If you just need the filename you could also use
resource_filename(package_or_requirement, resource_name)
Example:
pkg_resources.resource_filename("MyPackage", "foo.dat")
The advantage is that it’s guaranteed to work even if the package is installed as an archive distribution like an egg.
See http://packages.python.org/distribute/pkg_resources.html#resourcemanager-api
In Python, relative paths are resolved against the current working directory, which in most cases is the directory from which you run your program. The current working directory is very likely not the same as the directory of your module file, so a relative path written as if it started at your module’s directory is a bad choice.
Using an absolute path should be the best solution:
import os
package_dir = os.path.dirname(os.path.abspath(__file__))
thefile = open(os.path.join(package_dir, 'test.csv'))
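The difference can be seen in a small experiment. The sketch below uses throwaway temporary directories rather than a real project: it resolves the same file name once against the working directory and once against a fixed anchor directory, then changes the working directory and repeats. The function names are invented for this illustration.

```python
import os
import tempfile
from pathlib import Path


def resolve_against_cwd(name):
    """Where a bare relative name ends up: it depends on the current working directory."""
    return Path.cwd() / name


def resolve_against_anchor(anchor_dir, name):
    """Where an anchored name ends up: independent of the current working directory."""
    return Path(anchor_dir) / name


def demo():
    original_cwd = os.getcwd()
    results = []
    with tempfile.TemporaryDirectory() as package_dir, tempfile.TemporaryDirectory() as elsewhere:
        try:
            for workdir in (package_dir, elsewhere):
                os.chdir(workdir)
                results.append(
                    (resolve_against_cwd("test.csv"), resolve_against_anchor(package_dir, "test.csv"))
                )
        finally:
            os.chdir(original_cwd)  # restore before the temporary directories are removed
    return results


for from_cwd, from_anchor in demo():
    print(from_cwd, "|", from_anchor)
```

The first column changes between the two runs while the second stays fixed, which is the reason for anchoring paths to __file__.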
I got stumped here a bit. I wanted to package some resource files into a wheel file and access them. I did the packaging using a manifest file, but pip install was not installing the files unless they were in a subdirectory of the package. Hoping these listings will help:
├── cnn_client
│ ├── image_preprocessor.py
│ ├── __init__.py
│ ├── resources
│ │ ├── mscoco_complete_label_map.pbtxt
│ │ ├── retinanet_complete_label_map.pbtxt
│ │ └── retinanet_label_map.py
│ ├── tf_client.py
MANIFEST.in
recursive-include cnn_client/resources *
Created a wheel using a standard setup.py and installed the wheel file with pip.
After installation, checked whether the resources were installed. They were:
ls /usr/local/lib/python2.7/dist-packages/cnn_client/resources
mscoco_complete_label_map.pbtxt
retinanet_complete_label_map.pbtxt
retinanet_label_map.py
In tf_client.py, these files are accessed with:
import os
templates_dir = os.path.join(os.path.dirname(__file__), 'resources')
file_path = os.path.join(templates_dir, 'mscoco_complete_label_map.pbtxt')
with open(file_path, 'r') as f:
    s = f.read()
And it works.
Since you say you have some code that you deploy to various places, you should use the Python packaging ecosystem to distribute resources, which is not limited to plain files. It also supports accessing files inside zip archives, so you don’t have to handle that case yourself.
Previously, this was handled with pkg_resources from setuptools, but with more and more tools popping up, the ecosystem has shifted. Since Python 3.7, you should use importlib.resources:
import importlib.resources
with importlib.resources.open_text('mypackage.somedirectory', 'test.csv') as f:
    print(f.read())  # or whatever
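On Python 3.9 and later, importlib.resources also offers files(), which returns a traversable object for a package. A hedged sketch follows; the helper name read_package_text is invented here, and mypackage.somedirectory is assumed to be an importable package that ships test.csv.

```python
import importlib.resources


def read_package_text(package, resource_name, encoding="utf-8"):
    """Return the text of a resource bundled inside an importable package."""
    resource = importlib.resources.files(package).joinpath(resource_name)
    return resource.read_text(encoding=encoding)


# Typical call (assumes the package and file exist in your project):
#   csv_text = read_package_text("mypackage.somedirectory", "test.csv")
```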
But you must also instruct your installer to include package resources. Otherwise, a pip install mypackage would not bundle the data files.
There are many ways to do that, but one way is to add the following to your setup.cfg:
[options.package_data]
mypackage =
    somedirectory/*.csv
There are equivalent approaches when using setup.py or pyproject.toml. A more complete account is available on the setuptools homepage.