Using pandas extensions without importing their module
Question:
I have created pandas extensions as described here.
The extension classes are defined in a module named pd_extensions, and I would like to use them in other modules, for example my_module. Both modules are in the same package, called source.
Currently, to be able to use the extensions, I'm importing the pd_extensions module into my_module like this:
import source.pd_extensions
Is there a way to use the extensions I created without importing the module?
I find myself importing this module into every module in the package that uses the extensions, and I thought there might be a better way of doing it (maybe through the __init__ module).
I tried simply using the extensions without importing the module they are defined in, but, as expected, it did not work.
I'm thinking about importing it in the __init__ file so that all the modules in the package have access to it without having to import it themselves, but I can't figure out whether that's possible.
Answers:
Yes, it is possible to add the extension to pandas's own __init__.py and then just import pandas, but I would create a virtual environment first, because this edits the installed pandas package itself.
Here is how you can do it using conda:
conda create -n test_env python=3.10
conda install -n test_env pandas
Navigate to the pandas folder in test_env and open its __init__.py:
/Users/.../opt/anaconda3/envs/test_env/lib/python3.10/site-packages/pandas/__init__.py
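Instead of navigating to the folder by hand, the file's location can be looked up from inside the activated environment. This is a minimal sketch using the standard library's importlib.util.find_spec, which locates a top-level package without executing its __init__.py; the helper name package_init_path is hypothetical.

```python
import importlib.util
from typing import Optional


def package_init_path(name: str) -> Optional[str]:
    """Return the file a top-level package or module is loaded from, or None if absent.

    find_spec only locates the package; it does not run its __init__.py.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin  # for a regular package: .../<name>/__init__.py


if __name__ == "__main__":
    # Run with the test_env interpreter to print the pandas file to edit.
    print(package_init_path("pandas"))
```

Running python -c "import pandas; print(pandas.__file__)" inside test_env gives the same path, at the cost of importing pandas.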
At the bottom of that pandas __init__.py, I added:
import pandas as pd

@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def center(self):
        # return the geographic center point of this DataFrame
        lat = self._obj.latitude
        lon = self._obj.longitude
        return (float(lon.mean()), float(lat.mean()))

    def plot(self):
        # plot this array's data on a map, e.g., using Cartopy
        pass
Now you should be able to do this:
(base) ~ % conda activate test_env
(test_env) ~ % python
Python 3.10.9 (main, Jan 11 2023, 09:18:20) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> ds = pd.DataFrame({"longitude": np.linspace(0, 10),
... "latitude": np.linspace(0, 20)})
>>> ds.geo.center
(5.0, 10.0)
>>>
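The accessor class itself does not depend on living inside pandas: it works from an ordinary script once the code that registers it has run. The sketch below registers the same "geo" accessor in a normal module and adds a column check along the lines of the example in the pandas "Extending pandas" documentation; the plot method is left out, and the _validate helper is an assumption modelled on that documentation example rather than part of this answer.

```python
import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    def __init__(self, pandas_obj):
        # Check the columns up front so misuse fails with a clear message.
        self._validate(pandas_obj)
        self._obj = pandas_obj

    @staticmethod
    def _validate(obj):
        if "latitude" not in obj.columns or "longitude" not in obj.columns:
            raise AttributeError("Must have 'latitude' and 'longitude'.")

    @property
    def center(self):
        # Geographic center point as (mean longitude, mean latitude).
        lat = self._obj.latitude
        lon = self._obj.longitude
        return (float(lon.mean()), float(lat.mean()))


ds = pd.DataFrame({"longitude": np.linspace(0, 10), "latitude": np.linspace(0, 20)})
print(ds.geo.center)
```

Accessing ds.geo on a DataFrame without those columns raises AttributeError instead of failing later inside center.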
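I think you can import the extension module in the package's __init__.py file. Importing the extension module first imports pandas and then registers the accessor, which attaches it to the pandas DataFrame class itself. The pandas module is cached in sys.modules, so any subsequent import of pandas from other modules simply retrieves that same cached module, with the accessor already registered. And because Python runs a package's __init__.py before importing any of its submodules, the registration happens before my_module uses the accessor.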
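The caching argument can be checked without pandas. In this sketch, a hypothetical module fakelib stands in for pandas, register_extension plays the role of pd_extension.py, and use_extension plays the role of my_module.py. The stand-in is placed in sys.modules by hand, which is what the first real import of a library does.

```python
import importlib
import sys
import types

# Hypothetical stand-in for pandas: a module object holding a DataFrame-like class.
fakelib = types.ModuleType("fakelib")


class Frame:
    pass


fakelib.Frame = Frame
sys.modules["fakelib"] = fakelib  # what a first real import would do


def register_extension():
    # Like pd_extension.py: import the library, then attach something to its class.
    import fakelib as lib
    lib.Frame.spam = property(lambda self: "registered")


def use_extension():
    # Like my_module.py: a fresh import that never mentions the extension.
    import fakelib as lib
    return lib.Frame().spam


register_extension()
print(use_extension())  # registered
print(importlib.import_module("fakelib") is fakelib)  # True
```

Both import statements resolve to the same cached module object, so the attribute added by register_extension is visible in use_extension.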
Here is a simple example:
source
├── __init__.py
├── my_module.py
└── pd_extension.py
The following are the contents of the files:
# pd_extension.py
import pandas as pd

@pd.api.extensions.register_dataframe_accessor('spam')
class Spam:
    def __init__(self, df):
        self.df = df

    @property
    def shape(self):
        return self.df.shape

# my_module.py
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(df.spam.shape)

# __init__.py
import source.pd_extension
Now let's test the code by running my_module as part of the package, from the directory that contains source, which works as expected:
$ python -m source.my_module
(2, 3)
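The example can be reproduced in a throwaway directory, which also shows why the -m form matters. This sketch writes the three files into a temporary source package (using the relative import from . import pd_extension in __init__.py, which is assumed to be equivalent here to import source.pd_extension) and runs my_module twice with the current interpreter, which must have pandas installed: once with -m, and once as a plain script, where source/__init__.py never runs.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

FILES = {
    "__init__.py": "from . import pd_extension  # registers the accessor\n",
    "pd_extension.py": (
        "import pandas as pd\n"
        "\n"
        "@pd.api.extensions.register_dataframe_accessor('spam')\n"
        "class Spam:\n"
        "    def __init__(self, df):\n"
        "        self.df = df\n"
        "\n"
        "    @property\n"
        "    def shape(self):\n"
        "        return self.df.shape\n"
    ),
    "my_module.py": (
        "import pandas as pd\n"
        "\n"
        "df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])\n"
        "print(df.spam.shape)\n"
    ),
}


def run(args, cwd):
    # Use the current interpreter so the same pandas installation is used.
    return subprocess.run([sys.executable, *args], cwd=cwd, capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "source"
    pkg.mkdir()
    for name, text in FILES.items():
        (pkg / name).write_text(text)

    # Imported as part of the package: source/__init__.py runs first.
    as_module = run(["-m", "source.my_module"], cwd=tmp)
    # Run as a plain script: the package's __init__.py is never executed.
    as_script = run([str(pkg / "my_module.py")], cwd=tmp)

print(as_module.stdout.strip())   # (2, 3)
print(as_script.returncode != 0)  # True: the script fails with AttributeError on 'spam'
```

So the __init__.py approach relies on my_module being imported through the package (python -m source.my_module, or import source.my_module from elsewhere), not executed directly by file path.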