Using pandas extensions without importing them

Question:

I have created pandas extensions as mentioned here.

The extension classes are defined in a module named pd_extensions, and I would like to use them in a different module, for example my_module. The two modules are in the same package, called source.

Currently, to be able to use the extensions, I'm importing the pd_extensions module into my_module like this:

import source.pd_extensions

Is there a way to use the extensions I created without importing the module?

I find myself importing this module into every module in the package where I want to use the extensions, and I thought there might be a better way of doing it (maybe through the __init__ module).

I tried just using the extensions without importing the module they are defined in but, unsurprisingly, that did not work.

I’m thinking about importing it in the __init__ file so that all the modules in the package have access to it without having to import it themselves, but I can’t figure out whether that’s possible.

Asked By: Tomer Roditi


Answers:

Yes, it is possible: you can add the accessor to pandas’s own __init__.py and then just import pandas. Since this modifies the installed library, I would create a virtual environment before doing that.

Here is how you can go about it using conda:

conda create -n test_env python=3.10
conda install -n test_env pandas

Navigate to the pandas folder in test_env:

/Users/.../opt/anaconda3/envs/test_env/lib/python3.10/site-packages/pandas/__init__.py
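
If the exact location is unclear (the path above is abbreviated), one way to find it is to ask Python's import system. This is only a minimal sketch: find_package_init is a hypothetical helper name, not part of pandas or conda, and it should be run with the interpreter of the environment you want to inspect.

```python
import importlib.util


def find_package_init(name):
    """Return the path of a package's __init__.py, or None if it is not installed.

    Hypothetical helper for illustration. For a top-level name, find_spec
    locates the package without importing it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a regular package, spec.origin is the path to its __init__.py.
    return spec.origin


# Prints the path of pandas/__init__.py in the active environment,
# or None if pandas is not installed there.
print(find_package_init("pandas"))
```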

At the bottom of that __init__.py I added:

import pandas as pd


@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def center(self):
        # return the geographic center point of this DataFrame
        lat = self._obj.latitude
        lon = self._obj.longitude
        return (float(lon.mean()), float(lat.mean()))

    def plot(self):
        # plot this array's data on a map, e.g., using Cartopy
        pass

Now you should be able to do just this:

(base) ~ % conda activate test_env          
(test_env) ~ % python
Python 3.10.9 (main, Jan 11 2023, 09:18:20) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> ds = pd.DataFrame({"longitude": np.linspace(0, 10),
...                    "latitude": np.linspace(0, 20)})
>>> ds.geo.center
(5.0, 10.0)
>>> 
Answered By: It_is_Chris

I think you can import the extension module in the package's __init__.py file. Python runs a package's __init__.py before importing any of its submodules, so the extension module will first import pandas and then register the accessor. After that, pandas is cached in sys.modules, and any subsequent import of pandas from other modules simply retrieves the same, already extended module from the cache.

Here is a simple example:

source
├── __init__.py
├── my_module.py
└── pd_extension.py

The following are the contents of the files:

# pd_extension.py
import pandas as pd

@pd.api.extensions.register_dataframe_accessor('spam')
class Spam:
    def __init__(self, df):
        self.df = df

    @property
    def shape(self):
        return self.df.shape  

# my_module.py
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(df.spam.shape)

# __init__.py
import source.pd_extension

Now let's test the code by executing my_module.py, which works as expected:

$ python -m source.my_module
(2, 3)
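
The same mechanism can be checked without pandas installed. The sketch below is only an illustration under stated assumptions: the package name demo_source is made up, and collections.UserDict stands in for the DataFrame class, with a plain property standing in for the registered accessor. It builds a throwaway package on disk and shows that importing a submodule runs the package's __init__.py first, so the attribute is already attached when my_module uses it.

```python
import importlib
import sys
import tempfile
import textwrap
from collections import UserDict
from pathlib import Path

PKG = "demo_source"  # hypothetical package name, used only for this demo

FILES = {
    # The package __init__ imports the extension module once.
    "__init__.py": "from . import ext\n",
    # Stand-in for accessor registration: attach an attribute to a shared class.
    "ext.py": textwrap.dedent("""\
        from collections import UserDict
        UserDict.spam = property(lambda self: len(self))
    """),
    # No import of ext here; the package __init__ has already run it.
    "my_module.py": textwrap.dedent("""\
        from collections import UserDict
        size = UserDict(a=1, b=2).spam
    """),
}


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / PKG
        pkg_dir.mkdir()
        for name, body in FILES.items():
            (pkg_dir / name).write_text(body)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(f"{PKG}.my_module")
            return module.size
        finally:
            # Undo every side effect of the demo.
            sys.path.remove(tmp)
            for key in [k for k in sys.modules if k == PKG or k.startswith(PKG + ".")]:
                del sys.modules[key]
            if hasattr(UserDict, "spam"):
                del UserDict.spam


print(run_demo())  # prints 2
```

One consequence of this design is that the registration only happens once something from the package has been imported; code outside the package that imports pandas alone will not see the accessor.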
Answered By: Shubham Sharma