Create my own method for DataFrames (python)

Question:

I wanted to create a module for my own projects that adds my own methods to DataFrames. For example, I wanted to do:

import numpy as np
import pandas as pd
from mymodule import *
df = pd.DataFrame(np.random.randn(4,4))
df.mymethod()

The thing is, it seems I can’t use .mymethod(), since I think I can only define methods for classes I’ve created myself. A workaround is to write a plain function, myfunc, that takes a pandas.DataFrame as an argument:

myfunc(df)

I don’t really want to do this. Is there any way to implement the first approach?

Asked By: Shiranai


Answers:

If you really need to add a method to a pandas.DataFrame, you can inherit from it. Something like:

mymodule:

import pandas as pd

class MyDataFrame(pd.DataFrame):
    def mymethod(self):
        """Do my stuff"""

Use mymodule:

import numpy as np
from mymodule import *
df = MyDataFrame(np.random.randn(4,4))
df.mymethod()

To preserve your custom dataframe class:

pandas routinely returns new DataFrames when performing operations on DataFrames. So to preserve your class, you need pandas to return an instance of your class when it operates on one. That can be done by providing a _constructor property, like:

class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return MyDataFrame

    def mymethod(self):
        """Do my stuff"""

Test Code:

import pandas as pd

class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return MyDataFrame

df = MyDataFrame([1])
print(type(df))
df = df.rename(columns={})
print(type(df))

Test Results:

<class '__main__.MyDataFrame'>
<class '__main__.MyDataFrame'>
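Putting both pieces together, here is a sketch with a concrete body for mymethod. The min-max scaling it performs and the sample data are hypothetical choices for illustration only; the point is that the results of rename and of the arithmetic inside mymethod are still MyDataFrame instances because of _constructor.

```python
import pandas as pd


class MyDataFrame(pd.DataFrame):
    """DataFrame subclass that keeps its type through pandas operations."""

    @property
    def _constructor(self):
        # pandas uses this to build the result of most operations,
        # so returning our class keeps results as MyDataFrame.
        return MyDataFrame

    def mymethod(self):
        """Hypothetical example: scale every column to the range [0, 1]."""
        return (self - self.min()) / (self.max() - self.min())


df = MyDataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})
renamed = df.rename(columns=str.upper)  # an ordinary pandas operation
scaled = renamed.mymethod()             # our custom method still exists
print(type(renamed).__name__)  # MyDataFrame
print(type(scaled).__name__)   # MyDataFrame
```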
Answered By: Stephen Rauch

A nice solution can be found in the ffn package. What its authors do is:

from pandas.core.base import PandasObject
def your_fun(df):
    ...
PandasObject.your_fun = your_fun

After that, your function your_fun becomes a method of pandas.DataFrame objects, and you can do something like

df.your_fun()

This method works with both DataFrame and Series objects, since both inherit from PandasObject.
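A minimal runnable sketch of this pattern, using a hypothetical helper n_missing (not part of ffn or pandas) that counts missing values. Because it is attached to PandasObject, the same call works on a DataFrame and on a Series. Keep in mind that pandas.core is an internal module, so this approach may break between pandas versions.

```python
import pandas as pd
from pandas.core.base import PandasObject


def n_missing(obj):
    """Hypothetical helper: total number of missing values in a Series or DataFrame."""
    # isna() gives a boolean Series/DataFrame; summing the underlying array
    # counts the True values regardless of dimensionality.
    return int(obj.isna().to_numpy().sum())


# Monkey patch: attach the function to the shared base class.
PandasObject.n_missing = n_missing

df = pd.DataFrame({"a": [1, None, 3], "b": [None, None, 6]})
print(df.n_missing())       # 3
print(df["a"].n_missing())  # 1
```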

Answered By: Ivan Mishalkin

This topic is well documented (as of Nov 2019) in the official pandas documentation under “Extending pandas”.

Note that the most obvious technique, Ivan Mishalkin’s monkey patching, was actually removed at some point from the official documentation… probably for good reason.

Monkey patching works fine for small projects, but it has a serious drawback for large ones: IDEs like PyCharm can’t introspect the patched-in methods. So if you right-click and choose “Go to declaration”, PyCharm simply says “cannot find declaration to go to”. It gets old fast if you’re an IDE junkie.

I confirmed that PyCharm CAN introspect both the “custom accessors” and the “subclassing” approaches discussed in the official documentation.
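For reference, a custom accessor registered with pandas.api.extensions.register_dataframe_accessor might look like the sketch below. The accessor name "my" and the mymethod body (upper-casing column names) are hypothetical. The method is then called as df.my.mymethod() rather than df.mymethod(), which is the trade-off for not modifying pandas’ own classes.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("my")
class MyAccessor:
    """Hypothetical accessor reachable as df.my.<method>()."""

    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was accessed on.
        self._obj = pandas_obj

    def mymethod(self):
        """Example method: return a copy with every column name upper-cased."""
        return self._obj.rename(columns=str.upper)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(df.my.mymethod().columns.tolist())  # ['A', 'B']
```

The original df keeps its lower-case column names, because rename returns a new DataFrame.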

Answered By: pandichef

I have used Ivan Mishalkin’s handy solution extensively in our in-house Python library. At some point I thought it would be better to package his solution as a decorator. The only restriction is that the first argument of the decorated function must be a DataFrame:

from copy import deepcopy
from functools import wraps
import pandas as pd
from pandas.core.base import PandasObject

def as_method(func):
    """
    This decorator makes a function also available as a method.
    The first passed argument must be a DataFrame.
    """

    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*deepcopy(args), **deepcopy(kwargs))

    setattr(PandasObject, wrapper.__name__, wrapper)

    return wrapper


@as_method
def augment_x(DF, x):
    """We will be able to see this docstring if we run ??augment_x"""
    DF[f"column_{x}"] = x

    return DF

Example:

df = pd.DataFrame({"A": [1, 2]})
df
   A
0  1
1  2

df.augment_x(10)
   A  column_10
0  1         10
1  2         10

As you can see, the original DataFrame is not changed, as if inplace=False had been passed. This is because the wrapper deep-copies all arguments before calling the function.

df
   A
0  1
1  2

You can still use the augment_x as a simple function:

augment_x(df, 2)
   A  column_2
0  1         2
1  2         2
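A possible variant, sketched here as an assumption rather than as part of the answer above: copy only the first argument with its own .copy() instead of deep-copying every argument. This avoids deep-copying large keyword arguments, or ones that cannot be deep-copied at all. Note that DataFrame.copy() copies the data, but Python objects stored in object-dtype columns are not copied recursively. The names as_method_copy_first and add_constant_column are hypothetical.

```python
from functools import wraps

import pandas as pd
from pandas.core.base import PandasObject


def as_method_copy_first(func):
    """Expose func as a method; only its first argument is copied."""

    @wraps(func)
    def wrapper(obj, *args, **kwargs):
        # Work on a copy so the caller's DataFrame/Series is left untouched;
        # the remaining arguments are passed through as-is.
        return func(obj.copy(), *args, **kwargs)

    setattr(PandasObject, func.__name__, wrapper)
    return wrapper


@as_method_copy_first
def add_constant_column(df, x):
    """Hypothetical example: add a column filled with x."""
    df[f"column_{x}"] = x
    return df


df = pd.DataFrame({"A": [1, 2]})
out = df.add_constant_column(10)
print(out.columns.tolist())  # ['A', 'column_10']
print(df.columns.tolist())   # ['A']
```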
Answered By: Amir Py