Create my own method for DataFrames (python)
Question:
So I wanted to create a module for my own projects and wanted to use methods. For example I wanted to do:
from mymodule import *
df = pd.DataFrame(np.random.randn(4,4))
df.mymethod()
The thing is, it seems I can't call df.mymethod(), since I think I can only define methods on classes I've created myself. A workaround is making mymethod a plain function that takes a pandas.DataFrame as an argument:
mymethod(df)
I don't really want to do this. Is there any way to implement the first one?
Answers:
If you really need to add a method to a pandas.DataFrame you can inherit from it. Something like:
mymodule:
import pandas as pd

class MyDataFrame(pd.DataFrame):
    def mymethod(self):
        """Do my stuff"""
Use mymodule:
import numpy as np
from mymodule import MyDataFrame

df = MyDataFrame(np.random.randn(4, 4))
df.mymethod()
To preserve your custom dataframe class:
pandas routinely returns new dataframes when performing operations on dataframes. So to preserve your dataframe class, you need to have pandas return your class when performing operations on an instance of your class. That can be done by providing a _constructor property like:
class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas calls this to build the result of an operation
        return MyDataFrame

    def mymethod(self):
        """Do my stuff"""
Test Code:
import pandas as pd

class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame

df = MyDataFrame([1])
print(type(df))
# rename returns a new object; check that it is still a MyDataFrame
df = df.rename(columns={})
print(type(df))
Test Results:
<class '__main__.MyDataFrame'>
<class '__main__.MyDataFrame'>
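For contrast, here is a minimal sketch of what happens when the _constructor property is left out. The class names PlainSubclass and KeptSubclass are made up for illustration, and the behaviour shown is what I would expect from recent pandas versions rather than something guaranteed for every release:

```python
import pandas as pd


class PlainSubclass(pd.DataFrame):
    """Hypothetical subclass that does NOT override _constructor."""


class KeptSubclass(pd.DataFrame):
    """Hypothetical subclass that overrides _constructor."""

    @property
    def _constructor(self):
        return KeptSubclass


plain = PlainSubclass({"a": [1, 2, 3]})
kept = KeptSubclass({"a": [1, 2, 3]})

# head() builds a new object through _constructor, so only the
# subclass that overrides it is expected to keep its type.
plain_result_type = type(plain.head(2))
kept_result_type = type(kept.head(2))
print(plain_result_type.__name__)
print(kept_result_type.__name__)
```

If the first result comes back as a plain DataFrame, any custom method defined on the subclass is no longer available on it, which is the problem _constructor is meant to solve.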
A nice solution can be found in the ffn package. What the authors do:
from pandas.core.base import PandasObject

def your_fun(df):
    ...

PandasObject.your_fun = your_fun
After that, your own function your_fun becomes a method of pandas.DataFrame objects, and you can do something like:
df.your_fun()
This method will work with both DataFrame and Series objects, since both inherit from PandasObject.
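To make the DataFrame/Series point concrete, here is a small sketch of the same patching technique. The helper describe_size is hypothetical (it is not part of pandas or ffn), and pandas.core.base is an internal module, so this may break between pandas releases:

```python
import pandas as pd
from pandas.core.base import PandasObject


def describe_size(obj):
    """Hypothetical helper: report the type name and number of values."""
    return f"{type(obj).__name__} with {obj.size} values"


# Attach the function to the base class shared by DataFrame and Series.
PandasObject.describe_size = describe_size

frame_text = pd.DataFrame({"A": [1, 2], "B": [3, 4]}).describe_size()
series_text = pd.Series([1, 2, 3]).describe_size()
print(frame_text)
print(series_text)
```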
This topic is well documented as of Nov 2019: Extending pandas
Note that the most obvious technique, Ivan Mishalkin's monkey patching, was actually removed from the official documentation at some point, probably for good reason.
Monkey patching works fine for small projects, but it has a serious drawback in a large-scale project: IDEs like PyCharm can't introspect the patched-in methods. So if you right-click and choose "Go to declaration", PyCharm simply says "cannot find declaration to go to". It gets old fast if you're an IDE junkie.
I confirmed that PyCharm CAN introspect both the "custom accessors" and "subclassing" approaches discussed in the official documentation.
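For reference, a minimal sketch of the custom accessor approach. It uses pandas.api.extensions.register_dataframe_accessor from the public extension API; the accessor name "my", the class MyAccessor and its augment_x method are made-up names for illustration:

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("my")
class MyAccessor:
    """Hypothetical accessor, reachable as df.my on any DataFrame."""

    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was reached from.
        self._obj = pandas_obj

    def augment_x(self, x):
        # Work on a copy so the caller's DataFrame is left untouched.
        out = self._obj.copy()
        out[f"column_{x}"] = x
        return out


df = pd.DataFrame({"A": [1, 2]})
result = df.my.augment_x(10)
print(result)
```

The methods live under a namespace (df.my.augment_x rather than df.augment_x), which keeps them from colliding with existing or future pandas method names.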
I have used Ivan Mishalkin's handy solution extensively in our in-house Python library. At some point I thought it would be better to use his solution in the form of a decorator. The only restriction is that the first argument of the decorated function must be a DataFrame:
from copy import deepcopy
from functools import wraps

import pandas as pd
from pandas.core.base import PandasObject

def as_method(func):
    """
    This decorator makes a function also available as a method.
    The first passed argument must be a DataFrame.
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Deep-copy the arguments so the caller's objects are never mutated.
        return func(*deepcopy(args), **deepcopy(kwargs))

    # Register the wrapper on the pandas base class under the function's name.
    setattr(PandasObject, wrapper.__name__, wrapper)
    return wrapper

@as_method
def augment_x(DF, x):
    """We will be able to see this docstring if we run ??augment_x"""
    DF[f"column_{x}"] = x
    return DF
Example:
df = pd.DataFrame({"A": [1, 2]})
df
   A
0  1
1  2
df.augment_x(10)
   A  column_10
0  1         10
1  2         10
As you can see, the original DataFrame is not changed, as if the method were called with inplace=False:
df
   A
0  1
1  2
You can still use augment_x as a plain function:
augment_x(df, 2)
   A  column_2
0  1         2
1  2         2