How would you decorate without modifying an inherited method?

Question:

I’ve seen and tried the answer given at "How would one decorate an inherited method in the child class?" to no avail.

Sample data:

import pandas as pd

df = pd.DataFrame([('Tom', 'M'), ('Sarah', 'X')], columns=['PersonName', 'PersonSex'])

I am using the pandera library for DataFrame data quality and validation. I have a class BaseClass which defines some functions I’ll be using in many of my schemas, e.g.:

class BaseClass:
    def check_valid_sex(cls, col):
        return col.isin(['M', 'F'])

I’m inheriting from this class in my schema classes, where the checks will be applied. First, here is an example that does not use this class and instead writes the check explicitly in the SchemaModel:

import pandera as pa
from pandera.typing import Series

class PersonClass(pa.SchemaModel):
    PersonName: Series[str]
    PersonSex: Series[str]

    @pa.check('PersonSex')
    def check_valid_sex(cls, col):
        return col.isin(['M', 'F'])

PersonClass.validate(df)

What I would like to do is re-use the check_valid_sex function in other SchemaModel classes where there is a PersonSex column, something like this:

import pandera as pa
from pandera.typing import Series

class PersonClass(pa.SchemaModel, BaseClass):
    PersonName: Series[str]
    PersonSex: Series[str]

    # apply the pa.check function to the BaseClass function to inherit the logic
    validate_sex = classmethod(pa.check(BaseClass.check_valid_sex, 'PersonSex'))

PersonClass.validate(df)

This should run the same validation as the explicit version, but it does not work; I believe the check function is not being registered correctly. How can I decorate the parent class’s method without overriding it, simply by applying the decorator with its arguments?

Asked By: TomNash

Answers:

The decorator here is actually a decorator factory: calling it returns the decorator that then decorates whatever it is called on. So you need to follow the same pattern: construct the check with the field name, then use the result to manually wrap the base class function:

validate_sex = pa.check('PersonSex')(BaseClass.check_valid_sex)
             # ^^^^^^^^^^^^^^^^^^^^^ constructs decorator
             #                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^ calls on function 
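
The two-step pattern can be seen without pandera at all. The sketch below uses a hypothetical `check` factory and a `checked_fields` attribute, neither of which is pandera's implementation; it only mirrors the calling convention of constructing the decorator first, then applying it to the inherited function.

```python
# Hypothetical stand-in for a decorator factory such as pa.check.
# This is NOT pandera's implementation, only the same calling pattern.
def check(*fields):
    """Return a decorator that tags a function with the fields it validates."""
    def decorator(fn):
        def wrapper(cls, values):
            return fn(cls, values)
        # Metadata that a framework could later discover on the class.
        wrapper.checked_fields = fields
        return wrapper
    return decorator


class BaseClass:
    def check_valid_sex(cls, values):
        return [v in ('M', 'F') for v in values]


class PersonClass(BaseClass):
    # Step 1: check('PersonSex') builds the decorator.
    # Step 2: that decorator is applied to the inherited function.
    validate_sex = classmethod(check('PersonSex')(BaseClass.check_valid_sex))


result = PersonClass.validate_sex(['M', 'X'])
fields = PersonClass.validate_sex.checked_fields
```

The original `BaseClass.check_valid_sex` is left untouched; only the new `validate_sex` attribute on the child carries the metadata.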

It’s a little unclear why the base class (which is not a classmethod) accepts cls, while the child class attempted to wrap in classmethod. If only the child is supposed to be a classmethod, you could just do:

validate_sex = classmethod(pa.check('PersonSex')(BaseClass.check_valid_sex))

to wrap it. If the base class method is actually a classmethod, it gets uglier: you’ll have to unwrap it. For example, you’d replace BaseClass.check_valid_sex with BaseClass.check_valid_sex.__func__ (which binds the function to the class, then extracts just the unbound function). In Python 3.10+, you could avoid creating the bound class method by using BaseClass.__dict__['check_valid_sex'].__wrapped__ (which looks up the raw classmethod, bypassing the descriptor protocol, and extracts the function it wraps directly). You need to do this so you get the clean original function, not one indelibly bound to the parent class (where even when called on the child class, cls would still be BaseClass).
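
To make the binding problem concrete, here is a small pandera-free sketch. It assumes the base method really is a classmethod; the names `stuck` and `rebound` are illustrative only. It uses `__func__` on the raw classmethod object, which reaches the same underlying function that `__wrapped__` does on 3.10+.

```python
class BaseClass:
    @classmethod
    def check_valid_sex(cls, values):
        # Return the class name so the binding of cls is visible.
        return cls.__name__, [v in ('M', 'F') for v in values]


# Attribute access goes through the descriptor protocol and yields a
# method that is already bound to BaseClass.
bound = BaseClass.check_valid_sex
raw_via_func = bound.__func__

# Looking in __dict__ bypasses the descriptor and returns the classmethod
# object itself; its __func__ is the same plain function.
raw_via_dict = BaseClass.__dict__['check_valid_sex'].__func__


class PersonClass(BaseClass):
    # Reusing the bound method keeps cls fixed to BaseClass.
    stuck = staticmethod(bound)
    # Re-wrapping the plain function lets cls follow the child class.
    rebound = classmethod(raw_via_func)


stuck_result = PersonClass.stuck(['M', 'X'])
rebound_result = PersonClass.rebound(['M', 'X'])
```

Here `stuck_result` reports `BaseClass` as the class even though it was called on `PersonClass`, while `rebound_result` reports `PersonClass`.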

Answered By: ShadowRanger

There is an interface for registering custom check functions with pandera’s check engine, which among other things enables reuse:

import pandera as pa
from pandera.typing import Series
from pandera import extensions
import pandas as pd


@extensions.register_check_method()
def valid_sex(pandas_obj):
    return pandas_obj.isin(['M', 'F'])


class PersonClass(pa.SchemaModel):
    PersonName: Series[str]
    PersonSex: Series[str] = pa.Field(valid_sex=())


df = pd.DataFrame(
    [('Tom', 'M'), ('Sarah', 'X')],
    columns=['PersonName', 'PersonSex']
)

PersonClass.validate(df)

Because 'X' is not an allowed value, this raises:

Traceback (most recent call last):
...
pandera.errors.SchemaError: <Schema Column(name=PersonSex, type=DataType(str))> failed element-wise validator 0:
<Check valid_sex>
failure cases:
   index failure_case
0      1            X

Answered By: Arne