Inherit as required only some fields from parent pandera SchemaModel

Question:

I have Input and Output pandera SchemaModels and the Output inherits the Input which accurately represents that all attributes of the Input schema are in the scope of the Output schema.

What I want to avoid is inheriting all attributes as required (non-Optional) as they are rightly coming from the Input schema. Instead I want to preserve them as required for the Input schema but define which of them remain required for the Output schema while the other inherited attributes become optional.

This pydantic question is similar and has solution for defining __init_subclass__ method in the parent class. However, this doesn’t work out of the box for pandera classes and I’m not sure if it is even implementable or the right approach.

import pandera as pa
from typing import Optional
from pandera.typing import Index, DataFrame, Series, Category

class InputSchema(pa.SchemaModel):

    reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)

    def __init_subclass__(cls, optional_fields=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if optional_fields:
            for field in optional_fields:
                cls.__fields__[field].outer_type_ = Optional
                cls.__fields__[field].required = False    


class OutputSchema(InputSchema, optional_fields=['reporting_date']):

    test: Series[str] = pa.Field()


@pa.check_types
def func(inputs: DataFrame[InputSchema]) -> DataFrame[OutputSchema]:
    inputs = inputs.drop(columns=['reporting_date'])
    inputs['test'] = 'a'
    return inputs

data = pd.DataFrame({'reporting_date': ['2023-01-11', '2023-01-12']})

func(data)

Error:

---> 18 class OutputSchema(InputSchema, optional_fields=['reporting_date']):
KeyError: 'reporting_date'

Edit:

Desired outcome to be able to set which fields from the inherited schema remain required while the remaining become optional:

class InputSchema(pa.SchemaModel):

    reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)
    other_field: Series[str] = pa.Field()


class OutputSchema(InputSchema, required=['reporting_date'])

    test: Series[str] = pa.Field()

The resulting OutputSchema has reporting_date and test as required while other_field as optional.

Asked By: Konstantin

||

Answers:

A similar question was asked on pandera’s issue tracker, with a docs update on track for the next pandera release. There is no clean solution, but the most simple one is to exclude columns by overloading to_schema:

import pandera as pa
from pandera.typing import Series

class InputSchema(pa.SchemaModel):
    reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)

class OutputSchema(InputSchema):
    test: Series[str]
    
    @classmethod
    def to_schema(cls) -> pa.DataFrameSchema:
        return super().to_schema().remove_columns(["reporting_date"])

This runs without SchemaError against your check function.

Answered By: Arne

Here is a solution by reusing existing type annotation from the input schema:

import pandera as pa
import pandas as pd
from typing import Optional
from pandera.typing import Index, DataFrame, Series, Category
from pydantic import Field, BaseModel
from typing import Annotated, Type


def copy_field(from_model: Type[BaseModel], fname: str, annotations: dict[str, ...]):
    annotations[fname] = from_model.__annotations__[fname]


class InputSchema(pa.SchemaModel):
    reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)
    not_inherit: Series[str]

class OutputSchema(pa.SchemaModel):
    test: Series[str] = pa.Field()
    copy_field(InputSchema, "reporting_date", __annotations__)
    # reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)
    # not_inherit: Optional[Series[str]]


data = pd.DataFrame({
    'reporting_date': ['2023-01-11', '2023-01-12'],
    'not_inherit': ['a','a']
})

@pa.check_types
def func(
    inputs: DataFrame[InputSchema]
) -> DataFrame[OutputSchema]:
    inputs = inputs.drop(columns=['not_inherit'])
    inputs['test'] = 'a'
    return inputs

func(data)
Answered By: Konstantin
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.