Getting "TypeError: type of out argument not recognized: <class 'str'>" when using class function with Pandera decorator

Question:

I am trying to get to use decorators from Python package "Pandera" and I am having trouble to get them work with classes.

First I create schemas for Pandera:

from pandera import Column, Check
import yaml
in_ = pa.DataFrameSchema(
    {
        "Name": Column(object, nullable=True),
        "Height": Column(object, nullable=True),
    })

with open("./in_.yml", "w") as file:
    yaml.dump(in_, file)

out_ = pa.DataFrameSchema(
    {
        "Name": Column(object, nullable=True),
        "Height": Column(object, nullable=True),
    })
with open("./out_.yml", "w") as file:
    yaml.dump(out_, file)

Next I create test.py file with class:

from pandera import check_io
import pandas as pd

class TransformClass():

    with open("./in_.yml", "r") as file:
        in_ = file.read()
    with open("./out_.yml", "r") as file:
        out_ = file.read()

    @staticmethod
    @check_io(df=in_, out=out_)
    def func(df: pd.DataFrame) -> pd.DataFrame:
        return df

Finally I importing this class:

from test import TransformClass
data = {'Name': [np.nan, 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)
TransformClass.func(df)

I am getting:

File C:Anaconda3envspy310libsite-packagespanderadecorators.py:464, in check_io.<locals>._wrapper(fn, instance, args, kwargs)
    462     out_schemas = []
    463 else:
--> 464     raise TypeError(
    465         f"type of out argument not recognized: {type(out)}"
    466     )
    468 wrapped_fn = fn
    469 for input_getter, input_schema in inputs.items():
    470     # pylint: disable=no-value-for-parameter

TypeError: type of out argument not recognized: <class 'str'>

Any help would much appreciated

Asked By: illuminato

||

Answers:

The check_io decorator expects arguments of type pandera.DataFrameSchema. However, it is being passed _out which is type str since it is the output of file.read().

The Pandera docs explain which types the check_io decorator is expecting.

A solution would be to pass the output of the file.read() line to the Pandera constructor, possibly with some transformation:

out_ = yaml.safe_load(file.read())
Answered By: grbeazley

Thanks to @grbeazley here is the full solution:

from pandera import Column, Check
import yaml

in_ = pa.DataFrameSchema(
    {
        "Name": Column(object, nullable=True),
        "Height": Column(object, nullable=True),
    })
with open("in_.yml", "w") as file:
    yaml.dump(in_.to_yaml(), file)

with open("./in_.yml", "r") as file:
    in_ = yaml.safe_load(file.read())

_ = pa.DataFrameSchema.from_yaml(in_)
Answered By: illuminato
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.