Python: Calling Inherited Parent Class Method Fails

Question:

I created a pass-through wrapper class around an existing class from sklearn and it does not behave as expected:

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

tiny_df = pd.DataFrame({'x': ['a', 'b']})

class Foo(OrdinalEncoder):

    def __init__(self, *args, **kwargs):
        super().__init__(self, *args, **kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self


oe = OrdinalEncoder()
oe.fit(tiny_df) # works fine
foo = Foo()
foo.fit(tiny_df) # fails

The relevant part of the error message I receive is:

~\.conda\envs\pytorch\lib\site-packages\sklearn\preprocessing\_encoders.py in _fit(self, X, handle_unknown)
     69                         raise ValueError("Unsorted categories are not "
     70                                          "supported for numerical categories")
---> 71             if len(self._categories) != n_features:
     72                 raise ValueError("Shape mismatch: if n_values is an array,"
     73                                  " it has to be of shape (n_features,).")

TypeError: object of type 'Foo' has no len()

Somehow the parent's private attribute _categories does not seem to get set, even though I call the parent constructor in the __init__() method of my class. I must be missing something simple here, and would appreciate any help!

Asked By: kgolyaev


Answers:

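You must not pass self again to super().__init__(). super() already binds the instance, so the extra self is received as the first positional parameter of OrdinalEncoder.__init__ (categories, in the version you are using). That is why the traceback complains that an object of type 'Foo' has no len(). In addition, scikit-learn estimators should always list their parameters explicitly in the signature of __init__. Variadic positional arguments (*args) are not allowed and raise a RuntimeError, so you have to remove them. I have modified your code as below: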

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

tiny_df = pd.DataFrame({'x': ['a', 'b']})

class Foo(OrdinalEncoder):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self


oe = OrdinalEncoder()
oe.fit(tiny_df) # works fine
foo = Foo()
foo.fit(tiny_df) # works fine too

SAMPLE OUTPUT

foo.transform(tiny_df)
array([[0.],
       [1.]])
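To see why passing self to super().__init__() breaks, here is a minimal pure-Python sketch. Parent is a hypothetical stand-in (not the real sklearn class) whose first positional parameter is categories, mirroring OrdinalEncoder.__init__ in the older scikit-learn release the question appears to use. Recent releases make these parameters keyword-only, so the same mistake there should raise a TypeError at construction instead.

```python
# Hypothetical stand-in for a parent whose first positional parameter is
# `categories`, like OrdinalEncoder.__init__ in older scikit-learn releases.
class Parent:
    def __init__(self, categories="auto"):
        self.categories = categories


class BadChild(Parent):
    def __init__(self, *args, **kwargs):
        # super() is already bound to this instance, so passing `self`
        # explicitly sends it a second time: it lands in `categories`.
        super().__init__(self, *args, **kwargs)


class GoodChild(Parent):
    def __init__(self, **kwargs):
        # Correct: super() supplies `self`; only forward the real arguments.
        super().__init__(**kwargs)


bad = BadChild()
good = GoodChild()
print(bad.categories is bad)  # True: the instance became its own `categories`
print(good.categories)        # auto

try:
    len(bad.categories)       # same failure as in the question's traceback
except TypeError as exc:
    print(exc)                # object of type 'BadChild' has no len()
```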

A little extra

class Foo(OrdinalEncoder):

    def __init__(self, *args, **kwargs):
        super().__init__(*args,**kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self

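And as soon as scikit-learn inspects the constructor signature of Foo (for example when get_params() is called, or when the instance is displayed):

foo = Foo()
foo.get_params()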

RuntimeError: scikit-learn estimators should always specify their parameters in the signature of their __init__ (no varargs). <class '__main__.Foo'> with constructor (self, *args, **kwargs) doesn't  follow this convention.
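The RuntimeError above comes from scikit-learn inspecting the __init__ signature, which BaseEstimator.get_params() does. Below is a minimal sketch comparing three variants, assuming a recent scikit-learn and pandas. The class names VarArgsFoo and KwargsFoo are hypothetical. For a pure pass-through wrapper, the simplest option is often to not override __init__ at all, so the parent's explicit signature is inherited. The **kwargs-only version from the answer constructs and fits. However, the signature inspection skips **kwargs, so get_params() reports no parameters and clone() silently falls back to the defaults.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.preprocessing import OrdinalEncoder

tiny_df = pd.DataFrame({'x': ['a', 'b']})


class VarArgsFoo(OrdinalEncoder):
    # Hypothetical: *args in the signature is rejected by get_params().
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


class KwargsFoo(OrdinalEncoder):
    # Hypothetical: **kwargs only is accepted, but skipped by the signature
    # inspection, so get_params() / clone() cannot see any parameters.
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


class Foo(OrdinalEncoder):
    # No __init__ override: OrdinalEncoder's explicit signature is inherited,
    # so get_params() and clone() work as for the parent class.
    def fit(self, X, y=None):
        super().fit(X, y)
        return self


try:
    VarArgsFoo().get_params()
    varargs_rejected = False
except RuntimeError:
    varargs_rejected = True

print(varargs_rejected)                                   # True
print(KwargsFoo(categories=[['b', 'a']]).get_params())    # {}
print(clone(KwargsFoo(categories=[['b', 'a']])).categories)  # auto (setting lost)
print(clone(Foo(categories=[['b', 'a']])).categories)     # [['b', 'a']]
print(Foo().fit(tiny_df).transform(tiny_df))
```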