Calling the sklearn2pmml() function in Python 3.8 throws RuntimeError

Question:

I'm trying to save my scikit-learn logistic regression model as PMML, but I get a RuntimeError.

My code:

from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn.linear_model import LogisticRegression

pipe_pmml = PMMLPipeline(steps=[
    ('mapper', mapper),
    ('estimator', LogisticRegression(C=0.01,
                                     penalty='l1',
                                     solver='liblinear',
                                     random_state=1)),
])
pipe_pmml.fit(X_small, y)

sklearn2pmml(pipe_pmml, pmml_filename, with_repr = True)

with error:

Standard output is empty
Standard error:
Exception in thread "main" net.razorvine.pickle.InvalidOpcodeException: invalid pickle opcode: 0
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:366)
    at org.jpmml.python.CustomUnpickler.dispatch(CustomUnpickler.java:31)
    at org.jpmml.python.PickleUtil$1.dispatch(PickleUtil.java:64)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:109)
    at org.jpmml.python.PickleUtil.unpickle(PickleUtil.java:85)
    at com.sklearn2pmml.Main.run(Main.java:78)
    at com.sklearn2pmml.Main.main(Main.java:6

where mapper is a DataFrameMapper from sklearn_pandas.

Does anybody have an idea what is going wrong?

  • sklearn==0.0
  • scikit-learn==1.1.2
  • sklearn-pandas==2.2.0
  • sklearn2pmml==0.86.3
Asked By: S_Econometrics


Answers:

Solution: Downgrade joblib to 1.1.0

see: https://github.com/jpmml/jpmml-python/issues/19

Answered By: S_Econometrics

Joblib 1.2.0 generates pickle-like files, which contain extra padding for array memory alignment purposes: joblib/joblib#563

This extra padding is what is causing standard pickle data format readers to fail.

The SkLearn2PMML package version 0.87.0 (and newer) should be able to deal with both standard (Python pickle, Joblib 1.1.0) and non-standard (Joblib 1.2.0) pickle files.
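
To see whether an environment has the combination described above, a minimal sketch along these lines may help. The helper names (parse_version, is_affected, installed_version) are hypothetical and not part of SkLearn2PMML or Joblib. The check also assumes that Joblib releases after 1.2.0 keep the padded format, which this thread does not confirm.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(joblib_version, sklearn2pmml_version):
    """True for Joblib 1.2.0+ (assumption: newer releases behave like 1.2.0)
    combined with SkLearn2PMML older than 0.87.0."""
    return (parse_version(joblib_version) >= (1, 2, 0)
            and parse_version(sklearn2pmml_version) < (0, 87, 0))


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    jl = installed_version("joblib")
    s2p = installed_version("sklearn2pmml")
    if jl is None or s2p is None:
        print("joblib or sklearn2pmml is not installed in this environment")
    elif is_affected(jl, s2p):
        print(f"joblib {jl} with sklearn2pmml {s2p}: "
              "upgrade sklearn2pmml to 0.87.0 or newer, or pin joblib==1.1.0")
    else:
        print(f"joblib {jl} with sklearn2pmml {s2p}: "
              "not the combination described in this thread")
```

With the versions from the question (sklearn2pmml 0.86.3) and Joblib 1.2.0 installed, is_affected returns True; either upgrading sklearn2pmml or pinning joblib to 1.1.0 makes it return False.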

Answered By: user1808924