How can I save more metadata on an MLFlow model

Question:

I am trying to save a model to MLFlow, but as I have a custom prediction pipeline to retrieve data, I need to save extra metadata into the model.

I tried using a custom signature class, which does the job and saves the model with the extra metadata in the MLModel file (YAML format). But when I want to load the model from the MLFlow registry, the signature is not easily accessible.

mlflow.sklearn.log_model(model, "model", signature = signature)

I’ve also tried passing an extra dictionary to the log_model function, but it gets saved in the conda.yaml file:

mlflow.sklearn.log_model(model, "model", {"metadata1":"value1", "metadata2":"value2"})

Should I make my own flavour? Or my own Model subclass? I’ve seen here that the PyFuncModel receives a metadata class and an implementation to solve this, but I don’t know where I should pass my own implementations to PyFuncModel in an experiment script. Here’s a minimal example:

import mlflow
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

metadata_dic = {"metadata1": "value1", 
                "metadata2": "value2"}

X = np.array([[-2, -1, 0, 1, 2, 1],[-2, -1, 0, 1, 2, 1]]).T
y = np.array([0, 0, 1, 1, 1, 0])

X = pd.DataFrame(X, columns=["X1", "X2"])
y = pd.DataFrame(y, columns=["y"])


model = LogisticRegression()
model.fit(X, y)

mlflow.sklearn.log_model(model, "model")
Asked By: Angelo

||

Answers:

In the end, I made a class that contains all the metadata and attached it to the model as an attribute:

model = LogisticRegression()
model.fit(X, y)
model.metadata = ModelMetadata(**metadata_dic)
mlflow.sklearn.log_model(model, "model")
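
The ModelMetadata class is not shown in this answer. A minimal sketch, assuming it is a plain dataclass with one field per key of metadata_dic (the field names are taken from the question, the class body itself is hypothetical):

```python
from dataclasses import asdict, dataclass


@dataclass
class ModelMetadata:
    """Hypothetical container for the extra fields stored with the model."""

    metadata1: str
    metadata2: str


metadata_dic = {"metadata1": "value1", "metadata2": "value2"}

# Unpack the dictionary into the dataclass, as in the snippet above.
metadata = ModelMetadata(**metadata_dic)

# asdict() converts the instance back into a plain dictionary,
# which is handy if the metadata also needs to be written to JSON or YAML.
round_tripped = asdict(metadata)
```

Because the instance is pickled together with the estimator, the class definition probably needs to be available in whatever environment later loads the model.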

With this approach I lose the customizable predict process, and after reading the MLFlow documentation it is still not clear to me how to keep it.

If anyone finds a better approach, it would be much appreciated.

Answered By: Angelo

A collection of artifacts that a PythonModel can use when performing inference. PythonModelContext objects are created implicitly by the save_model() and log_model() persistence methods, using the contents specified by the artifacts parameter of these methods.

property artifacts

A dictionary containing <name, artifact_path> entries, where
artifact_path is an absolute filesystem path to the artifact.

Answered By: Andre

After you load the model with:

loaded_model = mlflow.pyfunc.load_model(path_to_model)

you can access your metadata with:

loaded_model._model_impl.metadata 

However, this only works for a model logged with

mlflow.sklearn.log_model(model, "model")

and does not work with:

mlflow.statsmodels.log_model(model, "model")
Answered By: Y. Zhang