import custom python module in azure ml deployment environment

Question:

I have an sklearn k-means model. I am training the model and saving it in a pickle file so I can deploy it later using azure ml library. The model that I am training uses a custom Feature Encoder called MultiColumnLabelEncoder.
The pipeline model is defined as follows:

# Pipeline
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
import joblib

kmeans = KMeans(n_clusters=3, random_state=0)
pipe = Pipeline([
    ("encoder", MultiColumnLabelEncoder()),
    ("k-means", kmeans),
])

# Training the pipeline
model = pipe.fit(visitors_df)
prediction = model.predict(visitors_df)

# Save the model in pickle/joblib format
filename = 'k_means_model.pkl'
joblib.dump(model, filename)
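The question never shows MultiColumnLabelEncoder itself. For context, a minimal sketch of what such a per-column label encoder typically looks like (a hypothetical implementation, not the asker's actual code):

```python
# Hypothetical sketch of a multi-column label encoder; the real
# MultiColumnLabelEncoder in the question may differ.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, columns=None):
        self.columns = columns  # columns to encode; None means all columns

    def fit(self, X, y=None):
        cols = self.columns if self.columns is not None else X.columns
        # Fit one LabelEncoder per selected column
        self.encoders_ = {c: LabelEncoder().fit(X[c]) for c in cols}
        return self

    def transform(self, X):
        X = X.copy()  # do not mutate the caller's DataFrame
        for c, enc in self.encoders_.items():
            X[c] = enc.transform(X[c])
        return X
```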

Saving the model works fine. The deployment steps are the same as in this link:

https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/how-to-use-azureml/deploy-to-cloud/model-register-and-deploy.ipynb

However the deployment always fails with this error :

  File "/var/azureml-server/create_app.py", line 3, in <module>
    from app import main
  File "/var/azureml-server/app.py", line 27, in <module>
    import main as user_main
  File "/var/azureml-app/main.py", line 19, in <module>
    driver_module_spec.loader.exec_module(driver_module)
  File "/structure/azureml-app/score.py", line 22, in <module>
    importlib.import_module("multilabelencoder")
  File "/azureml-envs/azureml_b707e8c15a41fd316cf6c660941cf3d5/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'multilabelencoder'

I understand that pickle/joblib has trouble unpickling the custom class MultiColumnLabelEncoder. That is why I defined this class in a separate python script (which I also executed). I imported this custom class in the training script, in the deployment script, and in the scoring file (score.py), but the import in score.py fails.
So my question is: how can I import a custom python module into the Azure ML deployment environment?

Thank you in advance.

EDIT:
This is my .yml file

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  - multilabelencoder==1.0.4
  - scikit-learn
  - azureml-defaults==1.0.74.*
  - pandas
channels:
- conda-forge
Asked By: Emna Jaoua


Answers:

In fact, the solution was to publish my custom class MultiColumnLabelEncoder as a pip package (you can find it via pip install multilabelencoder==1.0.5).
Then I added the pip package to the .yml file (it can also be specified in the InferenceConfig of the Azure ML environment).
In the score.py file, I imported the class as follows:

import os
import joblib
from multilabelencoder import multilabelencoder

def init():
    global model

    # The custom encoder must be importable for unpickling the model
    encoder = multilabelencoder.MultiColumnLabelEncoder()
    # Get the path where the deployed model can be found.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'k_means_model_45.pkl')
    model = joblib.load(model_path)

Then the deployment was successful.
One more important thing: I had to use the same pip package (multilabelencoder) in the training pipeline, as here:

from multilabelencoder import multilabelencoder 
pipe = Pipeline([
    ("encoder", multilabelencoder.MultiColumnLabelEncoder(columns)),
    ('k-means', kmeans),
])
#Training the pipeline
trainedModel = pipe.fit(df)
Answered By: Emna Jaoua

I was facing the same problem, trying to deploy a model that depends on some of my own scripts, and got the error message:

 ModuleNotFoundError: No module named 'my-own-module-name'

I found the "Private wheel files" solution in the MS documentation, and it works. The difference from the solution above is that I do not need to publish my scripts to PyPI. Many people might be in the same situation, where for some reason you cannot or do not want to publish your scripts. Instead, your own wheel file is saved in your own blob storage.

Following the documentation, I did the steps below and it worked for me. Now I can deploy my model that depends on my own scripts.

  1. Package the scripts that the model depends on into a wheel file, saved locally:

    "your_path/your-wheel-file-name.whl"

  2. Follow the instructions in the "Private wheel files" section of the MS documentation. Below is the code that worked for me.


from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies

whl_url = Environment.add_private_pip_wheel(workspace=ws, file_path="your_path/your-wheel-file-name.whl")

myenv = CondaDependencies()
myenv.add_pip_package("scikit-learn==0.22.1")
myenv.add_pip_package("azureml-defaults")
myenv.add_pip_package(whl_url)

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())
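For step 1, the wheel can be built with setuptools; a minimal sketch, assuming your custom code sits in a package folder named multilabelencoder (name, version, and layout are placeholders):

```python
# setup.py -- minimal packaging sketch; name/version are placeholders
from setuptools import setup, find_packages

setup(
    name="multilabelencoder",
    version="1.0.0",
    packages=find_packages(),
)
```

Running `python setup.py bdist_wheel` (or `pip wheel .`) then produces the .whl file under ./dist/, which is what gets uploaded by add_private_pip_wheel.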

My environment file now looks like:

name: project_environment
dependencies:
  # The python interpreter version.

  # Currently Azure ML only supports 3.5.2 and later.

- python=3.6.2

- pip:
  - scikit-learn==0.22.1
  - azureml-defaults
  - https://myworkspaceid.blob.core/azureml/Environment/azureml-private-packages/my-wheel-file-name.whl
channels:
- conda-forge

I’m new to Azure ML, learning by doing and communicating with the community. This solution works fine for me; I hope it helps.

Answered By: Xxx Lll

An alternative method that works for me is to register a "model_src"-directory containing both the pickled model and a custom module, instead of registering only the pickled model. Then, specify the custom module in the scoring script during deployment, e.g., using python’s os module. Example below using sdk-v1:

Example of "model_src"-directory

model_src
   │
   ├─ utils   # your custom module
   │    └─ multilabelencoder.py
   │
   └─ models  
        ├─ score.py
        └─ k_means_model_45.pkl  # your pickled model file

Register "model_src" in sdk-v1

model = Model.register(model_path="./model_src",
                       model_name="kmeans",
                       description="model registered as a directory",
                       workspace=ws)

Correspondingly, when defining the inference config:

deployment_folder = './model_src'
script_file = 'models/score.py'
service_env = Environment.from_conda_specification("kmeans-service",
    './environment.yml'  # wherever yml is located locally
)
inference_config = InferenceConfig(source_directory=deployment_folder,
    entry_script=script_file,
    environment=service_env
)

Content of scoring script, e.g., score.py

# Specify model_src as your parent
import os
deploy_dir = os.path.join(os.getenv('AZUREML_MODEL_DIR'),'model_src')

# Import custom module
import sys
sys.path.append("{0}/utils".format(deploy_dir)) 
from multilabelencoder import MultiColumnLabelEncoder

import joblib

def init():
    global model

    # The custom encoder must be importable for unpickling the model
    encoder = MultiColumnLabelEncoder()  # use as intended downstream

    # Load the deployed model from the registered directory
    model = joblib.load('{}/models/k_means_model_45.pkl'.format(deploy_dir))

This method provides flexibility in importing various custom scripts in my scoring script.

Answered By: Jhons D