Error loading a pickle file in Apache Airflow

Question:

Hi all!
Could you please help me load a serialized (pickle) model file in Python and use it in an Airflow DAG?

My code:

import datetime
import functools
import pickle

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

path = r'/Models/APP/model.pkl'
with open(path, 'rb') as f:
    g = pickle.load(f)

def my_fucn(gg):
    return gg.predict([[30, 40, 50, 60]])

default_args = {
    'owner': "timur",
    'retry_delay': datetime.timedelta(minutes=5),
    }
DAG_ID = "pythonoperator_test_v02"
dag_python = DAG(
    dag_id=DAG_ID,
    default_args=default_args,
    schedule_interval='@hourly',
    dagrun_timeout=datetime.timedelta(minutes=60),
    start_date=days_ago(0)
    )

empty_task = EmptyOperator(task_id="empty_task", retries=3, dag=dag_python)
python_task = PythonOperator(task_id="python_task", python_callable=functools.partial(my_fucn, gg=g), dag=dag_python)

Error:

  File "/home/timur/.local/lib/python3.8/site-packages/airflow/utils/json.py", line 153, in default
    CLASSNAME: o.__module__ + "." + o.__class__.__qualname__,
AttributeError: 'numpy.ndarray' object has no attribute '__module__'
Asked By: Timur Galiev


Answers:

The traceback shows Airflow's JSON encoder failing while trying to serialize a numpy array. One way around this is to use the joblib module to serialize your model instead of pickle.
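First, re-save the existing pickle as a joblib file. A minimal one-off conversion sketch (it assumes the model stored in model.pkl is an ordinary scikit-learn estimator and reuses the path from the question):

import pickle
import joblib

# Load the model from the old pickle file once...
with open('/Models/APP/model.pkl', 'rb') as f:
    model = pickle.load(f)

# ...and write it back out in joblib format for the DAG to read.
joblib.dump(model, '/Models/APP/model.joblib')

The DAG file then loads the model with joblib.load: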

import datetime

import joblib
from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

def my_func(model, x):
    return model.predict([x])

model_path = '/Models/APP/model.joblib'
model = joblib.load(model_path)

default_args = {
    'owner': "timur",
    'retry_delay': datetime.timedelta(minutes=5),
}

dag = DAG(
    dag_id="pythonoperator_test_v02",
    default_args=default_args,
    schedule_interval='@hourly',
    dagrun_timeout=datetime.timedelta(minutes=60),
    start_date=days_ago(0)
)

empty_task = EmptyOperator(task_id="empty_task", retries=3, dag=dag)
python_task = PythonOperator(
    task_id="python_task",
    python_callable=my_func,
    op_kwargs={'model': model, 'x': [30, 40, 50, 60]},
    dag=dag
)

Here joblib.load reads the model back from the model.joblib file. We also define a new function, my_func, that takes the loaded model and a list of input features and returns the predicted value, and we pass that function and its arguments to the PythonOperator through the op_kwargs parameter.
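Note that the original AttributeError was raised by Airflow's JSON encoder, and PythonOperator pushes the callable's return value to XCom by default. If the same error reappears for the prediction itself, one option (not part of the code above, just a hedged variant) is to convert the numpy array to a plain Python list before returning it:

def my_func(model, x):
    # model.predict returns a numpy array; converting it to a plain
    # list keeps the XCom value JSON-serializable for Airflow.
    prediction = model.predict([x])
    return prediction.tolist()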

Answered By: Konstantinos K.