Error loading a pickle file in Apache Airflow
Question:
Hi all!
Could you please help me load this serialized file in Python so I can reproduce the prediction in Airflow?
My code:
import datetime
import functools
import pickle

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

path = r'/Models/APP/model.pkl'
with open(path, 'rb') as f:
    g = pickle.load(f)

def my_fucn(gg):
    return gg.predict([[30, 40, 50, 60]])

default_args = {
    'owner': "timur",
    'retry_delay': datetime.timedelta(minutes=5),
}

DAG_ID = "pythonoperator_test_v02"
dag_python = DAG(
    dag_id=DAG_ID,
    default_args=default_args,
    schedule_interval='@hourly',
    dagrun_timeout=datetime.timedelta(minutes=60),
    start_date=days_ago(0),
)

empty_task = EmptyOperator(task_id="empty_task", retries=3, dag=dag_python)
python_task = PythonOperator(
    task_id="python_task",
    python_callable=functools.partial(my_fucn, gg=g),
    dag=dag_python,
)
Error:
File "/home/timur/.local/lib/python3.8/site-packages/airflow/utils/json.py", line 153, in default
CLASSNAME: o.__module__ + "." + o.__class__.__qualname__,
AttributeError: 'numpy.ndarray' object has no attribute '__module__'
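For context, this traceback comes from Airflow trying to JSON-serialize the task's return value for XCom. A minimal stdlib sketch of the failure mode (using a stand-in class rather than a real numpy.ndarray, so it runs without numpy installed):

```python
import json

# Airflow pushes a task's return value to XCom and, by default,
# serializes it with JSON. Plain Python containers round-trip fine:
assert json.loads(json.dumps([42.0])) == [42.0]

# An arbitrary object, such as a numpy.ndarray, does not. (A stand-in
# class is used here so the sketch runs without numpy installed.)
class FakeNdarray:
    def tolist(self):
        return [42.0]

try:
    json.dumps(FakeNdarray())
    serializable = True
except TypeError:
    serializable = False

print(serializable)                        # → False
print(json.dumps(FakeNdarray().tolist()))  # → [42.0]
```

This is why converting the prediction to a plain list (ndarray.tolist()) before returning it from the task avoids the error.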
Answers:
One way to address this is to serialize your model with joblib instead of pickle, and to return a plain Python list from the task so that Airflow can JSON-serialize the result for XCom:
import datetime

import joblib
from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

def my_func(model, x):
    # predict() returns a numpy.ndarray, which Airflow's XCom JSON
    # serializer cannot handle; convert it to a plain list first.
    return model.predict([x]).tolist()

model_path = '/Models/APP/model.joblib'
model = joblib.load(model_path)

default_args = {
    'owner': "timur",
    'retry_delay': datetime.timedelta(minutes=5),
}

dag = DAG(
    dag_id="pythonoperator_test_v02",
    default_args=default_args,
    schedule_interval='@hourly',
    dagrun_timeout=datetime.timedelta(minutes=60),
    start_date=days_ago(0),
)

empty_task = EmptyOperator(task_id="empty_task", retries=3, dag=dag)
python_task = PythonOperator(
    task_id="python_task",
    python_callable=my_func,
    op_kwargs={'model': model, 'x': [30, 40, 50, 60]},
    dag=dag,
)
Here we load the model from the model.joblib file with the joblib module and define a new function my_func that takes the loaded model and a list of input features, returning the prediction as a plain list. The function and its arguments are passed to the PythonOperator via the op_kwargs parameter rather than functools.partial.
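If the model currently exists only as model.pkl, a one-off conversion to joblib format might look like the sketch below. The throwaway dict and temp directory are illustrative stand-ins; substitute the real /Models/APP paths and fitted model in practice:

```python
import os
import pickle
import tempfile

import joblib  # third-party: pip install joblib

# One-off conversion: load the existing pickle and re-save it with
# joblib. A throwaway dict in a temp directory stands in for the
# fitted model so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, 'model.pkl')
    joblib_path = os.path.join(tmp, 'model.joblib')

    with open(pkl_path, 'wb') as f:
        pickle.dump({'coef': [1.0, 2.0]}, f)  # stand-in for a model

    with open(pkl_path, 'rb') as f:
        model = pickle.load(f)
    joblib.dump(model, joblib_path)

    restored = joblib.load(joblib_path)
    assert restored == {'coef': [1.0, 2.0]}
```

Note that joblib mainly helps with large numpy arrays inside scikit-learn models; it does not by itself change how Airflow serializes XCom values, which is why the .tolist() conversion in my_func matters either way.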