Sklearn-classifier, issue with freeze (pod pending in K8s)

Question:

My Sklearn-classifier job in MLRun freezes: the job is still running after 5, 10, 20, … minutes. See the log output:

2023-02-21 13:50:15,853 [info] starting run training uid=e8e66defd91043dda62ae8b6795c74ea DB=http://mlrun-api:8080
2023-02-21 13:50:16,136 [info] Job is running in the background, pod: training-tgplm

The Web UI shows the same freeze, with the pod stuck in the pending state.

I used the following code. The call classifier_fn.run(train_task, local=False) is where it freezes:

import mlrun

# Import the Sklearn classifier function from the function hub
classifier_fn = mlrun.import_function('hub://sklearn-classifier')

# Prepare the parameters list for the training function
training_params = {"model_name": ['risk_xgboost'],
                   "model_pkg_class": ['sklearn.ensemble.GradientBoostingClassifier']}

# Define the training task, including the feature vector, label and hyperparams definitions
# (transactions_fv is the feature vector defined earlier, not shown here)
train_task = mlrun.new_task('training',
                            inputs={'dataset': transactions_fv.uri},
                            params={'label_column': 'n4_pd30'})

train_task.with_hyper_params(training_params, strategy='list', selector='max.accuracy')

# Specify the cluster image
classifier_fn.spec.image = 'mlrun/mlrun'

# Run training on the cluster (this is the call that freezes)
classifier_fn.run(train_task, local=False)

Has anyone had, and solved, the same issue?

Asked By: JIST


Answers:

I had the same issue and solved it. The problem was a mismatch between the MLRun versions on the client side and the server side: my client had MLRun 1.2.1rc2 while the server ran 1.2.1. These versions have different interfaces, and that difference causes the freeze.

Synchronize the MLRun versions on the client and the server, and it will work.
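A quick way to catch this kind of mismatch before submitting a run is to compare the two version strings explicitly. The sketch below is a hypothetical helper (the names parse_version and compare_client_server are illustrative, not part of MLRun). It assumes versions look like "1.2.1" or "1.2.1rc2" and flags the case from this answer, where both sides share the base release 1.2.1 but only one is a release candidate. On the client, the installed version is available as mlrun.__version__; how you read the server version depends on your deployment and is not shown here.

```python
import re

# Hypothetical helper: accepts versions like "1.2.1", "1.2.1rc2" or "v1.3.0-rc5"
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:[-.]?(rc|a|b)(\d+))?$")


def parse_version(text):
    """Split a version string into (major, minor, patch, pre_tag, pre_num)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch, tag, num = match.groups()
    return (int(major), int(minor), int(patch), tag, int(num) if num else None)


def compare_client_server(client, server):
    """Return a short verdict on whether client and server versions match."""
    c, s = parse_version(client), parse_version(server)
    if c == s:
        return "match"
    if c[:3] == s[:3]:
        # Same base release, but the pre-release tags differ (e.g. rc2 vs final)
        return "pre-release mismatch"
    return "version mismatch"


# The situation described above: client 1.2.1rc2, server 1.2.1
print(compare_client_server("1.2.1rc2", "1.2.1"))  # pre-release mismatch
```

Any result other than "match" means the client should be reinstalled at the server's version, for example with pip install mlrun==1.2.1 in this case.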

BTW: your code looks like it is based on this original sample: https://docs.mlrun.org/en/stable/feature-store/end-to-end-demo/02-create-training-model.html

Answered By: JzD