Evaluate multiple scores on sklearn cross_val_score

Question:

I’m trying to evaluate multiple machine learning algorithms with sklearn for a couple of metrics (accuracy, recall, precision and maybe more).

From what I understood of the documentation and the source code (I’m using sklearn 0.17), the cross_val_score function accepts only one scorer per execution. So to calculate multiple scores, I have to either:

  1. Execute cross_val_score multiple times, or
  2. Implement my own (time-consuming and error-prone) scorer

I’ve executed it multiple times with this code:

    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.cross_validation import  cross_val_score
    import time
    from sklearn.datasets import  load_iris
    
    iris = load_iris()
    
    models = [GaussianNB(), DecisionTreeClassifier(), SVC()]
    names = ["Naive Bayes", "Decision Tree", "SVM"]
    for model, name in zip(models, names):
        print name
        start = time.time()
        for score in ["accuracy", "precision", "recall"]:
            print score,
            print " : ",
            print cross_val_score(model, iris.data, iris.target,scoring=score, cv=10).mean()
        print time.time() - start
    

And I get this output:

Naive Bayes
accuracy  :  0.953333333333
precision  :  0.962698412698
recall  :  0.953333333333
0.0383198261261
Decision Tree
accuracy  :  0.953333333333
precision  :  0.958888888889
recall  :  0.953333333333
0.0494720935822
SVM
accuracy  :  0.98
precision  :  0.983333333333
recall  :  0.98
0.063080072403

This works, but it’s slow on my own data. How can I measure all the scores at once?

Asked By: Cristiano Araujo

||

Answers:

Since the time of writing this post, scikit-learn has been updated, making my original answer obsolete. See the much cleaner solution below.


You can write your own scoring function to capture all three pieces of information. However, a scoring function for cross-validation must return a single number in scikit-learn (this is likely for compatibility reasons). Below is an example where the scores for each cross-validation fold are printed to the console and the returned value is just the sum of the three metrics. If you want to return all these values, you’re going to have to make some changes to cross_val_score (line 1351 of cross_validation.py) and _score (line 1601 of the same file).

from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.cross_validation import  cross_val_score
import time
from sklearn.datasets import  load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score

iris = load_iris()

models = [GaussianNB(), DecisionTreeClassifier(), SVC()]
names = ["Naive Bayes", "Decision Tree", "SVM"]

def getScores(estimator, x, y):
    yPred = estimator.predict(x)
    return (accuracy_score(y, yPred), 
            precision_score(y, yPred, pos_label=3, average='macro'), 
            recall_score(y, yPred, pos_label=3, average='macro'))

def my_scorer(estimator, x, y):
    a, p, r = getScores(estimator, x, y)
    print a, p, r
    return a+p+r

for model, name in zip(models, names):
    print name
    start = time.time()
    m = cross_val_score(model, iris.data, iris.target,scoring=my_scorer, cv=10).mean()
    print '\nSum:', m, '\n\n'
    print 'time', time.time() - start, '\n\n'

Which gives:

Naive Bayes
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
0.866666666667 0.904761904762 0.866666666667
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0

Sum: 2.86936507937 


time 0.0249638557434 


Decision Tree
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
0.866666666667 0.866666666667 0.866666666667
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
1.0 1.0 1.0

Sum: 2.86555555556 


time 0.0237860679626 


SVM
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0

Sum: 2.94333333333 


time 0.043044090271 

As of scikit-learn 0.19.0, the solution is much simpler:

from sklearn.model_selection import cross_validate
from sklearn.datasets import  load_iris
from sklearn.svm import SVC

iris = load_iris()
clf = SVC()
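scoring = {'acc': 'accuracy',
           'prec_macro': 'precision_macro',
           # The key is only a label: despite its name, 'rec_micro'
           # reports macro-averaged recall ('recall_macro').
           'rec_micro': 'recall_macro'}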
scores = cross_validate(clf, iris.data, iris.target, scoring=scoring,
                         cv=5, return_train_score=True)
print(scores.keys())
print(scores['test_acc'])  

Which gives:

['test_acc', 'score_time', 'train_acc', 'fit_time', 'test_rec_micro', 'train_rec_micro', 'train_prec_macro', 'test_prec_macro']
[ 0.96666667  1.          0.96666667  0.96666667  1.        ]
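
The dictionary returned by cross_validate holds one array per metric, so a per-metric average still has to be computed by hand. A minimal sketch of that step, assuming scikit-learn 0.19 or later; the `mean_scores` name and the 'rec_macro' key are my own choices for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

iris = load_iris()
scoring = {'acc': 'accuracy',
           'prec_macro': 'precision_macro',
           'rec_macro': 'recall_macro'}
scores = cross_validate(SVC(), iris.data, iris.target, scoring=scoring, cv=5)

# Keep only the per-fold test scores (keys prefixed with 'test_')
# and average them, giving one number per metric.
mean_scores = {key: float(np.mean(values))
               for key, values in scores.items()
               if key.startswith('test_')}

for key in sorted(mean_scores):
    print('%s: %.3f' % (key, mean_scores[key]))
```

The timing entries ('fit_time', 'score_time') are skipped here on purpose; they can be averaged the same way if needed.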
Answered By: piman314

I ran into the same problem and created a module that supports multiple metrics in cross_val_score.
To accomplish what you want with this module, you can write:

import time

import numpy as np
from multiscorer import MultiScorer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score

# `models`, `names` and `iris` are the objects defined in the question.
scorer = MultiScorer({
    'Accuracy'  : (accuracy_score , {}),
    'Precision' : (precision_score, {'pos_label': 3, 'average':'macro'}),
    'Recall'    : (recall_score   , {'pos_label': 3, 'average':'macro'})
})

for model, name in zip(models, names):
    print name
    start = time.time()

    # The return value is assigned to `_` to show that it is not used;
    # the per-fold metrics are read from the scorer instead.
    _ = cross_val_score(model, iris.data, iris.target, scoring=scorer, cv=10)
    results = scorer.get_results()

    for metric_name in results.keys():
        average_score = np.average(results[metric_name])
        print('%s : %f' % (metric_name, average_score))

    print 'time', time.time() - start, '\n\n'

You can check and download this module from GitHub.
Hope it helps.
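
For readers who want to see the general idea without installing anything, here is a rough sketch of how a recording scorer of this kind could work. It is not the code of the multiscorer module: the `RecordingScorer` class, its `primary` argument and its `results` attribute are hypothetical names made up for this illustration. It assumes cross_val_score runs in a single process (the default), since with n_jobs greater than 1 the scorer may be copied into worker processes and the recorded values may not reach the parent.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


class RecordingScorer(object):
    """Hypothetical helper (not the multiscorer module itself)."""

    def __init__(self, metrics, primary):
        # metrics: name -> (metric function, keyword arguments)
        # primary: name of the metric whose value is handed back to cross_val_score
        self.metrics = metrics
        self.primary = primary
        self.results = {name: [] for name in metrics}

    def __call__(self, estimator, X, y):
        y_pred = estimator.predict(X)  # predict once per fold, reuse for every metric
        for name, (func, kwargs) in self.metrics.items():
            self.results[name].append(func(y, y_pred, **kwargs))
        # cross_val_score expects a single number, so return the primary metric
        return self.results[self.primary][-1]


iris = load_iris()
scorer = RecordingScorer({
    'Accuracy': (accuracy_score, {}),
    'Precision': (precision_score, {'average': 'macro'}),
    'Recall': (recall_score, {'average': 'macro'}),
}, primary='Accuracy')

fold_scores = cross_val_score(GaussianNB(), iris.data, iris.target,
                              scoring=scorer, cv=10)

for name in sorted(scorer.results):
    values = scorer.results[name]
    print('%s : %f' % (name, sum(values) / len(values)))
```

Each fold is fitted and predicted only once, which is the part that saves time compared with calling cross_val_score once per metric.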

Answered By: kyriakosSt
import pandas as pd
from sklearn import model_selection

def error_metrics(model, train_data, train_targ, kfold):
    # `model` is a list of estimators; each one is cross-validated once per metric.
    # "roc_auc" assumes a binary target; adjust the list to suit the task.
    scoring = ["accuracy", "roc_auc", "neg_log_loss", "r2",
               "neg_mean_squared_error", "neg_mean_absolute_error"]

    # Named differently from the function to avoid shadowing it.
    metrics_table = pd.DataFrame()
    metrics_table["model"] = model
    for scor in scoring:
        score = []
        for mod in model:
            result = model_selection.cross_val_score(estimator=mod, X=train_data, y=train_targ,
                                                     cv=kfold, scoring=scor)
            score.append(result.mean())

        metrics_table[scor] = pd.Series(score)

    return metrics_table
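
The same kind of table can probably be built with one cross-validation run per model instead of one per metric, by passing a list of scorer names to cross_validate. A sketch under the assumption of scikit-learn 0.19 or later; `summarize_models` and the chosen metric names are hypothetical, picked only for this example:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def summarize_models(models, X, y, scoring, cv=5):
    """Return one row per model with the mean test score of each metric."""
    rows = []
    for model in models:
        # A single call fits each fold once and evaluates every metric on it.
        cv_results = cross_validate(model, X, y, scoring=scoring, cv=cv)
        row = {'model': type(model).__name__}
        for metric in scoring:
            row[metric] = cv_results['test_' + metric].mean()
        rows.append(row)
    return pd.DataFrame(rows)


iris = load_iris()
summary = summarize_models(
    [GaussianNB(), DecisionTreeClassifier(random_state=0)],
    iris.data, iris.target,
    scoring=['accuracy', 'precision_macro', 'recall_macro'])
print(summary)
```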
Answered By: Muluwork Shegaw

Update Apr 2023 for scikit-learn >= 0.19.0:

cross_val_score has changed a little since earlier versions. As the documentation says:

Use cross_validate to run cross-validation on multiple metrics and also to return train scores, fit times and score times.

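from pprint import pprint

from sklearn.datasets import load_iris
from sklearn.metrics import (accuracy_score, f1_score, make_scorer,
                             precision_score, recall_score)
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier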

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create models
models = [GaussianNB(), DecisionTreeClassifier(), SVC()]
names = ["Naive Bayes", "Decision Tree", "SVM"]


# Define custom scoring metrics
scoring = {
    'accuracy': make_scorer(accuracy_score),
    'precision': make_scorer(precision_score, average='weighted'),
    'recall': make_scorer(recall_score, average='weighted'),
    'f1_score': make_scorer(f1_score, average='macro')
}

for model, name in zip(models, names):
    # Perform 5-fold cross-validation with custom scoring metrics
    pprint(name)
    pprint(cross_validate(model, X, y, cv=5, scoring=scoring))

The results look like this:

'Naive Bayes'
{'fit_time': array([0.0012691 , 0.00121498, 0.00065112, 0.000664  , 0.000741  ]),
 'score_time': array([0.00356674, 0.00290489, 0.00418973, 0.00282598, 0.00429797]),
 'test_accuracy': array([0.93333333, 0.96666667, 0.93333333, 0.93333333, 1.        ]),
 'test_f1_score': array([0.93333333, 0.96658312, 0.93265993, 0.93265993, 1.        ]),
 'test_precision': array([0.93333333, 0.96969697, 0.94444444, 0.94444444, 1.        ]),
 'test_recall': array([0.93333333, 0.96666667, 0.93333333, 0.93333333, 1.        ])}
'Decision Tree'
{'fit_time': array([0.0010047 , 0.00049806, 0.00131512, 0.00049615, 0.00048304]),
 'score_time': array([0.003232  , 0.00246   , 0.00605106, 0.00245786, 0.00233197]),
 'test_accuracy': array([0.96666667, 0.96666667, 0.9       , 0.96666667, 1.        ]),
 'test_f1_score': array([0.96658312, 0.96658312, 0.89974937, 0.96658312, 1.        ]),
 'test_precision': array([0.96969697, 0.96969697, 0.9023569 , 0.96969697, 1.        ]),
 'test_recall': array([0.96666667, 0.96666667, 0.9       , 0.96666667, 1.        ])}
'SVM'
{'fit_time': array([0.00082183, 0.00068903, 0.0019362 , 0.00088406, 0.00114012]),
 'score_time': array([0.00263715, 0.00342083, 0.00375986, 0.00331903, 0.00372481]),
 'test_accuracy': array([0.96666667, 0.96666667, 0.96666667, 0.93333333, 1.        ]),
 'test_f1_score': array([0.96658312, 0.96658312, 0.96658312, 0.93333333, 1.        ]),
 'test_precision': array([0.96969697, 0.96969697, 0.96969697, 0.93333333, 1.        ]),
 'test_recall': array([0.96666667, 0.96666667, 0.96666667, 0.93333333, 1.        ])}
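
With a dictionary of make_scorer objects, each scorer is a separate callable. If avoiding repeated predictions matters, one alternative is a single callable that predicts once and returns all metrics in a dictionary. This is only a sketch and assumes scikit-learn 0.24 or later, where, as far as I know, cross_validate accepts a callable returning a dict; `multi_metric_scorer` is a name chosen for this example:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC


def multi_metric_scorer(estimator, X, y):
    """Predict once on the fold and compute every metric from that prediction."""
    y_pred = estimator.predict(X)
    return {
        'accuracy': accuracy_score(y, y_pred),
        'precision': precision_score(y, y_pred, average='weighted'),
        'recall': recall_score(y, y_pred, average='weighted'),
    }


X, y = load_iris(return_X_y=True)
results = cross_validate(SVC(), X, y, cv=5, scoring=multi_metric_scorer)

# Each dictionary key returned by the scorer shows up with a 'test_' prefix.
for key in ('test_accuracy', 'test_precision', 'test_recall'):
    print(key, results[key].mean())
```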
Answered By: moraei