cross-validation

Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead

Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead Question: Working with Sklearn's stratified k-fold split, when I attempt to split using multi-class data I receive an error (see below). When I try to split using binary data, it works with no problem. num_classes = len(np.unique(y_train)) y_train_categorical = keras.utils.to_categorical(y_train, num_classes) kf=StratifiedKFold(n_splits=5, shuffle=True, random_state=999) # …

Total answers: 6
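The error arises because StratifiedKFold expects 1-D class labels, while keras.utils.to_categorical produces a one-hot (multilabel-indicator) matrix. A minimal sketch of one common workaround, passing the original integer labels to split() and one-hot encoding afterwards (the toy labels and feature matrix below are made up for illustration):

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical integer class labels and features, for illustration only
y_train = np.repeat([0, 1, 2], 10)          # 1-D labels: 'multiclass'
X_train = np.random.rand(len(y_train), 4)

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)
for train_idx, val_idx in kf.split(X_train, y_train):   # split on the 1-D labels
    # one-hot encode per fold if the model needs it, e.g.
    # keras.utils.to_categorical(y_train[train_idx], num_classes=3)
    pass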

How to compute precision, recall and F1 score of an imbalanced dataset for K-fold cross-validation?

How to compute precision, recall and F1 score of an imbalanced dataset for K-fold cross-validation? Question: I have an imbalanced dataset for a binary classification problem. I have built a Random Forest classifier and used k-fold cross-validation with 10 folds. kfold = model_selection.KFold(n_splits=10, random_state=42) model=RandomForestClassifier(n_estimators=50) I got the results of the 10 folds results = …

Total answers: 2
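One common approach for the question above is to let cross_validate collect several metrics per fold in a single pass, using a stratified splitter so each fold keeps the class imbalance. A hedged sketch on a synthetic imbalanced dataset (the data and parameters are placeholders):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced binary problem, for illustration only
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

model = RandomForestClassifier(n_estimators=50, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Collect precision, recall and F1 per fold in one pass
scores = cross_validate(model, X, y, cv=cv, scoring=['precision', 'recall', 'f1'])
print(scores['test_precision'].mean(),
      scores['test_recall'].mean(),
      scores['test_f1'].mean())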

module 'sklearn' has no attribute 'cross_validation'

module 'sklearn' has no attribute 'cross_validation' Question: I am trying to split my dataset into training and testing datasets, but I am getting this error: X_train,X_test,Y_train,Y_test = sklearn.cross_validation.train_test_split(X,df1['ENTRIESn_hourly']) AttributeError Traceback (most recent call last) <ipython-input-53-5445dab94861> in <module>() ----> 1 X_train,X_test,Y_train,Y_test = sklearn.cross_validation.train_test_split(X,df1['ENTRIESn_hourly']) AttributeError: module 'sklearn' has no attribute 'cross_validation' How can I handle this? Asked …

Total answers: 6
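sklearn.cross_validation was deprecated and later removed; its functions moved to sklearn.model_selection. A minimal sketch of the updated import (X and y below are placeholder arrays, not the question's df1):

import numpy as np
from sklearn.model_selection import train_test_split  # replaces sklearn.cross_validation

# Placeholder feature matrix and target, for illustration only
X = np.random.rand(100, 3)
y = np.random.rand(100)

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.25, random_state=0)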

difference between StratifiedKFold and StratifiedShuffleSplit in sklearn

difference between StratifiedKFold and StratifiedShuffleSplit in sklearn Question: As the title says, I am wondering what the difference is between StratifiedKFold with the parameter shuffle=True, StratifiedKFold(n_splits=10, shuffle=True, random_state=0), and StratifiedShuffleSplit, StratifiedShuffleSplit(n_splits=10, test_size='default', train_size=None, random_state=0), and what the advantage of using StratifiedShuffleSplit is. Asked By: gabboshow || Source Answers: In stratKFolds, each test set should not …

Total answers: 3
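A small sketch that makes the difference visible: StratifiedKFold partitions the data, so its test folds are disjoint and together cover every sample exactly once, while StratifiedShuffleSplit draws an independent stratified test set for each split, so test sets can overlap and some samples may never be tested (the toy data below is illustrative):

import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

# Toy balanced binary labels, for illustration only
y = np.array([0] * 25 + [1] * 25)
X = np.zeros((50, 2))

# StratifiedKFold: disjoint test folds that jointly cover all 50 samples
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
kfold_tested = set().union(*(set(test) for _, test in skf.split(X, y)))
print(len(kfold_tested))    # 50: every sample appears in exactly one test fold

# StratifiedShuffleSplit: each split re-samples a test set of the requested size
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
shuffle_tested = set().union(*(set(test) for _, test in sss.split(X, y)))
print(len(shuffle_tested))  # usually < 50: test sets may overlap or miss samples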

Python – LightGBM with GridSearchCV, is running forever

Python – LightGBM with GridSearchCV, is running forever Question: Recently, I have been running multiple experiments to compare Python XGBoost and LightGBM. It seems that LightGBM is a new algorithm that people say works better than XGBoost in both speed and accuracy. This is the LightGBM GitHub. This is the LightGBM Python API documentation, here you …

Total answers: 2
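The usual culprits are a large parameter grid (total fits = parameter combinations times cv folds) and no progress output, so the search merely looks hung. A hedged sketch of keeping the grid small and turning on verbosity, assuming the lightgbm scikit-learn wrapper LGBMClassifier (the dataset and grid values are placeholders):

from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, random_state=0)

# Keep the grid deliberately small: 2 * 2 * 2 combinations * 3 folds = 24 fits
param_grid = {
    'num_leaves': [31, 63],
    'learning_rate': [0.1, 0.05],
    'n_estimators': [100, 200],
}

search = GridSearchCV(LGBMClassifier(random_state=0), param_grid,
                      cv=3, n_jobs=-1, verbose=1)  # verbose shows per-fit progress
search.fit(X, y)
print(search.best_params_)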

How to standardize data with sklearn's cross_val_score()

How to standardize data with sklearn's cross_val_score() Question: Let’s say I want to use a LinearSVC to perform k-fold cross-validation on a dataset. How would I perform standardization on the data? The best practice I have read is to build your standardization model on your training data and then apply this model to the testing data. When …

Total answers: 1
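The standard answer is to wrap the scaler and the estimator in a Pipeline, so cross_val_score refits the scaler on each training fold only and the held-out fold never leaks into the standardization statistics. A minimal sketch with a synthetic dataset:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data, for illustration only
X, y = make_classification(n_samples=500, random_state=0)

# The scaler is fit on each training fold only, then applied to that fold's test data
clf = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())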

return coefficients from Pipeline object in sklearn

return coefficients from Pipeline object in sklearn Question: I've fit a Pipeline object with RandomizedSearchCV pipe_sgd = Pipeline([('scl', StandardScaler()), ('clf', SGDClassifier(n_jobs=-1))]) param_dist_sgd = {'clf__loss': ['log'], 'clf__penalty': [None, 'l1', 'l2', 'elasticnet'], 'clf__alpha': np.linspace(0.15, 0.35), 'clf__n_iter': [3, 5, 7]} sgd_randomized_pipe = RandomizedSearchCV(estimator = pipe_sgd, param_distributions=param_dist_sgd, cv=3, n_iter=30, n_jobs=-1) sgd_randomized_pipe.fit(X_train, y_train) I want to access the coef_ attribute …

Total answers: 3
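After fitting, the tuned pipeline is available as best_estimator_, and the fitted classifier step can be reached through named_steps (the step name 'clf' mirrors the question's pipeline). A hedged, self-contained sketch using current parameter names; the dataset, alpha values and n_iter below are placeholders:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_train, y_train = make_classification(n_samples=300, random_state=0)

pipe_sgd = Pipeline([('scl', StandardScaler()),
                     ('clf', SGDClassifier(max_iter=1000, n_jobs=-1))])
param_dist_sgd = {'clf__penalty': ['l1', 'l2', 'elasticnet'],
                  'clf__alpha': np.linspace(0.0001, 0.01, 10)}

sgd_randomized_pipe = RandomizedSearchCV(estimator=pipe_sgd,
                                         param_distributions=param_dist_sgd,
                                         cv=3, n_iter=5, random_state=0)
sgd_randomized_pipe.fit(X_train, y_train)

# Reach the fitted classifier inside the winning pipeline via named_steps
best_clf = sgd_randomized_pipe.best_estimator_.named_steps['clf']
print(best_clf.coef_.shape, best_clf.intercept_)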

Difference between cross_val_score and cross_val_predict

Difference between cross_val_score and cross_val_predict Question: I want to evaluate a regression model built with scikit-learn using cross-validation and am getting confused about which of the two functions, cross_val_score and cross_val_predict, I should use. One option would be: cvs = DecisionTreeRegressor(max_depth = depth) scores = cross_val_score(cvs, predictors, target, cv=cvfolds, scoring='r2') print("R2-Score: %0.2f (+/- %0.2f)" % (scores.mean(), …

Total answers: 3
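In short, cross_val_score returns one score per fold (each computed on that fold's held-out data), while cross_val_predict returns one out-of-fold prediction per sample, which you then score in a single pass. A hedged sketch contrasting the two on a synthetic regression problem:

from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, for illustration only
X, y = make_regression(n_samples=200, noise=10.0, random_state=0)
model = DecisionTreeRegressor(max_depth=3, random_state=0)

# cross_val_score: one R2 per fold
fold_scores = cross_val_score(model, X, y, cv=5, scoring='r2')

# cross_val_predict: out-of-fold predictions for every sample, scored once
oof_pred = cross_val_predict(model, X, y, cv=5)
pooled_r2 = r2_score(y, oof_pred)

print(fold_scores.mean(), pooled_r2)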

Logistic regression and cross-validation

Logistic regression and cross-validation Question: I am trying to solve a classification problem on a given dataset, through logistic regression (and this is not the problem). To avoid overfitting I’m trying to implement it through cross-validation (and here’s the problem): there’s something that I’m missing to complete the program. My purpose here is to determine …

Total answers: 2
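A compact way to wire logistic regression into cross-validation is to hand the estimator and a splitter to cross_val_score, which fits on each training portion and scores on the corresponding held-out portion. A minimal sketch on synthetic data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=400, random_state=0)

clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each fold: fit on the training split, score accuracy on the held-out split
scores = cross_val_score(clf, X, y, cv=cv)
print(scores.mean(), scores.std())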

How is scikit-learn cross_val_predict accuracy score calculated?

How is scikit-learn cross_val_predict accuracy score calculated? Question: Does cross_val_predict (see doc, v0.18) with the k-fold method, as shown in the code below, calculate the accuracy for each fold and then average them, or not? cv = KFold(len(labels), n_folds=20) clf = SVC() ypred = cross_val_predict(clf, td, labels, cv=cv) accuracy = accuracy_score(labels, ypred) print accuracy Asked By: …

Total answers: 4
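It does not average per-fold accuracies: accuracy_score is computed once over the pooled out-of-fold predictions, which generally differs slightly from the mean of per-fold scores when the folds have unequal sizes. A hedged sketch contrasting the two computations (synthetic data, current KFold API):

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=103, random_state=0)  # odd size -> unequal folds
clf = SVC()
cv = KFold(n_splits=20)

# Pooled: score all out-of-fold predictions together, each sample counted once
pooled = accuracy_score(y, cross_val_predict(clf, X, y, cv=cv))

# Averaged: score each fold separately, then take the mean of the 20 fold scores
averaged = cross_val_score(clf, X, y, cv=cv, scoring='accuracy').mean()

print(pooled, averaged)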