cross-validation

Error during Recursive feature elimination using Histogram based GBM

Question: I am implementing Recursive Feature Elimination using the HistGradientBoostingClassifier, but for some reason I keep getting the following error: ValueError: when importance_getter=='auto', the underlying estimator HistGradientBoostingClassifier should have coef_ or feature_importances_ attribute. Either pass a fitted estimator to feature selector or call fit before calling …

Total answers: 1

What is the best practice to apply cross-validation using TimeSeriesSplit() over dataframe within end-2-end pipeline in python?

Question: Let's say I have a dataset in the following pandas DataFrame format, with a non-standard timestamp column that is not in datetime format:

+--------+-----+
|TS_24hrs|count|
+--------+-----+
|0       |157  |
|1       |334  |
|2       |176  |
|3       |86   |
|4 …

Total answers: 2

NotFittedError (instance is not fitted yet) after invoked cross_validate

Question: This is my minimal reproducible example:

x = np.array([ [1, 2], [3, 4], [5, 6], [6, 7] ])
y = [1, 0, 0, 1]
model = GaussianNB()
scores = cross_validate(model, x, y, cv=2, scoring=("accuracy"))
model.predict([8,9])

What I intended to do is instantiate a Gaussian Naive …

Total answers: 1
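A minimal sketch of one way to avoid the NotFittedError in the excerpt above, assuming the goal is to predict with the same model after scoring it. cross_validate fits clones of the estimator on each fold, so the instance passed in stays unfitted and has to be fitted explicitly. The final fit on all the data and the 2D input to predict are assumptions about what the asker intended, not taken from the truncated question.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

x = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])
y = [1, 0, 0, 1]

model = GaussianNB()

# cross_validate fits clones of `model` on each fold;
# `model` itself is still unfitted after this call.
scores = cross_validate(model, x, y, cv=2, scoring="accuracy")

# Fit the original instance (here on all the data) before predicting.
model.fit(x, y)

# predict expects a 2D array with one row per sample.
prediction = model.predict([[8, 9]])
```

If the fitted fold models themselves are needed, cross_validate also accepts return_estimator=True and returns them under the "estimator" key.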

continuous data, Y response not support in the cross_val_score() binary|multiclass for IterativeImputer for BayesianRidge

Question: Problem defined, continuous challenge. This new imputer_bayesian_ridge() function uses IterativeImputer to impute training data. It takes the training data as a data frame, then immediately gets data.values as a NumPy array. This passes training data with many features, …

Total answers: 1

Error: "Boolean array expected for the condition, not float64" during StratifiedK-fold

Question: I'm trying to use stratified k-fold cross-validation on my dataset, but I get the error "Boolean array expected for the condition, not float64" (in the code below). Does anyone know the reason? This is the code: import pandas …

Total answers: 1

Why sklearn's KFold can only be enumerated once (also on using it in xgboost.cv)?

Question: I am trying to create a KFold object for my xgboost.cv, and I have:

import pandas as pd
from sklearn.model_selection import KFold
df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10]])
KF = KFold(n_splits=2)
kf = KF.split(df)

But it seems I can only enumerate once: for i, (train_index, …

Total answers: 1
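A short sketch of the behaviour described in the excerpt above, reusing its df and KF names. KF.split(df) returns a generator, so it is exhausted after one pass. Storing the splits in a list (the name folds is hypothetical) is one way to iterate more than once.

```python
import pandas as pd
from sklearn.model_selection import KFold

df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
KF = KFold(n_splits=2)

# KF.split(df) returns a generator, which is consumed by the first loop.
kf = KF.split(df)
first_pass = [(train.tolist(), test.tolist()) for train, test in kf]
second_pass = [(train.tolist(), test.tolist()) for train, test in kf]  # empty

# Materialising the splits in a list allows repeated iteration.
folds = list(KF.split(df))
for i, (train_index, test_index) in enumerate(folds):
    print(i, train_index, test_index)
```

Calling KF.split(df) again also produces a fresh generator, since the KFold object itself is reusable.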

NLP neural net validation accuracy increases too much (?) between folds in cross validation

Question: I'm training a model with BERT for classification with two labels. I'd like to use cross-validation, as I want to get an out-of-sample prediction for each observation in the data set to use later in linear regressions. …

Total answers: 1

Why is cross_val_score not producing consistent results?

Question: When this code executes, the results are not consistent. Where is the randomness coming from?

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score …

Total answers: 1

Hurdle models – gridsearchCV

Question: I am currently trying to build a hurdle model (a zero-inflated regressor) to predict the revenue from each of our customers. We use a zero-inflated regressor because most (80%) of our customers have 0 revenue and only 20% have revenue > 0. So, we build two models like …

Total answers: 1

Apply a cross validated ML model to unseen data

Question: I would like to use scikit-learn to predict a variable y from X. I would like to train a classifier on a training dataset using cross-validation and then apply this classifier to an unseen test dataset (as in https://www.nature.com/articles/s41586-022-04492-9) from sklearn import …

Total answers: 1