Pipeline for RandomOverSampler, RandomForestClassifier & GridSearchCV
Question:
I am working on a binary text classification problem. Because the classes are highly imbalanced, I am using a sampling technique, RandomOverSampler(). For classification I want to use RandomForestClassifier(), whose parameters need to be tuned with GridSearchCV().
I am trying to create a pipeline that does these steps in order, but have failed so far. It throws an "invalid parameters" error.
param_grid = {
    'n_estimators': [5, 10, 15, 20],
    'max_depth': [2, 5, 7, 9]
}
grid_pipe = make_pipeline(RandomOverSampler(), RandomForestClassifier())
grid_searcher = GridSearchCV(grid_pipe, param_grid, cv=10)
grid_searcher.fit(tfidf_train[predictors], tfidf_train[target])
Answers:
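The parameters you defined in param_grid are for RandomForestClassifier, but you are not passing a RandomForestClassifier object to GridSearchCV.
You are passing a pipeline object, so each parameter name has to be prefixed with the name of the pipeline step it belongs to, followed by a double underscore, in order to reach the internal RandomForestClassifier object. make_pipeline() names each step after its lowercased class name, so the prefix here is randomforestclassifier__.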
Change them to:
param_grid = {
    'randomforestclassifier__n_estimators': [5, 10, 15, 20],
    'randomforestclassifier__max_depth': [2, 5, 7, 9]
}
And it will work.
Thanks for A2A. Ideally the parameters are defined as follows:
- Create a pipeline of named steps to be applied to the data. make_pipeline() takes the estimators directly and names the steps itself; to choose your own names, pass a list of (name, estimator) tuples to Pipeline() instead (the one from imblearn.pipeline, since the pipeline contains a sampler):
pipeline = Pipeline([('step_name_1', transformer1()), ('step_name_2', transformer2())])
Note: a trailing ',' before the closing square bracket is allowed but not required.
eg:
pipeline = Pipeline([
    ('random_over_sampler', RandomOverSampler()),
    ('RandomForestClassifier', RandomForestClassifier()),
])
- Create a parameter grid whose keys are strings that join the step name and the parameter name with a double underscore. The step name must match the one used in the pipeline exactly:
param_grid = {'<step_name>__<parameter_name>': [values]}
eg: param_grid = {'random_over_sampler__sampling_strategy': ['auto']}