Classification Model's parameters produce different results
Question:
I’m working on an SVC model for classification, and I get a different accuracy each time I change the values of the parameters (svc__gamma, svc__kernel and svc__C). I read the scikit-learn documentation, but I could not understand what those parameters mean. I have three questions:
- What do those parameters mean?
- How do they affect the accuracy each time I change them?
- What are the correct parameter values?
The accuracy is 0.70, but when I remove svc__gamma and svc__C from the grid, it increases to 0.76.
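from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# X_train, Y_train, X_test and Y_test are assumed to be defined earlier.
pipe = make_pipeline(TfidfVectorizer(), SVC())
# make_pipeline names the SVC step "svc", hence the "svc__" prefix.
param_grid = {'svc__kernel': ['rbf', 'linear', 'poly'],
              'svc__gamma': [0.1, 1, 10, 100],
              'svc__C': [0.1, 1, 10, 100]}
svc_model = GridSearchCV(pipe, param_grid, cv=3)
svc_model.fit(X_train, Y_train)
prediction = svc_model.predict(X_test)
print(f"Accuracy score is {accuracy_score(Y_test, prediction):.2f}")
print(classification_report(Y_test, prediction))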
Answers:
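To 1.
- gamma is the width parameter of the Gaussian bell curve used by the RBF (Gaussian) kernel. In scikit-learn's SVC it is also used by the 'poly' and 'sigmoid' kernels, and it is ignored by the 'linear' kernel.
- C is the regularization parameter of the optimization problem: it sets the penalty for misclassified training points, and the strength of the regularization is inversely proportional to C.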
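To make the role of gamma a bit more concrete, here is a minimal sketch of the RBF kernel in plain Python. The function name `rbf_kernel` and the points `p` and `q` are made up for illustration; this is not scikit-learn's implementation, only the formula k(x, y) = exp(-gamma * ||x - y||^2) written out.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Two hypothetical points at Euclidean distance 1 from each other.
p, q = (0.0, 0.0), (1.0, 0.0)

# Same gamma values as in the question's grid.
similarities = {gamma: rbf_kernel(p, q, gamma) for gamma in (0.1, 1, 10, 100)}
for gamma, value in similarities.items():
    print(f"gamma={gamma:>5}: k(p, q) = {value:.3g}")
```

The larger gamma is, the faster the similarity between two points drops with distance. With a very large gamma each training point only influences a tiny neighbourhood, which tends towards overfitting; a small gamma gives a smoother decision boundary.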
To 2.
- Get familiar with the mathematical background to fully understand how they affect your accuracy. (Side note: accuracy is often not a reliable measure on its own, for example when the classes are imbalanced; whether it is suitable depends on the context.)
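As a small illustration of why accuracy alone can mislead, consider the sketch below. The labels are made up: 90 samples of class 0, 10 of class 1, and a "classifier" that always predicts class 0.

```python
# Hypothetical, made-up labels: 90 negatives (0) and 10 positives (1).
y_true = [0] * 90 + [1] * 10
# A "classifier" that ignores its input and always predicts the majority class.
y_pred = [0] * 100

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
recall_positive = true_positives / actual_positives

print(f"accuracy = {accuracy:.2f}")                  # 0.90
print(f"recall (class 1) = {recall_positive:.2f}")   # 0.00
```

The accuracy looks good although not a single positive sample is found. The classification_report you already print shows precision and recall per class, so it is worth reading alongside the accuracy.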
To 3.
- There are no ‘correct’ parameters. They depend on the context, the data and the goal you want to achieve. Usually there is a trade-off between how well the algorithm works on the training data and how well it works on new data (overfitting vs. underfitting).
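Since the best values depend on the data, it helps to look at what the grid search actually selected. Below is a rough sketch, assuming a tiny made-up corpus (`texts`, `labels`) in place of your data; the variable name `search` is also just for illustration. It passes the grid as a list of dicts so that gamma is only varied for kernels that use it, and it includes gamma='scale' (the SVC default in recent scikit-learn versions), which your grid leaves out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny made-up corpus, only so the example runs; substitute your own data.
texts = [
    "great movie, loved it", "wonderful acting and story", "really enjoyable film",
    "fantastic and fun", "loved the great story", "enjoyable and wonderful",
    "terrible movie, hated it", "awful acting and plot", "really boring film",
    "dull and bad", "hated the awful plot", "boring and terrible",
]
labels = [1] * 6 + [0] * 6

pipe = make_pipeline(TfidfVectorizer(), SVC())

# A list of dicts searches each sub-grid separately, so gamma is only
# varied for kernels that actually use it.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10, 100]},
    {"svc__kernel": ["rbf", "poly"],
     "svc__C": [0.1, 1, 10, 100],
     "svc__gamma": ["scale", 0.1, 1, 10, 100]},
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(texts, labels)

print("best parameters:", search.best_params_)
print(f"best mean cross-validated accuracy: {search.best_score_:.2f}")
```

Keep in mind that GridSearchCV picks the combination with the best cross-validated score on the training data, which is not necessarily the one with the best score on your test set. That, together with the default gamma missing from your grid, is one possible reason why the tuned model scored lower than the untuned one.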
I hope that helps as a first step 🙂
For further information, I suggest reading up on SVMs.