Python SVC output shows the same value "1" for every C or gamma used

Question:

This is the code:

import numpy as np
from sklearn import svm

# read the space-separated training and test files; each row is 36 features plus a label
numere = np.fromfile("sat.trn", dtype=int, count=-1, sep=" ")
numereTest = np.fromfile("sat.tst", dtype=int, count=-1, sep=" ")
numere = numere.reshape(len(numere) // 37, 37)
numereTest = numereTest.reshape(len(numereTest) // 37, 37)

# the last column holds the class labels
etichete = numere[:, 36]
eticheteTest = numereTest[:, 36]
numere = np.delete(numere, 36, 1)
numereTest = np.delete(numereTest, 36, 1)

clf = svm.SVC(kernel='rbf', C=1, gamma=1)
clf.fit(numere, etichete)
predictie = clf.predict(numereTest)

I read the data from a file and split it into two np.arrays (features and labels), but the prediction is 1 for every sample, no matter what C or gamma I use.

numere[:10] -> array([[ 92, 115, 120, 94, 84, 102, 106, 79, 84, 102, 102, 83, 101,
126, 133, 103, 92, 112, 118, 85, 84, 103, 104, 81, 102, 126,
134, 104, 88, 121, 128, 100, 84, 107, 113, 87],
[ 84, 102, 106, 79, 84, 102, 102, 83, 80, 102, 102, 79, 92,
112, 118, 85, 84, 103, 104, 81, 84, 99, 104, 78, 88, 121,
128, 100, 84, 107, 113, 87, 84, 99, 104, 79],
[ 84, 102, 102, 83, 80, 102, 102, 79, 84, 94, 102, 79, 84,
103, 104, 81, 84, 99, 104, 78, 84, 99, 104, 81, 84, 107,
113, 87, 84, 99, 104, 79, 84, 99, 104, 79],
[ 80, 102, 102, 79, 84, 94, 102, 79, 80, 94, 98, 76, 84,
99, 104, 78, 84, 99, 104, 81, 76, 99, 104, 81, 84, 99,
104, 79, 84, 99, 104, 79, 84, 103, 104, 79],
[ 84, 94, 102, 79, 80, 94, 98, 76, 80, 102, 102, 79, 84,
99, 104, 81, 76, 99, 104, 81, 76, 99, 108, 85, 84, 99,
104, 79, 84, 103, 104, 79, 79, 107, 109, 87],
[ 80, 94, 98, 76, 80, 102, 102, 79, 76, 102, 102, 79, 76,
99, 104, 81, 76, 99, 108, 85, 76, 103, 118, 88, 84, 103,
104, 79, 79, 107, 109, 87, 79, 107, 109, 87],
[ 76, 102, 106, 83, 76, 102, 106, 87, 80, 98, 106, 79, 80,
107, 118, 88, 80, 112, 118, 88, 80, 107, 113, 85, 79, 107,
113, 87, 79, 103, 104, 83, 79, 103, 104, 79],
[ 76, 102, 106, 87, 80, 98, 106, 79, 76, 94, 102, 76, 80,
112, 118, 88, 80, 107, 113, 85, 80, 95, 100, 78, 79, 103,
104, 83, 79, 103, 104, 79, 79, 95, 100, 79],
[ 76, 89, 98, 76, 76, 94, 98, 76, 76, 98, 102, 72, 80,
95, 104, 74, 76, 91, 104, 74, 76, 95, 100, 78, 75, 91,
96, 75, 75, 91, 96, 71, 79, 87, 93, 71],
[ 76, 94, 98, 76, 76, 98, 102, 72, 76, 94, 90, 76, 76,
91, 104, 74, 76, 95, 100, 78, 76, 91, 100, 74, 75, 91,
96, 71, 79, 87, 93, 71, 79, 87, 93, 67]])

Answers:

The most likely reasons for what you are seeing are:

First, you are not scaling the data. An RBF SVM is very sensitive to feature scale, so standardize the features first (fit the scaler on the training data only):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(numere)                      # fit on the training data only
numere = scaler.transform(numere)
numereTest = scaler.transform(numereTest)
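
After scaling, the same model usually stops collapsing onto a single class. Here is a minimal sketch of the refit, reusing the variable names from the question; gamma='scale' (scikit-learn's default heuristic) and the np.unique check are my additions, not part of the original code:

clf = svm.SVC(kernel='rbf', C=1, gamma='scale')
clf.fit(numere, etichete)
predictie = clf.predict(numereTest)
# sanity check: the predictions should now cover more than one class
print(np.unique(predictie, return_counts=True))
print("test accuracy:", clf.score(numereTest, eticheteTest))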

Second, you are not tuning your parameters. You need to search for the best-fitting values, and I strongly recommend a grid search; you can find an example of how to use it here. Grid search works well for parameter tuning, but be careful not to use cross-validation on this dataset, as its creators recommend against it 🙂 gamma and C can take values over a very wide range, from tiny decimals to very large numbers, so you cannot test them properly by hand.

Edit: since you should not use cross-validation on this dataset, here is a better way for you to do the grid search:

from sklearn.model_selection import ParameterGrid

grid = {  # extend this with more values
    'gamma': [0.001, 0.1, 10, 100, 1000],
    'C': [1, 10, 100]
}

best_score = 0
best_grid = None
clf = svm.SVC(kernel='rbf')

for g in ParameterGrid(grid):
    clf.set_params(**g)
    clf.fit(numere, etichete)
    # keep the parameter combination with the best test score
    score = clf.score(numereTest, eticheteTest)
    if score > best_score:
        best_score = score
        best_grid = g

print("best score:", best_score)
print("grid:", best_grid)
Answered By: Ruli