Python SVC output shows the same value "1" for every C or gamma used

Question:

This is the code:

import numpy as np
from sklearn import svm

# read the space-separated training and test files; each row is 36 features plus a label
numere = np.fromfile("sat.trn", dtype=int, count=-1, sep=" ")
numereTest = np.fromfile("sat.tst", dtype=int, count=-1, sep=" ")
numere = numere.reshape(len(numere) // 37, 37)
numereTest = numereTest.reshape(len(numereTest) // 37, 37)

# the last column holds the class labels
etichete = numere[:, 36]
eticheteTest = numereTest[:, 36]
numere = np.delete(numere, 36, 1)
numereTest = np.delete(numereTest, 36, 1)

clf = svm.SVC(kernel='rbf', C=1, gamma=1)
clf.fit(numere, etichete)
predictie = clf.predict(numereTest)

I read the data from a file and split it into two np.arrays (features and labels), but the prediction is 1 for every sample, no matter what C or gamma I use.

numere[:10] -> array([[ 92, 115, 120, 94, 84, 102, 106, 79, 84, 102, 102, 83, 101,
126, 133, 103, 92, 112, 118, 85, 84, 103, 104, 81, 102, 126,
134, 104, 88, 121, 128, 100, 84, 107, 113, 87],
[ 84, 102, 106, 79, 84, 102, 102, 83, 80, 102, 102, 79, 92,
112, 118, 85, 84, 103, 104, 81, 84, 99, 104, 78, 88, 121,
128, 100, 84, 107, 113, 87, 84, 99, 104, 79],
[ 84, 102, 102, 83, 80, 102, 102, 79, 84, 94, 102, 79, 84,
103, 104, 81, 84, 99, 104, 78, 84, 99, 104, 81, 84, 107,
113, 87, 84, 99, 104, 79, 84, 99, 104, 79],
[ 80, 102, 102, 79, 84, 94, 102, 79, 80, 94, 98, 76, 84,
99, 104, 78, 84, 99, 104, 81, 76, 99, 104, 81, 84, 99,
104, 79, 84, 99, 104, 79, 84, 103, 104, 79],
[ 84, 94, 102, 79, 80, 94, 98, 76, 80, 102, 102, 79, 84,
99, 104, 81, 76, 99, 104, 81, 76, 99, 108, 85, 84, 99,
104, 79, 84, 103, 104, 79, 79, 107, 109, 87],
[ 80, 94, 98, 76, 80, 102, 102, 79, 76, 102, 102, 79, 76,
99, 104, 81, 76, 99, 108, 85, 76, 103, 118, 88, 84, 103,
104, 79, 79, 107, 109, 87, 79, 107, 109, 87],
[ 76, 102, 106, 83, 76, 102, 106, 87, 80, 98, 106, 79, 80,
107, 118, 88, 80, 112, 118, 88, 80, 107, 113, 85, 79, 107,
113, 87, 79, 103, 104, 83, 79, 103, 104, 79],
[ 76, 102, 106, 87, 80, 98, 106, 79, 76, 94, 102, 76, 80,
112, 118, 88, 80, 107, 113, 85, 80, 95, 100, 78, 79, 103,
104, 83, 79, 103, 104, 79, 79, 95, 100, 79],
[ 76, 89, 98, 76, 76, 94, 98, 76, 76, 98, 102, 72, 80,
95, 104, 74, 76, 91, 104, 74, 76, 95, 100, 78, 75, 91,
96, 75, 75, 91, 96, 71, 79, 87, 93, 71],
[ 76, 94, 98, 76, 76, 98, 102, 72, 76, 94, 90, 76, 76,
91, 104, 74, 76, 95, 100, 78, 76, 91, 100, 74, 75, 91,
96, 71, 79, 87, 93, 71, 79, 87, 93, 67]])

Answers:

The most likely reasons for what you are seeing are:

First, you are not scaling the data. An RBF SVM is very sensitive to feature scale, so standardize the features first (fit the scaler on the training data only):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(numere)                      # fit on the training data only
numere = scaler.transform(numere)
numereTest = scaler.transform(numereTest)
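
After scaling, the same model usually stops collapsing onto a single class. Here is a minimal sketch of the refit, reusing the variable names from the question; gamma='scale' (scikit-learn's default heuristic) and the np.unique check are my additions, not part of the original code:

clf = svm.SVC(kernel='rbf', C=1, gamma='scale')
clf.fit(numere, etichete)
predictie = clf.predict(numereTest)
# sanity check: the predictions should now cover more than one class
print(np.unique(predictie, return_counts=True))
print("test accuracy:", clf.score(numereTest, eticheteTest))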

Second, you are not tuning your parameters. You need to search for the best-fitting values, and I strongly recommend a grid search; you can find an example of how to use it here. Grid search works well for parameter tuning, but be careful not to use cross-validation on this dataset, as its creators recommend against it 🙂 gamma and C can take values over a very wide range, from tiny decimals to very large numbers, so you cannot test them properly by hand.

Edit: since you should not use cross-validation on this dataset, here is a better way for you to do the grid search:

from sklearn.model_selection import ParameterGrid

grid = {  # extend this with more values
    'gamma': [0.001, 0.1, 10, 100, 1000],
    'C': [1, 10, 100]
}

best_score = 0
best_grid = None
clf = svm.SVC(kernel='rbf')

for g in ParameterGrid(grid):
    clf.set_params(**g)
    clf.fit(numere, etichete)
    # keep the parameter combination with the best test score
    score = clf.score(numereTest, eticheteTest)
    if score > best_score:
        best_score = score
        best_grid = g

print("best score:", best_score)
print("grid:", best_grid)
Answered By: Ruli