HOG + SVM classification takes too much time

Question:

I have a directory called Animal, which consists of two subdirectories (cats and dogs). Building the full path for each file does not take much time and returns quickly; here is the sample code:

import os

import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.decomposition import PCA

mypca = PCA(n_components=10)
parent = "Animal"
file_list = []
features = []
labels = []

# Collect the full path of every image under Animal/cats and Animal/dogs
subdirectories = os.listdir(parent)
for animal in subdirectories:
    full_directory = os.path.join(parent, animal)
    for file in os.listdir(full_directory):
        file_path = os.path.join(full_directory, file)
        file_list.append(file_path)
print(file_list)

I am saving the full path of each file in order to avoid a nested loop later. Then I iterate through the list, read each image one by one, and apply the HOG feature-extraction algorithm; here is that code:

for file in file_list:
    image = cv2.imread(file)
    image = cv2.resize(image, (128 * 4, 64 * 4))
    # visualize=True also computes hog_image, which is never used below;
    # newer scikit-image versions use channel_axis=-1 instead of multichannel=True
    fd, hog_image = hog(image, orientations=9, pixels_per_cell=(8, 8),
                        cells_per_block=(4, 4), visualize=True, multichannel=True)
    features.append(fd)
    # Label by the subdirectory name embedded in the path: cats -> 0, dogs -> 1
    if file.find("cats") != -1:
        labels.append(0)
    else:
        labels.append(1)

labels = np.array(labels)
features = np.array(features)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=1)
X_train = mypca.fit_transform(X_train)
X_test = mypca.transform(X_test)
mymodel = SVC()
mymodel.fit(X_train, y_train)
print(mymodel.score(X_test, y_test))
print(features.shape)

But this code takes too much time (even though my computer, an HP Omen, has 6 CPU cores). Could you please advise me how to reduce the execution time? Originally the directory contained 4000 images in each category, and I have deleted more than 2000, but I still wait a long time for the operation to finish. I have also tried applying PCA to reduce the dimensionality. What would be your recommendation?

Asked By: neural science


Answers:

You are using the sklearn.svm.SVC estimator, which is inappropriate for large datasets: its fit time scales at least quadratically with the number of samples. From the documentation:

The implementation is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using LinearSVC or SGDClassifier instead, possibly after a Nystroem transformer or other Kernel Approximation.
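
Following that advice, here is a minimal sketch of how the end of your pipeline could be adapted. It assumes the `features` and `labels` arrays built by your code above; the estimator choices (a plain LinearSVC, or an SGDClassifier after a Nystroem kernel approximation) are the documentation's suggestions, and the `n_components` value is an illustrative assumption, not tuned for your data:

# Sketch only: scalable alternatives to SVC, assuming `features` and
# `labels` exist as in the question's code.
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier
from sklearn.kernel_approximation import Nystroem

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=1)

# Option 1: a linear SVM, whose fit time grows roughly linearly
# with the number of samples instead of quadratically
linear_model = make_pipeline(StandardScaler(), LinearSVC())
linear_model.fit(X_train, y_train)
print(linear_model.score(X_test, y_test))

# Option 2: approximate an RBF kernel with the Nystroem transformer,
# then train a linear SGD classifier on the transformed features
kernel_model = make_pipeline(
    StandardScaler(),
    Nystroem(n_components=100, random_state=1),  # assumed value, tune as needed
    SGDClassifier(random_state=1),
)
kernel_model.fit(X_train, y_train)
print(kernel_model.score(X_test, y_test))

Both options replace only the SVC step; you can still apply your PCA step before them if it helps, but with a linear model it is often unnecessary.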

Answered By: Caridorc