How to get SSE for each cluster in k-means?

Question:

I am using KMeans from sklearn.cluster and trying to get the SSE for each cluster. I understand that kmeans.inertia_ gives the total SSE summed over all clusters. Is there a way to get the SSE for each cluster individually with sklearn.cluster KMeans?

My dataset has 7 attributes and 210 observations. The number of clusters is 3, and I would like to compute the SSE for each cluster.

Asked By: macaroni


Answers:

There is no direct way to do this using a KMeans object. However, you can easily compute the sum of squared distances for each cluster yourself.

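import numpy as np
from sklearn.cluster import KMeans

# ...
# X is assumed to be a NumPy array of shape (n_samples, n_features)

kmeans = KMeans(n_clusters=3).fit(X)

# Mean of the points assigned to each cluster
cluster_centers = [X[kmeans.labels_ == i].mean(axis=0) for i in range(3)]

# Add each point's squared distance to its own cluster's running total
clusterwise_sse = [0, 0, 0]
for point, label in zip(X, kmeans.labels_):
    clusterwise_sse[label] += np.square(point - cluster_centers[label]).sum()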

This snippet is not the most efficient way to do it; my goal was to present the concept clearly.
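
For larger datasets, a vectorized variant of the same idea might look like the sketch below. It is only an illustration under some assumptions: X is replaced by randomly generated stand-in data with the shape described in the question (210 observations, 7 attributes), and n_init and random_state are set only to make the run repeatable. It uses the fitted cluster_centers_ and labels_ attributes instead of recomputing the means, and np.bincount to sum the squared distances per label.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in data: 210 observations, 7 attributes (shape taken from the question).
rng = np.random.default_rng(0)
X = rng.normal(size=(210, 7))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Squared Euclidean distance from each point to the center of its assigned cluster.
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
sq_dist = np.square(X - assigned_centers).sum(axis=1)

# Sum the squared distances within each cluster label.
clusterwise_sse = np.bincount(
    kmeans.labels_, weights=sq_dist, minlength=kmeans.n_clusters
)

print(clusterwise_sse)
# Sanity check: the per-cluster values should add up to inertia_ (up to rounding).
print(np.isclose(clusterwise_sse.sum(), kmeans.inertia_))
```

The per-cluster values should sum to kmeans.inertia_ up to floating-point rounding, which gives a quick check that the split is right.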

Answered By: rezso.dev

A fitted KMeans model has an attribute called inertia_.
It holds the sum of squared errors, totalled over all clusters, so you can store it in a list for each value of k and plot it as shown in the code below.

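import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# X is assumed to be the feature matrix, already loaded
k = range(1, 10)
sum_squared_errors = []

for i in k:
    model = KMeans(n_clusters=i)
    model.fit(X)
    # inertia_ is the total SSE over all clusters for this k
    sum_squared_errors.append(model.inertia_)

plt.plot(k, sum_squared_errors)
plt.xlabel('K-Value')
plt.ylabel('Sum of Squared Errors')
plt.show()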
Answered By: Fahad Abdullah