Value at KMeans.cluster_centers_ in sklearn KMeans

Question:

After fitting k-means with 3 clusters on some vectors, I was able to get the labels for the input data.
KMeans.cluster_centers_ returns the coordinates of the centers, so shouldn’t there be some input vector corresponding to each of them? How can I find the data point at the centroid of each cluster?

Asked By: Katherine

Answers:

The cluster centre value is the value of the centroid. At the end of k-means clustering, you’ll have three individual clusters and three centroids, with each centroid being located at the centre of each cluster. The centroid doesn’t necessarily have to coincide with an existing data point.
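As a rough illustration of the point above, the following sketch fits k-means on a small made-up data set (the array X and the parameter values are assumptions, not the asker's data) and checks that each centroid is the mean of the points assigned to it, and whether it coincides with any row of X.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data, invented for illustration: three loose groups of 2-D points.
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
    [8.0, 8.0], [9.0, 9.5], [8.5, 9.0],
    [1.0, 9.0], [0.5, 8.0], [1.5, 9.5],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

coincides = []
for k, centre in enumerate(kmeans.cluster_centers_):
    members = X[kmeans.labels_ == k]
    # For a converged fit, the centroid is the mean of its assigned points
    # (up to floating-point tolerance).
    assert np.allclose(centre, members.mean(axis=0))
    # Does this centroid coincide with any row of X?
    is_data_point = bool(np.any(np.all(np.isclose(X, centre), axis=1)))
    coincides.append(is_data_point)
    print(k, centre, "is a data point:", is_data_point)
```

On this toy data none of the centroids should land exactly on an input point, since each one is an average of three distinct points.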

Answered By: bigsim

closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)

The array closest will contain the index of the point in X that is closest to each centroid.

Suppose closest comes out as array([0, 8, 5]) for the three clusters. Then X[0] is the point in X closest to centroid 0, X[8] is the closest to centroid 1, and X[5] is the closest to centroid 2.

Source: https://codedump.io/share/XiME3OAGY5Tm/1/get-nearest-point-to-centroid-scikit-learn

Answered By: Sharda Pratti

Sharda’s answer leaves out the import of pairwise_distances_argmin_min from scikit-learn’s metrics module; see below.

from sklearn.metrics import pairwise_distances_argmin_min
closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)

or

import sklearn.metrics
closest, _ = sklearn.metrics.pairwise_distances_argmin_min(kmeans.cluster_centers_, X)

Assuming X is the input data and kmeans has been fit to that data, both options give you an array, closest, for which each element is the index of the closest element in X to that centroid. Thus, closest[0] is the index of the data closest to the first centroid and X[closest[0]] is that data.
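For completeness, here is a minimal end-to-end sketch of the above. The data in X and the KMeans parameters are made up for illustration; only pairwise_distances_argmin_min and cluster_centers_ come from the answers in this thread.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Toy data, invented for illustration: three loose groups of 2-D points.
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
    [8.0, 8.0], [9.0, 9.5], [8.5, 9.0],
    [1.0, 9.0], [0.5, 8.0], [1.5, 9.5],
])

# kmeans must be a fitted estimator instance, not the KMeans class itself.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# For each centroid: index of the nearest row of X, and the distance to it.
closest, distances = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)

# The actual data points nearest to each centroid.
nearest_points = X[closest]

for k, (idx, dist) in enumerate(zip(closest, distances)):
    print(f"centroid {k}: nearest is X[{idx}] = {X[idx]}, distance {dist:.3f}")
```

The second return value, discarded as _ in the snippets above, holds the distance from each centroid to its nearest point, which can be handy for judging how representative that point is.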

To answer your first question, k-means clustering starts from an initial guess for each centroid in the feature space (chosen at random or, with scikit-learn’s default k-means++ initialization, seeded from the data) and then repeatedly moves each centroid to the mean of the points assigned to it, until they are the best representatives of the data. Because each final centroid is an average, it will not necessarily coincide with any of the original data points. This contrasts with the Affinity Propagation clustering algorithm, which picks an exemplar data point as the representative for each cluster, not just a point in the same space.

Answered By: Spencer J Rothfuss