Value at KMeans.cluster_centers_ in sklearn KMeans
Question:
After fitting K-means with 3 clusters on some vectors, I was able to get the labels for the input data. KMeans.cluster_centers_ returns the coordinates of the centers, so shouldn’t there be some vector corresponding to each of them? How can I find the value at the centroid of these clusters?
Answers:
The cluster centre value is the value of the centroid. At the end of k-means clustering, you’ll have three individual clusters and three centroids, each centroid located at the centre (the mean) of its cluster. A centroid doesn’t necessarily coincide with an existing data point. If you want the existing data point nearest to each centroid, you can look it up like this:
closest, _ = pairwise_distances_argmin_min(KMeans.cluster_centers_, X)
The array closest will contain, for each centroid, the index of the point in X that is closest to it.
Let’s say closest gave the output array([0, 8, 5]) for the three clusters. Then X[0] is the closest point in X to centroid 0, X[8] is the closest to centroid 1, and so on.
Source: https://codedump.io/share/XiME3OAGY5Tm/1/get-nearest-point-to-centroid-scikit-learn
Sharda neglected to import the function from scikit-learn’s metrics module; see below.
from sklearn.metrics import pairwise_distances_argmin_min
closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)
or
import sklearn.metrics
closest, _ = sklearn.metrics.pairwise_distances_argmin_min(kmeans.cluster_centers_, X)
Assuming X is the input data and kmeans has been fit to that data, both options give you an array, closest, in which each element is the index of the element in X closest to the corresponding centroid. Thus, closest[0] is the index of the data point closest to the first centroid and X[closest[0]] is that data point.
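Putting the pieces above together, here is a minimal end-to-end sketch. The toy data, the random seed and the variable names distances and nearest_points are assumptions made up for illustration; only KMeans, cluster_centers_ and pairwise_distances_argmin_min come from the discussion above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Hypothetical toy data: three well-separated groups of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(20, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(20, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(20, 2)),
])

# Fit an estimator instance; cluster_centers_ only exists after fit().
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# closest[i] is the row index in X of the point nearest to centroid i,
# distances[i] is the distance between that point and centroid i.
closest, distances = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)

# The actual data points nearest to each centroid, one row per cluster.
nearest_points = X[closest]

print("centroids:\n", kmeans.cluster_centers_)
print("indices of nearest data points:", closest)
print("nearest data points:\n", nearest_points)
```

Note that cluster_centers_ is read from the fitted kmeans object, not from the KMeans class itself.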
To answer your first question: k-means starts from an initial guess for each centroid (chosen at random or, with scikit-learn’s default k-means++ initialisation, sampled from the data) and then repeatedly moves each centroid to the mean of the points currently assigned to it. Because a centroid is a mean, it lives in the same feature space as the data but will not necessarily coincide with any of the original data points. This contrasts with the Affinity Propagation clustering algorithm, which picks an exemplar data point as the representative for each cluster, not just a point in the same space.
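A small check can make this concrete. The sketch below uses a made-up data set of two square-shaped groups, so the centroids should land in the middle of each square, at (1, 1) and (11, 11) in some order, where no data point exists. The data and the names members, cluster_mean and is_data_point are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: the four corners of two squares, far apart.
X = np.array([
    [0, 0], [0, 2], [2, 0], [2, 2],          # square centred on (1, 1)
    [10, 10], [10, 12], [12, 10], [12, 12],  # square centred on (11, 11)
], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for k, center in enumerate(kmeans.cluster_centers_):
    # All points that were assigned to cluster k.
    members = X[kmeans.labels_ == k]
    # The centroid should match the mean of those points.
    cluster_mean = members.mean(axis=0)
    # True only if some row of X equals the centroid (up to float tolerance).
    is_data_point = np.isclose(X, center).all(axis=1).any()
    print(k, center, cluster_mean, is_data_point)
```

In this example each centroid matches the mean of its members and is_data_point is False for both clusters, which is why a nearest-point lookup is needed if you want an actual row of X.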