Am I interpreting K-means results correctly?

Question:

I have used the elbow method with k-means to find the optimal K for my data (after applying PCA), and I got the elbow plot shown below. My question is: I think the optimal K is 3 in my case (this is where the sudden drop / point of inflection occurs). But looking at my X_PCA_1 vs. X_PCA_2 plot, I think the data can be clustered into only 2 clusters. Or am I mistaken?

Note: I am still a beginner.

[Figure: K-means elbow plot]

[Figure: scatter plot of X_PCA_1 vs. X_PCA_2]
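For reference, an elbow curve like the one above is usually produced by plotting the K-means inertia against K. A minimal sketch, assuming scikit-learn and a PCA-transformed matrix X_pca (the variable name is an assumption, since the question does not include code):

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

inertias = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=0)
    km.fit(X_pca)                     # X_pca: data after PCA (assumed name)
    inertias.append(km.inertia_)      # within-cluster sum of squares

plt.plot(k_values, inertias, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('Inertia')
plt.show()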

Asked By: Z47


Answers:

If you want a plot that shows the clusters clearly, you can first run PCA with 3 components:

from sklearn.decomposition import PCA

pca = PCA(3)                           # keep 3 principal components
X_pca = pca.fit_transform(scaled_df)   # scaled_df is your scaled feature matrix
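As a sanity check (my addition, not part of the original answer), you can look at how much variance those 3 components retain; explained_variance_ratio_ is a standard attribute of a fitted scikit-learn PCA:

# Fraction of the total variance captured by each of the 3 components
print(pca.explained_variance_ratio_, pca.explained_variance_ratio_.sum())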

Then you can split each point into its three coordinates:

# Collect the first, second and third PCA coordinate of every point
X = []
Y = []
Z = []
for i in X_pca:
    X.append(i[0])
    Y.append(i[1])
    Z.append(i[2])
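Equivalently (my addition, not in the original answer), since X_pca is a NumPy array you can take its columns directly instead of looping:

X, Y, Z = X_pca[:, 0], X_pca[:, 1], X_pca[:, 2]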

From here you can pick any library that draws 3D plots; with matplotlib it looks like this:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Cluster the scaled data into 3 groups (the K suggested by the elbow plot)
model = KMeans(n_clusters=3)
cluster_kmeans = model.fit_predict(scaled_df)

# Gather the PCA coordinates and the cluster labels in one DataFrame
df_graph = pd.DataFrame({'X': X,
                         'Y': Y,
                         'Z': Z,
                         'labels': cluster_kmeans
                         })

fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111, projection='3d')

# Draw each cluster with its own colour and legend entry
for s in df_graph.labels.unique():
    ax.scatter(df_graph.X[df_graph.labels == s],
               df_graph.Y[df_graph.labels == s],
               df_graph.Z[df_graph.labels == s],
               label=s)

ax.legend()
plt.show()
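One thing to keep in mind (my observation, not part of the original answer): the labels above come from fitting KMeans on scaled_df, while the points are plotted in PCA space. If you want the clustering to be computed on exactly the coordinates you plot, you could fit on the PCA output instead:

# Alternative: cluster directly on the 3 PCA components you are plotting
cluster_kmeans = KMeans(n_clusters=3).fit_predict(X_pca)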