How to visualize K-Means clusters with respect to user IDs
Question:
I have a dataset containing almost 28K users and almost 7K features.
Here’s what the dataframe looks like:
I applied K-Means clustering; here’s the code I used:
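import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
# Scale every feature to the [0, 1] range
scaler = MinMaxScaler()
data_rescaled = scaler.fit_transform(df3)
scaled_df = pd.DataFrame(data_rescaled, index=df3.index, columns=df3.columns)
# Reduce the ~7K features to 3 principal components
pca = PCA(n_components=3)
pca.fit(scaled_df)
reduced = pca.transform(scaled_df)
# Cluster the reduced data; label[i] is the cluster of the i-th row (user)
kmeanModel = KMeans(n_clusters=100, random_state=0)
label = kmeanModel.fit_predict(reduced)
sse = kmeanModel.inertia_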
How do I plot clusters against users, with the clusters on the x-axis and the users on the y-axis, so that I can see how many users fall into each cluster?
Answers:
Use matplotlib:
import matplotlib.pyplot as plt
# Create a new column in the dataframe with the cluster labels
scaled_df['cluster'] = label
# Count the rows (users) in each cluster; size() works whether the user ID
# is stored in the index or in a column
cluster_counts = scaled_df.groupby('cluster').size()
# Plot the counts as a bar chart
plt.bar(cluster_counts.index, cluster_counts.values)
plt.xlabel('Clusters')
plt.ylabel('Number of users')
plt.show()
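If you also need to know which users ended up in which cluster, and not only how many, the labels can be tied back to the user IDs. The following is a minimal sketch under assumptions: `user_ids` and the small `label` array are made-up stand-ins for `df3.index` and the output of `fit_predict`, and it relies on the labels being in the same row order as the data passed to K-Means.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 10 user IDs and the cluster K-Means assigned to each.
# In the question's code these would be df3.index and label.
user_ids = pd.Index([f"u{i}" for i in range(10)], name="user_id")
label = np.array([0, 2, 1, 0, 2, 2, 0, 1, 2, 0])

# One entry per user, aligned by position with the rows given to fit_predict.
assignments = pd.Series(label, index=user_ids, name="cluster")

# Number of users in each cluster, ordered by cluster number.
cluster_counts = assignments.value_counts().sort_index()

# User IDs belonging to each cluster.
users_per_cluster = {
    int(c): assignments.index[assignments == c].tolist()
    for c in cluster_counts.index
}

print(cluster_counts)
print(users_per_cluster)
```

`cluster_counts` can be passed to `plt.bar` exactly as in the answer above, while `users_per_cluster` gives the membership lists.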
Or use the seaborn library:
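import matplotlib.pyplot as plt
import seaborn as sns
# countplot needs the cluster labels as a column of the dataframe
scaled_df['cluster'] = label
# Draw one bar per cluster; its height is the number of rows (users) in it
sns.countplot(data=scaled_df, x='cluster')
plt.xlabel('Clusters')
plt.ylabel('Number of users')
plt.show()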
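With 100 clusters the x-axis gets crowded, so it may help to sort the bars by size and rotate the tick labels. This is only a sketch: `plot_cluster_sizes` is a hypothetical helper, not part of matplotlib or scikit-learn, and the random labels stand in for the real output of `fit_predict`.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_cluster_sizes(labels, n_clusters):
    """Draw a bar chart of cluster sizes, largest first; return (fig, ax)."""
    # Count users per cluster; minlength keeps empty clusters as zero-height bars.
    counts = np.bincount(labels, minlength=n_clusters)
    # Cluster numbers ordered from the largest cluster to the smallest.
    order = np.argsort(counts)[::-1]

    fig, ax = plt.subplots(figsize=(14, 4))
    positions = np.arange(n_clusters)
    ax.bar(positions, counts[order])
    ax.set_xticks(positions)
    ax.set_xticklabels([str(c) for c in order], rotation=90, fontsize=6)
    ax.set_xlabel("Cluster (sorted by size)")
    ax.set_ylabel("Number of users")
    return fig, ax


# Synthetic labels for illustration only: 28,000 users spread over 100 clusters.
rng = np.random.default_rng(0)
labels = rng.integers(0, 100, size=28_000)
fig, ax = plot_cluster_sizes(labels, n_clusters=100)
```

Call `plt.show()` afterwards to display the figure, or `fig.savefig(...)` to write it to a file.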