Find the within-cluster sum of squares for each cluster after normalizing the data with scale() and building a K-means model

Question

Consider the dataset “USArrests.csv”. This data set contains statistics, in arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.

Variables Description

States: The state where the incident occurred
Murder: No. of arrests for murder (per 100,000 residents)
Assault: No. of arrests for assault (per 100,000 residents)
UrbanPop: Percentage of urban population
Rape: No. of arrests for rape (per 100,000 residents)

Set the column States as the index of the data frame while reading the data. Set the random seed with set.seed(123). Normalize the data using the scale function and build a K-means model with the following conditions:

number of clusters = 4
nstart=20
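In R, "setting States as the index" corresponds to using that column as the row names of the data frame, which read.csv does through its row.names argument. A small sketch, assuming the course file has the same layout as R's built-in USArrests data (state names in a column called "States"); the in-memory CSV below is built from the first three rows of that built-in data rather than from the real file:

```r
# Assumption: USArrests.csv has a "States" column followed by the four numeric columns.
# Build a small CSV in memory from R's built-in USArrests data to mimic that layout.
sample_df <- data.frame(States = rownames(USArrests)[1:3], USArrests[1:3, ])
csv_text <- paste(capture.output(write.csv(sample_df, row.names = FALSE)), collapse = "\n")

# row.names = "States" moves that column into the row names (the "index"),
# so only the four numeric columns remain as variables
data <- read.csv(text = csv_text, header = TRUE, row.names = "States")
print(data)
```

Because States becomes the row names, scale() and kmeans() later see only numeric columns, which is what they require.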

According to the built model, the within-cluster sum of squares for each cluster is __
(the order of values in each option could be different)
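For reference, the within-cluster sum of squares of cluster $k$ (the quantity R's kmeans() reports, one value per cluster, in its withinss component) is the sum of squared Euclidean distances from each member $x_i$ (a scaled row of the data) to the cluster centroid $\mu_k$:

$$
W_k = \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i,
\qquad
W_{\text{total}} = \sum_{k=1}^{4} W_k
$$

Since cluster numbering is arbitrary, only the set of four $W_k$ values is meaningful, not their order.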

I am stuck after importing the data and setting the seed. I am struggling to fit and build the K-means model.

Asked By: wildDog



Answer 1

I was pleased to see this dataset. If I am not mistaken, it is taken from Kaggle.

Anyway, I am using R to run the code here; I hope you are familiar with it. If not, the concept is very similar in other languages, so try to understand the code and rewrite it in the language you are most comfortable with.

After the necessary setup and data import:

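```r
# Read the data, using the States column as the row names (the data frame index)
data <- read.csv("USArrests.csv", header = TRUE, row.names = "States")
df <- scale(data)                            # standardize each column: mean 0, sd 1
set.seed(123)                                # make the random starting centres reproducible
fit <- kmeans(df, centers = 4, nstart = 20)  # 4 clusters, best of 20 random starts
print(fit$withinss)                          # within-cluster sum of squares, one per cluster
```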
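scale() matters here because K-means uses Euclidean distance, and without standardization Assault (values in the hundreds) would dominate the other variables. A quick check of what scale() does, assuming the CSV holds the same values as R's built-in USArrests data frame:

```r
# Assumption: USArrests.csv contains the same data as R's built-in USArrests
df <- scale(USArrests)

# Each scaled column has mean (numerically) 0 and standard deviation 1
print(round(colMeans(df), 10))
print(apply(df, 2, sd))

# scale() is equivalent to subtracting each column's mean and dividing by its sd
manual <- sweep(sweep(as.matrix(USArrests), 2, colMeans(USArrests)), 2,
                apply(USArrests, 2, sd), "/")
print(all.equal(manual, df, check.attributes = FALSE))
```

check.attributes = FALSE is used because scale() also attaches the centring and scaling vectors as attributes, which the hand-computed matrix lacks.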

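The output contains these four values: 8.316061, 11.952463, 16.212213 and 19.922437. The order in which they are printed depends on how the clusters happen to be numbered.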
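To see where those numbers come from, the sketch below recomputes each cluster's sum of squared distances to its centroid from fit$cluster and fit$centers and compares it with fit$withinss. It again assumes the CSV matches R's built-in USArrests data; sorting the values gives a view that does not depend on the arbitrary cluster labels.

```r
# Assumption: USArrests.csv contains the same data as R's built-in USArrests
df <- scale(USArrests)
set.seed(123)
fit <- kmeans(df, centers = 4, nstart = 20)

# For each cluster k: sum of squared distances from its members to its centre
manual_wss <- sapply(seq_len(nrow(fit$centers)), function(k) {
  members <- df[fit$cluster == k, , drop = FALSE]
  sum(sweep(members, 2, fit$centers[k, ])^2)
})

print(manual_wss)
print(sort(fit$withinss))  # label-independent view of the per-cluster values
print(fit$tot.withinss)    # total within-cluster sum of squares over all 4 clusters
```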

Feel free to comment if you don’t understand or find a mistake.

Answered By: 01001010
