How to find the within-cluster sum of squares for each cluster after scaling the data and building a K-means model?
Question:
Consider the dataset “USArrests.csv”. This data set contains statistics, in arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas.
Variables Description
- States: The state where the incident occurred
- Murder: No. of arrests for murder (per 100,000 residents)
- Assault: No. of arrests for assault (per 100,000 residents)
- UrbanPop: Percentage of urban population
- Rape: Rape arrests (per 100,000 residents)
Set the column States as the index of the data frame while reading the data. Set the random number generator with set.seed(123). Normalize the data using the scale function and build the K-means algorithm with the given conditions:
- number of clusters = 4
- nstart = 20
According to the built model, the within-cluster sum of squares for each cluster is __
(the order of values in each option could be different)
I am stuck after importing the data and setting the seed, and I am struggling to fit the K-means model.
Answers:
I was glad to see this dataset. If I am not wrong, it is available on Kaggle; it also ships with R as the built-in USArrests data set.
I am using R to run the code here and hope you are familiar with it. If not, the concept is very similar in other languages, so try to understand the code and rewrite it in the language you are most comfortable with.
After the usual setup, import the data, scale it, and fit the model:
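# Read the CSV and use the States column as row names
data <- read.csv("USArrests.csv", header = TRUE, row.names = "States")
# Standardize each column (mean 0, standard deviation 1)
df <- scale(data)
# Fix the seed so the random starts are reproducible
set.seed(123)
# K-means with 4 clusters, keeping the best of 20 random starts
fit <- kmeans(df, centers = 4, nstart = 20)
# Within-cluster sum of squares, one value per cluster
print(fit$withinss)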
The output should be 8.316061 11.952463 16.212213 19.922437. The cluster labels are arbitrary, so the order of the values may differ on your machine.
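To see what fit$withinss actually measures, here is a minimal sketch that recomputes it by hand. It assumes the built-in USArrests data set in R holds the same values as the CSV from the question, and the name manual_wss is only for illustration. For each cluster, it sums the squared distances of the member rows from that cluster's own column means, then compares the sorted result with what kmeans reports.

```r
# Assumption: the built-in USArrests data matches the CSV from the question.
df <- scale(USArrests)   # center each column and divide by its standard deviation

set.seed(123)
fit <- kmeans(df, centers = 4, nstart = 20)

# Within-cluster sum of squares by hand: for each cluster, sum the squared
# deviations of its rows from the cluster's own column means.
manual_wss <- sapply(seq_len(4), function(k) {
  members <- df[fit$cluster == k, , drop = FALSE]
  sum(scale(members, center = TRUE, scale = FALSE)^2)
})

# Sorting removes the arbitrary cluster ordering before comparing.
print(sort(fit$withinss))
print(sort(manual_wss))
```

Both printed vectors should agree, and their total should equal fit$tot.withinss. Sorting the values is also a convenient way to match your result against answer options whose order differs.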
Feel free to comment if you don’t understand or find a mistake.