Running a regression with clusters from k-means

Question:

I ran k-means clustering with the code below:

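from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(df_logret)

km = KMeans(n_clusters=2, max_iter=100)
km.fit(X_std)
centroids = km.cluster_centers_  # scikit-learn stores the cluster centres here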

I’d like to put cluster 1 in x_1 and cluster 2 in x_2 and run a regression of the form y = a*x_1 + b*x_2.
I’ve spent the whole day searching for a way to do this but couldn’t find one.

The dataset ‘df_logret’ looks like this: [screenshot of df_logret not available]

Any help would be greatly appreciated!

Asked By: mimiskims


Answers:

You’ve applied KMeans clustering to X_std. With scikit-learn, the fitted model stores one cluster label per row in km.labels_, and you can use those labels to sort the rows into their clusters.
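To see what km.labels_ contains, here is a minimal sketch on hypothetical toy data (X_toy and km_toy are illustrative names, not from the question). It assumes scikit-learn is installed; n_init and random_state are set only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of 2-D points.
X_toy = np.array([
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
])

km_toy = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_toy)

# labels_ holds one cluster index (0 or 1) per input row, in row order.
print(km_toy.labels_)
# cluster_centers_ has shape (n_clusters, n_features).
print(km_toy.cluster_centers_.shape)
```

Which group gets label 0 and which gets label 1 is arbitrary, so “cluster 1” in the question simply means whichever rows share label 0.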

Assuming X_std is an n×2 NumPy array (e.g. np.array([[1, 2], [3, 4], [4, 5], ...])), you can split its rows by label:

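import numpy as np

cluster_1 = []  # rows that KMeans labelled 0
cluster_2 = []  # rows that KMeans labelled 1

# km.labels_ holds one cluster label per row of X_std, in row order
for i in range(len(X_std)):
    if km.labels_[i] == 0:
        cluster_1.append(X_std[i])
    else:
        cluster_2.append(X_std[i])

cluster_1_array = np.array(cluster_1)
cluster_2_array = np.array(cluster_2)
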
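cluster_1_array and cluster_2_array will usually have different numbers of rows, so they cannot be used directly as the x_1 and x_2 columns of a single regression. One possible reading of y = a*x_1 + b*x_2 (an assumption; the question does not define y or say how clusters map to regressors) is to encode cluster membership as two indicator columns and fit them by ordinary least squares. The sketch uses hypothetical `labels` and `y` arrays standing in for km.labels_ and the asker’s target variable.

```python
import numpy as np

# Hypothetical stand-ins: in the question these would be km.labels_
# (values 0 and 1) and a target series y aligned with the rows of X_std.
labels = np.array([0, 0, 1, 1, 1, 0])
y = np.array([1.0, 3.0, 10.0, 12.0, 14.0, 2.0])

# Indicator (dummy) columns: x_1 marks rows in the first cluster (label 0),
# x_2 marks rows in the second cluster (label 1).
x_1 = (labels == 0).astype(float)
x_2 = (labels == 1).astype(float)
X = np.column_stack([x_1, x_2])

# Ordinary least squares for y = a*x_1 + b*x_2 (no intercept term).
coef, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
a, b = coef
print(a, b)
```

With this encoding x_1 + x_2 = 1 on every row, so the model deliberately has no separate intercept (it would be collinear with the two columns), and the fitted a and b come out as the mean of y within each cluster.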
Answered By: ZWang