Running a regression with clusters from k-means
Question:
I ran k-means clustering with the code below:
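```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(df_logret)
km = KMeans(n_clusters=2, max_iter=100)
km.fit(X_std)
centroids = km.cluster_centers_  # scikit-learn's attribute is cluster_centers_, not centroids
```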
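Because df_logret itself isn't shown, here is a self-contained sketch of the same clustering step on synthetic data. It assumes scikit-learn's KMeans; the random "returns" array, n_init=10 and random_state=0 are illustrative choices, not taken from the question. It shows the two fitted attributes the rest of this thread relies on: labels_ (one cluster index per row) and cluster_centers_ (one row per centroid).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for df_logret: 200 rows of 3 log-return columns,
# drawn from two groups with different means
rng = np.random.default_rng(0)
returns = np.vstack([
    rng.normal(-0.02, 0.01, size=(100, 3)),
    rng.normal(0.02, 0.01, size=(100, 3)),
])

X_std = StandardScaler().fit_transform(returns)
km = KMeans(n_clusters=2, max_iter=100, n_init=10, random_state=0)
km.fit(X_std)

labels = km.labels_              # one cluster index (0 or 1) per row of X_std
centroids = km.cluster_centers_  # shape (n_clusters, n_features)
print(labels.shape, centroids.shape)  # (200,) (2, 3)
```

Note that scikit-learn numbers the clusters 0 and 1, so "cluster 1" and "cluster 2" in the question correspond to labels 0 and 1.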
I'd like to put cluster 1 in x_1 and cluster 2 in x_2 and then run a regression of the form y = a*x_1 + b*x_2.
I've spent the whole day searching for a way to do this but haven't found one.
The dataset df_logret looks like this:
Any help would be greatly appreciated!
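One way to read y = a*x_1 + b*x_2 (an assumption, since the question doesn't say what the regressor is): take a single regressor x aligned with the clustered rows, set x_1 = x for rows in the first cluster and 0 elsewhere, and x_2 = x for rows in the second cluster and 0 elsewhere. Then a and b are separate slopes for each cluster. The sketch below uses made-up x, y and labels (in practice the labels would be km.labels_) and scikit-learn's LinearRegression without an intercept, to match the stated model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: one regressor x, a target y, and cluster labels.
# In practice the labels would come from km.labels_.
rng = np.random.default_rng(1)
n = 200
labels = np.repeat([0, 1], n // 2)   # first half cluster 0, second half cluster 1
x = rng.normal(size=n)
true_a, true_b = 1.5, -0.5           # hypothetical per-cluster slopes
y = np.where(labels == 0, true_a * x, true_b * x) + rng.normal(scale=0.1, size=n)

# x_1 keeps x only for rows in cluster 0 (zero elsewhere); x_2 does the same for cluster 1
x_1 = np.where(labels == 0, x, 0.0)
x_2 = np.where(labels == 1, x, 0.0)
X_design = np.column_stack([x_1, x_2])

# Fit y = a*x_1 + b*x_2 with no intercept term
reg = LinearRegression(fit_intercept=False).fit(X_design, y)
a_hat, b_hat = reg.coef_
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Because x_1 and x_2 are never both non-zero on the same row, the two columns are orthogonal, so this fit gives the same a and b as regressing y on x (without intercept) separately within each cluster.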
Answers:
You've just applied k-means clustering to X_std. With scikit-learn, you can read the cluster assigned to each row from km.labels_ and use it to sort the rows into the appropriate clusters.
Assuming your X_std is an n×2 NumPy array (e.g. np.array([[1,2],[3,4],[4,5]]...)):
```python
import numpy as np

cluster_1 = []
cluster_2 = []
for i in range(len(X_std)):
    # km.labels_[i] is the cluster index (0 or 1) assigned to row i
    if km.labels_[i] == 0:
        cluster_1.append(X_std[i])
    else:
        cluster_2.append(X_std[i])

cluster_1_array = np.array(cluster_1)
cluster_2_array = np.array(cluster_2)
```
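For n_clusters=2 the loop above can also be written with NumPy boolean masks, which select all rows of a cluster in one step. A minimal sketch, with a small hypothetical X_std and a label array standing in for km.labels_:

```python
import numpy as np

# Hypothetical small inputs standing in for X_std and km.labels_
X_std = np.array([[1.0, 2.0], [3.0, 4.0], [4.0, 5.0], [-1.0, -2.0]])
labels = np.array([0, 0, 1, 1])

# Boolean masks select whole rows at once; same result as the loop for two clusters
cluster_1_array = X_std[labels == 0]
cluster_2_array = X_std[labels == 1]

print(cluster_1_array)
print(cluster_2_array)
```

With more than two clusters, X_std[labels == k] selects cluster k directly, whereas the loop's else branch would lump every non-zero label into cluster_2.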