How to input 3D data from dataframe for k-means clustering?

Question:

I have 505 sets of patient data (rows), each containing 17 sets (cols) of 3D [x,y,z] arrays.

In : data.iloc[0][0]
Out: array([ -23.47808471,   -9.92158009, 1447.74107884])

Snippet of df for clarity

Each set of patient data is a collection of 3D points marking the centers of vertebrae, with 17 vertebrae marked per patient. I am attempting to use k-means clustering to classify how many different types of spines there are in the dataset. However, when I try to fit the model, I get errors such as "ValueError: setting an array element with a sequence." I am not sure how to manipulate my dataframe so that each patient's data is kept separate from the others.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, n_init=10, max_iter=300)
kmeans.fit(data)

Thank you!

Plot of one row of data

Asked By: rchow


Answers:

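kmeans.fit expects a 2-D array as input (one row per sample, one column per feature), whereas your data is effectively 3-D, because every cell of the dataframe holds an [x, y, z] array. One thing you can do is unravel the data points and turn each coordinate into its own feature column. For example: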

# Do this for all positions
data['Spine_L1_Center_x'] = data['Spine_L1_Center'].apply(lambda x: x[0])
data['Spine_L1_Center_y'] = data['Spine_L1_Center'].apply(lambda x: x[1])
data['Spine_L1_Center_z'] = data['Spine_L1_Center'].apply(lambda x: x[2])

data.drop(columns=['Spine_L1_Center', ... ], inplace=True)

Then fit the model on the new dataframe.
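If writing out three lines per vertebra gets tedious, the same flattening can be done for all columns at once with NumPy. The following is only a sketch under assumptions: the dataframe here is a randomly generated stand-in with hypothetical column names (vertebra_0 ... vertebra_16), and it assumes every cell holds a length-3 array with no missing values. Each patient ends up as one row of 17 * 3 = 51 features.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the real dataframe:
# 505 patients (rows) x 17 vertebrae (columns), each cell an [x, y, z] array.
rng = np.random.default_rng(0)
n_patients, n_vertebrae = 505, 17
data = pd.DataFrame({
    f"vertebra_{i}": [rng.normal(size=3) for _ in range(n_patients)]
    for i in range(n_vertebrae)
})

# Turn the object-dtype cells into a real 3-D float array: (505, 17, 3).
points = np.array(data.to_numpy().tolist(), dtype=float)

# Flatten each patient's 17 points into one feature vector: (505, 51).
# Column order is x, y, z of the first vertebra, then the second, and so on.
X = points.reshape(len(data), -1)

kmeans = KMeans(n_clusters=4, n_init=10, max_iter=300, random_state=0)
labels = kmeans.fit_predict(X)  # one cluster label per patient
```

Note that k-means uses Euclidean distance on these raw coordinates, so differences in patient position or overall size will influence the clusters. Whether to center or scale the points first depends on what "type of spine" should mean for your data.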

Answered By: Vishwas Chepuri