k-means | py4u

How to make KMeans Clustering more Meaningful for Titanic Data?

Question: I’m running this code. import pandas as pd titanic = pd.read_csv('titanic.csv') titanic.head() #Import required module from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.cluster import KMeans from sklearn.metrics import adjusted_rand_score documents = titanic['Name'] vectorizer = TfidfVectorizer(stop_words='english') X = vectorizer.fit_transform(documents) from sklearn.cluster import KMeans # initialize kmeans with …

Total answers: 1

Why does sklearn KMeans change my dataset after fitting?

Question: I am using KMeans from sklearn to cluster College.csv, but when I fit the KMeans model, my dataset changes afterwards! Before using KMeans, I standardize the numerical variables with StandardScaler and use OneHotEncoder to dummy-encode the categorical variable "Private". My code …

Total answers: 3

Python + Image Processing: Efficiently Assign Pixel Values to Nearest Predefined Value

Question: I implemented an algorithm that uses OpenCV kmeans to quantize the unique brightness values present in a greyscale image. Quantizing the unique values helped avoid biases towards image backgrounds, which are typically all the same value. However, I struggled to find a …

Total answers: 2
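
The excerpt above is cut off, so the asker's actual approach is not known. As a minimal sketch of one way to map every pixel to its nearest predefined value, assuming an 8-bit greyscale image held in a NumPy array, a 256-entry lookup table can be built once and then applied by indexing. The function name `quantize_to_levels` and the sample `levels` are hypothetical, chosen only for illustration.

```python
import numpy as np

def quantize_to_levels(image, levels):
    """Map each uint8 pixel to the nearest value in `levels` (hypothetical helper)."""
    levels = np.asarray(levels, dtype=np.int32)
    # All possible 8-bit brightness values.
    values = np.arange(256, dtype=np.int32)
    # For each possible brightness, find the index of the closest level.
    nearest_idx = np.abs(values[:, None] - levels[None, :]).argmin(axis=1)
    # Lookup table: brightness -> nearest predefined value.
    lut = levels[nearest_idx].astype(np.uint8)
    # Apply the table to every pixel with a single indexing operation.
    return lut[image]

# Assumed example data, not taken from the question.
image = np.array([[0, 10, 120], [130, 200, 255]], dtype=np.uint8)
levels = [0, 128, 255]
quantized = quantize_to_levels(image, levels)
print(quantized)
```

Because the table has only 256 entries, the distance computation does not grow with image size; the per-pixel work is a single array lookup.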

K-Means Clustering Output seems wrong, how can it be explained?

Question: I have a custom dataset that I want to partition using kmeans. This is my MCVE: from sklearn.cluster import KMeans import matplotlib.pyplot as plt samples = np.array([[3.2736001e+03, 1.7453293e+00], [3.7256001e+03, 5.2359879e-02], [3.2960000e+03, 1.7366025e+00], [3.7112000e+03, 4.3633230e-02], [3.7136001e+03, 4.3633230e-02], [6.8240002e+02, 1.4137167e+00], [6.9279999e+02, 1.4049901e+00], [3.2944001e+03, 1.7366025e+00], [3.7480000e+03, 6.1086524e-02], …

Total answers: 1

How to input 3D data from dataframe for k-means clustering?

Question: I have 505 sets of patient data (rows), each containing 17 sets (cols) of 3D [x,y,z] arrays. In : data.iloc[0][0] Out: array([ -23.47808471, -9.92158009, 1447.74107884]) Snippet of df for clarity Each set of patient data is a collection of 3D points marking centers of …

Total answers: 1

Am I interpreting K-means results correctly?

Question: I have implemented k-means elbow plot to find the optimum K for my data (after doing PCA). I have gotten the elbow plot shown below. My question is: I think the optimum K is 3 in my case (this is where a sudden drop occurs/point of inflection)? But …

Total answers: 1

How to cluster elements of a list in a dictionary?

Question: I have printed the following data: dd={2: [314, 334, 298, 316, 336, 325, 337, 344, 319, 323], 1: [749, 843, 831, 795, 769]} I tried to cluster the list elements of each key (including 2 and 1) into 2 clusters using kmeans. Here is my code: from …

Total answers: 1
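
The asker's own code is truncated above, so what follows is only a sketch of the task as described: clustering the values under each key of `dd` into 2 groups. It assumes scikit-learn's `KMeans`, reshapes each list into a single-feature column, and fixes `random_state` purely for repeatability. The result name `clustered` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Data copied from the question excerpt.
dd = {2: [314, 334, 298, 316, 336, 325, 337, 344, 319, 323],
      1: [749, 843, 831, 795, 769]}

clustered = {}
for key, values in dd.items():
    # KMeans expects a 2D array: one row per sample, one column per feature.
    X = np.array(values, dtype=float).reshape(-1, 1)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    # Group the original values by the cluster label assigned to each.
    clustered[key] = {
        cluster: [v for v, label in zip(values, labels) if label == cluster]
        for cluster in (0, 1)
    }

print(clustered)
```

The cluster numbers 0 and 1 are arbitrary labels; only the grouping itself carries meaning.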

Learning: KMeans clustering inconsistent results

Question: Learning ML and I’m new to KMeans clustering. How do I know if my model is accurate with the consistently inconsistent results that I’m getting? What I mean by consistently inconsistent is I get the exact same set of 4 results but they appear randomly. Setup (Jupyter Notebook): I’m …

Total answers: 1
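
The setup in the excerpt above is cut off, so the cause of the varying results cannot be confirmed from it. One common reason k-means output differs between runs is its random centroid initialization: the same partition can come back with the cluster numbers shuffled. The sketch below, on assumed toy data rather than the asker's dataset, shows that fixing `random_state` makes the labels repeatable, and that `adjusted_rand_score` compares two labelings regardless of how the clusters are numbered. The helper `fit_labels` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Assumed toy data: two well-separated 2D blobs (not from the question).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

def fit_labels(seed):
    """Fit KMeans with a given seed and return the cluster labels."""
    return KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X)

labels_a = fit_labels(42)
labels_b = fit_labels(42)   # same seed: labels should match exactly
labels_c = fit_labels(7)    # different seed: numbering may differ

same_seed_identical = bool(np.array_equal(labels_a, labels_b))
# The adjusted Rand index ignores label numbering; 1.0 means identical partitions.
ari_across_seeds = adjusted_rand_score(labels_a, labels_c)

print(same_seed_identical, ari_across_seeds)
```

If the partitions themselves differ between seeds (an adjusted Rand index well below 1), that points to unstable clusters rather than mere relabeling.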

Using k-Means Clustering to try to identify a 2D outlier shows no outliers at all (instead of one)

Question: I was working my way through An Introduction to Outlier Analysis by Charu Aggarwal and doing Exercise 7 from Chapter 1. I am trying to use k-Means Clustering to identify an outlier in my data. What …

Total answers: 4

How to KMeans Cluster strings

Question: I’m a data engineer with a limited understanding of ML methods, and I am trying to settle on a strategy that I understand before I start coding. What I’m trying to do is create clusters out of key-value pairs, with the key being a name and the value being some …

Total answers: 2