cluster-analysis

Rand Index function (clustering performance evaluation)

Question: As far as I know, there is no package available for the Rand Index in Python, while for the Adjusted Rand Index you have the option of using sklearn.metrics.adjusted_rand_score(labels_true, labels_pred). I wrote the code for the Rand Score and I am going to share it with others as the answer to …

Total answers: 3
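
For the Rand Index question above, here is a minimal pure-Python sketch of the pair-counting definition: the fraction of point pairs on which the two labelings agree (both place the pair in the same cluster, or both place it in different clusters). The function name rand_index and the return value of 1.0 for fewer than two points are assumptions made for illustration, not the code from the original answer. Recent scikit-learn releases (0.24 and later) also provide sklearn.metrics.rand_score, which is likely the better choice in practice.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the fraction of point pairs on which two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        # Assumption for this sketch: with fewer than two points there are
        # no pairs to disagree on, so treat the labelings as identical.
        return 1.0

    agreements = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both labelings make the same
        # decision about it (together in both, or apart in both).
        if same_true == same_pred:
            agreements += 1
    return agreements / len(pairs)
```

This loop is quadratic in the number of points, so it is only suitable for small inputs; a contingency-table formulation scales better.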

Is my python implementation of the Davies-Bouldin Index correct?

Question: I’m trying to calculate the Davies-Bouldin Index in Python. Here are the steps the code below tries to reproduce. 5 steps:
1. For each cluster, compute the Euclidean distance from each point to the centroid.
2. For each cluster, compute the average of these distances.
3. For each pair …

Total answers: 5
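
For the Davies-Bouldin question above, the excerpt is cut off after its second step, so the sketch below follows the standard definition of the index rather than the asker's code: average distance to the centroid per cluster, then for each cluster the worst ratio of summed scatters to centroid distance, then the mean of those worst ratios. The function name davies_bouldin is hypothetical, and the code assumes Python 3.8+ (for math.dist) and distinct centroids. scikit-learn also ships sklearn.metrics.davies_bouldin_score, which can serve as a cross-check.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for points (sequences of floats) and their labels."""
    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster: coordinate-wise mean.
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean Euclidean distance to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    # For each cluster, keep the largest similarity ratio against any other
    # cluster. Assumes no two centroids coincide (otherwise division by zero).
    worst_ratios = []
    for i in clusters:
        ratios = [
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        ]
        worst_ratios.append(max(ratios))
    return sum(worst_ratios) / len(worst_ratios)
```

Lower values indicate more compact, better separated clusters.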

Spectral Clustering a graph in python

Question: I’d like to cluster a graph in Python using spectral clustering. Spectral clustering is a more general technique which can be applied not only to graphs, but also to images or any sort of data; however, it’s considered an exceptional graph clustering technique. Sadly, I can’t find examples of …

Total answers: 3

python scikit-learn clustering with missing data

Question: I want to cluster data with missing columns. Doing it manually, I would simply calculate the distance without the missing column. With scikit-learn, missing data is not possible. There is also no chance to specify a user distance function. Is there any chance …

Total answers: 2
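
For the missing-data question above, the manual approach it describes (compute the distance without the missing column) can be sketched in pure Python. The function name partial_euclidean, the use of None as the missing-value marker, and the NaN result when no column is shared are all assumptions made for illustration. A matrix of such distances could then be passed to an estimator that accepts precomputed distances, for example DBSCAN with metric="precomputed".

```python
import math


def partial_euclidean(a, b):
    """Euclidean distance over the columns present in both rows.

    Missing values are marked with None (an assumption of this sketch).
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        # Assumption: with no column in common the distance is undefined.
        return math.nan
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def pairwise_distances(rows):
    """Full symmetric distance matrix as a list of lists."""
    return [[partial_euclidean(r, s) for s in rows] for r in rows]
```

Dropping columns makes distances between rows with different missing patterns not directly comparable, since they are computed over different numbers of dimensions; rescaling by the fraction of columns present is one common adjustment.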

DBSCAN for clustering of geographic location data

Question: I have a dataframe with latitude and longitude pairs. Here is what my dataframe looks like:
   order_lat  order_long
0  19.111841   72.910729
1  19.111342   72.908387
2  19.111342   72.908387
3  19.137815   72.914085
4  19.119677   72.905081
5  19.119677   72.905081
6  19.119677   72.905081
7  19.120217   72.907121
8  19.120217   72.907121
9  19.119677   72.905081
…

Total answers: 5

Will pandas dataframe object work with sklearn kmeans clustering?

Question: dataset is a pandas DataFrame. This is sklearn.cluster.KMeans:
    km = KMeans(n_clusters = n_Clusters)
    km.fit(dataset)
    prediction = km.predict(dataset)
This is how I decide which entity belongs to which cluster:
    for i in range(len(prediction)):
        cluster_fit_dict[dataset.index[i]] = prediction[i]
This is how dataset looks: A 1 2 3 4 5 …

Total answers: 2

sklearn agglomerative clustering linkage matrix

Question: I’m trying to draw a complete-link scipy.cluster.hierarchy.dendrogram, and I found that scipy.cluster.hierarchy.linkage is slower than sklearn.AgglomerativeClustering. However, sklearn.AgglomerativeClustering doesn’t return the distance between clusters and the number of original observations, which scipy.cluster.hierarchy.dendrogram needs. Is there a way to get them? Asked by: Presian Abarov. Answers: It’s possible, …

Total answers: 5

Algorithm to decide cut-off for collapsing this tree?

Question: I have a Newick tree that is built by comparing the similarity (Euclidean distance) of Position Weight Matrices (PWMs or PSSMs) of putative DNA regulatory motifs, which are DNA sequences 4-9 bp long. An interactive version of the tree is up on iTol (here), which you can …

Total answers: 2

Scikit Learn – K-Means – Elbow – criterion

Question: Today I’m trying to learn something about K-means. I have understood the algorithm and I know how it works. Now I’m looking for the right k… I found the elbow criterion as a method to detect the right k, but I do not understand how to …

Total answers: 3

scikit-learn DBSCAN memory usage

Question: UPDATED: In the end, the solution I opted to use for clustering my large dataset was one suggested by Anony-Mousse below. That is, using ELKI’s DBSCAN implementation to do my clustering rather than scikit-learn’s. It can be run from the command line and, with proper indexing, performs this task within …

Total answers: 5