OverflowError: cannot convert float infinity to integer, after doing so

Question:

import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
import quantstats as qs

data = pd.read_csv('worldometer_data.csv')
X = data.drop(columns=['Country/Region', 'Continent', 'Population', 'WHO Region'])

# replace NaN values with 0
for i in X:
  X[i] = X[i].fillna(0)
  
# getting rid of float infinity
X = X.replace([np.inf, -np.inf, -0], 0)

wcss = []

# getting Kmeans
for i in range(0, 51):
  kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
  kmeans.fit(X)
  wcss.append(kmeans.inertia_)

# visualizing the kmeans graph
plt.plot(range(0, 51), wcss)
plt.title('Elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

I have checked the X array after getting rid of float infinity and it does that successfully . But when it gets to kmeans.fit(X) it fails and returns the OverflowError.

Error:

OverflowError                             Traceback (most recent call last)
~AppDataLocalTempipykernel_214163473011824.py in <module>
     16 for i in range(0, 51):
     17   kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
---> 18   kmeans.fit(X)
     19   wcss.append(kmeans.inertia_)
     20 

c:Usersusranaconda3libsite-packagessklearncluster_kmeans.py in fit(self, X, y, sample_weight)
   1177         for i in range(self._n_init):
   1178             # Initialize centers
-> 1179             centers_init = self._init_centroids(
   1180                 X, x_squared_norms=x_squared_norms, init=init, random_state=random_state
   1181             )

c:Usersusranaconda3libsite-packagessklearncluster_kmeans.py in _init_centroids(self, X, x_squared_norms, init, random_state, init_size)
   1088 
   1089         if isinstance(init, str) and init == "k-means++":
-> 1090             centers, _ = _kmeans_plusplus(
   1091                 X,
   1092                 n_clusters,

c:Usersusranaconda3libsite-packagessklearncluster_kmeans.py in _kmeans_plusplus(X, n_clusters, x_squared_norms, random_state, n_local_trials)
    189         # specific results for other than mentioning in the conclusion
...
--> 191         n_local_trials = 2 + int(np.log(n_clusters))
    192 
    193     # Pick first center randomly and track index of point

OverflowError: cannot convert float infinity to integer

How can I fix this and is there something else I did wrong?

Dataset used: https://www.kaggle.com/datasets/imdevskp/corona-virus-report (the worldometer_data.csv)

Asked By: Code7G

||

Answers:

The problem is here:

for i in range(0, 51):
  kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)

You cannot set n_clusers to 0. It must be larger than 0.

Answered By: Chrispresso
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.