Getting a KeyError: 0 with the linear classifier code
Question:
I am trying to write a linear classifier without using library APIs, in order to understand the fundamentals. Below is the code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
data = pd.read_csv('files/weather.csv', parse_dates= True, index_col=0)
data.head()
data.isnull().sum()
dataset = data[['Humidity3pm','Pressure3pm','RainTomorrow']].dropna()
X = dataset[['Humidity3pm', 'Pressure3pm']]
y = dataset['RainTomorrow']
y = np.array([0 if value == 'No' else 1 for value in y])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
def linear_classifier(X, y, learning_rate=0.01, num_epochs=100):
    num_features = X.shape[1]
    weights = np.zeros(num_features)
    bias = 0
    for epoch in range(num_epochs):
        for i in range(X.shape[0]):
            linear_output = np.dot(X[i], weights) + bias
            y_pred = np.sign(linear_output)
            error = y[i] - y_pred
            # print("The value of error=", error)
            weights = weights + learning_rate * error * X[i]
            bias += learning_rate * error
    return weights, bias
weights, bias = linear_classifier(X_train, y_train)  # This line raises the error
I am quite new to Python and am getting this error:
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File c:\Users\manuc\anaconda3\envs\learning_python\lib\site-packages\pandas\core\indexes\base.py:3802, in Index.get_loc(self, key, method, tolerance)
   3801 try:
-> 3802     return self._engine.get_loc(casted_key)
   3803 except KeyError as err:
KeyError: 0
It would be a great help if someone could help me resolve this error. I am new to Python and machine learning. Thank you in advance.
Edit
Edited the question to provide the complete error.
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File c:\Users\manuc\anaconda3\envs\learning_python\lib\site-packages\pandas\core\indexes\base.py:3802, in Index.get_loc(self, key, method, tolerance)
   3801 try:
-> 3802     return self._engine.get_loc(casted_key)
   3803 except KeyError as err:
File c:\Users\manuc\anaconda3\envs\learning_python\lib\site-packages\pandas\_libs\index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()
File c:\Users\manuc\anaconda3\envs\learning_python\lib\site-packages\pandas\_libs\index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
Cell In[170], line 1
----> 1 weights, bias = linear_classifier(X_train, y_train)
Cell In[169], line 10, in linear_classifier(X, y, learning_rate, num_epochs)
      8 for epoch in range(num_epochs):
...
   3807 # InvalidIndexError. Otherwise we fall through and re-raise
   3808 # the TypeError.
   3809 self._check_indexing_error(key)
KeyError: 0
The weather data is taken from: https://github.com/LearnPythonWithRune/MachineLearningWithPython/blob/main/jupyter/final/files/weather.csv
Answers:
The full error log includes the following lines:
Traceback (most recent call last):
  File "linear_classifier.py", line 40, in <module>
    weights, bias = linear_classifier(X_train, y_train)
  File "linear_classifier.py", line 27, in linear_classifier
    linear_output = np.dot(X[i], weights) + bias
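As you can see, the error is in the indexing expression X[i]. X_train is a pandas DataFrame, and X[i] on a DataFrame looks up a column labelled i, not the i-th row. The columns here are named Humidity3pm and Pressure3pm, and the row index is date-based rather than integer, so there is no key 0 and pandas raises KeyError: 0.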
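To see why X[i] fails on a DataFrame, here is a minimal sketch on a small made-up frame. The dates and values are invented for illustration and are not taken from weather.csv. It shows that X[0] is treated as a column label, while .iloc[0] and a NumPy array both give the first row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train: two feature columns, date-based index.
X = pd.DataFrame(
    {"Humidity3pm": [22.0, 25.0, 30.0], "Pressure3pm": [1007.1, 1007.8, 1008.7]},
    index=pd.to_datetime(["2008-02-01", "2008-02-02", "2008-02-03"]),
)

# X[0] asks for a *column* labelled 0, which does not exist here.
try:
    X[0]
    raised = False
except KeyError:
    raised = True

# Positional row access on a DataFrame goes through .iloc.
first_row_iloc = X.iloc[0].to_numpy()

# A NumPy array supports X[i] row indexing directly.
first_row_numpy = X.to_numpy()[0]

print(raised, first_row_iloc, first_row_numpy)
```

So an alternative to converting the whole frame would be to write X.iloc[i] (and keep y as an array) inside the loop, although converting once up front is likely simpler.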
This can be solved by changing the line that defines X from
X = dataset[['Humidity3pm', 'Pressure3pm']]
to
...
X = dataset[['Humidity3pm', 'Pressure3pm']].values
...
Appending .values returns only the underlying data as a numpy.ndarray, which supports positional indexing such as X[i].
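As a rough end-to-end check, the sketch below runs the question's training loop unchanged on a tiny invented dataset after converting the features to a NumPy array. The rows are hypothetical, not from weather.csv, and .to_numpy() is used in place of .values; the pandas documentation recommends it, and both return an ndarray here.

```python
import numpy as np
import pandas as pd

def linear_classifier(X, y, learning_rate=0.01, num_epochs=100):
    # Same update rule as in the question; X and y are expected to be NumPy arrays.
    num_features = X.shape[1]
    weights = np.zeros(num_features)
    bias = 0.0
    for epoch in range(num_epochs):
        for i in range(X.shape[0]):
            linear_output = np.dot(X[i], weights) + bias
            y_pred = np.sign(linear_output)
            error = y[i] - y_pred
            weights = weights + learning_rate * error * X[i]
            bias += learning_rate * error
    return weights, bias

# Hypothetical rows standing in for the weather data (date-based index as in the question).
dataset = pd.DataFrame(
    {
        "Humidity3pm": [22.0, 25.0, 80.0, 75.0],
        "Pressure3pm": [1007.1, 1007.8, 1001.2, 1002.5],
        "RainTomorrow": ["No", "No", "Yes", "Yes"],
    },
    index=pd.to_datetime(["2008-02-01", "2008-02-02", "2008-02-03", "2008-02-04"]),
)

# Convert to a plain ndarray so that X[i] means "row i".
X = dataset[["Humidity3pm", "Pressure3pm"]].to_numpy()
y = np.array([0 if value == "No" else 1 for value in dataset["RainTomorrow"]])

weights, bias = linear_classifier(X, y, num_epochs=5)
print(weights, bias)
```

In the original script the conversion happens before train_test_split, so X_train and X_test come back as arrays as well and the loop can index them by position.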