Getting a KeyError: 0 with linear classifier code

Question:

I am trying to write a linear classifier without using library APIs, in order to understand the fundamentals. Below is the code:

    import numpy as np 
    import matplotlib.pyplot as plt 
    import pandas as pd 
    from sklearn.model_selection import train_test_split
    
    data = pd.read_csv('files/weather.csv', parse_dates= True, index_col=0)
    data.head()
    
    data.isnull().sum()
    
    dataset = data[['Humidity3pm','Pressure3pm','RainTomorrow']].dropna()
    
    X = dataset[['Humidity3pm', 'Pressure3pm']]
    y = dataset['RainTomorrow']
    y = np.array([0 if value == 'No' else 1 for value in y])
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    
    
    def linear_classifier(X, y, learning_rate=0.01, num_epochs=100):
        num_features = X.shape[1] 
        weights = np.zeros(num_features)
        bias = 0
        
        for epoch in range(num_epochs):
            for i in range(X.shape[0]):
                linear_output = np.dot(X[i], weights) + bias
               
                y_pred = np.sign(linear_output)
                
               
                error = y[i] - y_pred
                # print("The value of error=", error)
                weights = weights + learning_rate * error * X[i]
    
                bias += learning_rate * error
                
        return weights, bias
    
    
    weights, bias = linear_classifier(X_train, y_train)   # This line gives the error

I am quite new to Python and am getting this error:

    Output exceeds the size limit. Open the full output data in a text editor
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    File c:\Users\manuc\anaconda3\envs\learning_python\lib\site-packages\pandas\core\indexes\base.py:3802, in Index.get_loc(self, key, method, tolerance)
       3801 try:
    -> 3802     return self._engine.get_loc(casted_key)
       3803 except KeyError as err:
    
            
    KeyError: 0
    

It would be a great help if someone could help me resolve this error. I am new to Python and machine learning. Thank you in advance.

Edit
Edited the question to provide the complete error:

    Output exceeds the size limit. Open the full output data in a text editor
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    File c:\Users\manuc\anaconda3\envs\learning_python\lib\site-packages\pandas\core\indexes\base.py:3802, in Index.get_loc(self, key, method, tolerance)
       3801 try:
    -> 3802     return self._engine.get_loc(casted_key)
       3803 except KeyError as err:

    File c:\Users\manuc\anaconda3\envs\learning_python\lib\site-packages\pandas\_libs\index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

    File c:\Users\manuc\anaconda3\envs\learning_python\lib\site-packages\pandas\_libs\index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

    File pandas\_libs\hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()

    File pandas\_libs\hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()

    KeyError: 0

    The above exception was the direct cause of the following exception:

    KeyError                                  Traceback (most recent call last)
    Cell In[170], line 1
    ----> 1 weights, bias = linear_classifier(X_train, y_train)

    Cell In[169], line 10, in linear_classifier(X, y, learning_rate, num_epochs)
          8 for epoch in range(num_epochs):
    ...
       3807     #  InvalidIndexError. Otherwise we fall through and re-raise
       3808     #  the TypeError.
       3809     self._check_indexing_error(key)

    KeyError: 0

The weather data is taken from: https://github.com/LearnPythonWithRune/MachineLearningWithPython/blob/main/jupyter/final/files/weather.csv

Asked By: Deepika

||

Answers:

The full error log includes the following lines:

    Traceback (most recent call last):
      File "linear_classifier.py", line 40, in <module>
        weights, bias = linear_classifier(X_train, y_train)
      File "linear_classifier.py", line 27, in linear_classifier
        linear_output = np.dot(X[i], weights) + bias

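As you can see, the error comes from indexing X[i]. X_train is a pandas DataFrame, and X[i] on a DataFrame looks up a column labeled i rather than the row at position i. There is no column labeled 0, hence KeyError: 0. Note also that the DataFrame has a date-based index, not an integer index, so label-based row lookups with an integer would not work either.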
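
To illustrate the indexing behaviour behind the error, here is a minimal sketch on a small made-up DataFrame. The values and dates are invented for illustration and are not taken from weather.csv; only the column names match the question. Square-bracket indexing with an integer looks up a column with that label, while .iloc selects a row by position.

```python
import pandas as pd

# Toy stand-in for the weather data (made-up values, date-based index).
toy = pd.DataFrame(
    {
        "Humidity3pm": [22.0, 25.0, 30.0],
        "Pressure3pm": [1007.1, 1007.8, 1008.7],
    },
    index=pd.to_datetime(["2008-12-01", "2008-12-02", "2008-12-03"]),
)

# toy[0] asks for a column labeled 0, which does not exist here.
try:
    toy[0]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Positional row access on a DataFrame goes through .iloc instead.
first_row = toy.iloc[0]
```

Here raised_key_error ends up True, and first_row holds the first row as a Series.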

This can be solved by changing line 13 of your code from X = dataset[['Humidity3pm', 'Pressure3pm']] to

    ...
    X = dataset[['Humidity3pm', 'Pressure3pm']].values
    ...

Adding .values at the end returns only the underlying data as a numpy.ndarray, which supports positional indexing such as X[i].
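
As a rough check of the fix, the sketch below converts a small made-up DataFrame to an array and evaluates the same expression that failed inside the loop. The data is invented for illustration. It uses DataFrame.to_numpy(), which the pandas documentation recommends over .values; for this all-numeric frame both give the same array.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the weather data (made-up values, date-based index).
toy = pd.DataFrame(
    {
        "Humidity3pm": [22.0, 25.0, 30.0],
        "Pressure3pm": [1007.1, 1007.8, 1008.7],
    },
    index=pd.to_datetime(["2008-12-01", "2008-12-02", "2008-12-03"]),
)

# Convert the feature columns to a plain NumPy array.
X = toy[["Humidity3pm", "Pressure3pm"]].to_numpy()

weights = np.zeros(X.shape[1])
bias = 0

# X[i] is now row i of the array, so the expression from the loop evaluates.
outputs = [np.dot(X[i], weights) + bias for i in range(X.shape[0])]
```

Since train_test_split returns NumPy arrays when it is given NumPy arrays, converting X before the split should be enough for X_train to support X[i] inside linear_classifier.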

Answered By: Shahan M