KeyError: 0 from Python

Question:

I am following chapter 3 of Hands-On Machine Learning with Scikit-Learn and TensorFlow, which covers classification of MNIST data. The following commands run without problems in a Jupyter notebook:

>>> from sklearn.datasets import fetch_openml
>>> mnist = fetch_openml('mnist_784', version=1)
>>> mnist.keys()
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'details',
'categories', 'url'])

>>> X, y = mnist["data"], mnist["target"]
>>> X.shape
(70000, 784)
>>> y.shape
(70000,)

The following command throws an error:

>>> some_digit = X[0]

Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~/anaconda3/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

~/anaconda3/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-43-348a6e96ae02> in <module>
----> 1 some_digit = X[0]

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 0

It is hard for me to understand what the actual error is, as I have not come across a similar one for such a simple assignment. What is causing the issue?

Asked By: SKPS

||

Answers:

X is a DataFrame, so X[0] looks for a column labeled 0. No such column exists (the columns are named pixel1 to pixel784), hence the KeyError. If you want the first row of your DataFrame, you have to use .loc (label-based) or .iloc (position-based). In your case both give the same result, but only because the index is numeric, starts at 0 and is contiguous:

# Extract the first row as a Series
>>> X.loc[0]
pixel1      0.0
pixel2      0.0
pixel3      0.0
pixel4      0.0
pixel5      0.0
           ... 
pixel780    0.0
pixel781    0.0
pixel782    0.0
pixel783    0.0
pixel784    0.0
Name: 0, Length: 784, dtype: float64

# Extract a pixel by label
>>> X.loc[0, 'pixel7']
0.0

# Extract the same pixel by position
>>> X.iloc[0, 6]
0.0
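
To see why .loc and .iloc only coincide for a default index, here is a minimal sketch on a made-up frame. The name df, the index labels 10 to 12 and the two-column layout are assumptions for illustration, not part of the MNIST data:

```python
import pandas as pd

# Hypothetical 3x2 frame standing in for the MNIST DataFrame.
# The index deliberately does not start at 0.
df = pd.DataFrame(
    {"pixel1": [0.0, 1.0, 2.0], "pixel2": [3.0, 4.0, 5.0]},
    index=[10, 11, 12],
)

# df[0] looks up a *column* labeled 0, which does not exist.
try:
    df[0]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

first_by_position = df.iloc[0]  # first row, whatever its label is
row_by_label = df.loc[10]       # row whose index label is 10

# df.loc[0] fails here because no row carries the label 0.
try:
    df.loc[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True
```

With this index, only .iloc[0] reliably means "the first row"; .loc[0] works on the MNIST frame only because its row labels happen to be 0, 1, 2, ...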

Update

iloc is probably more appropriate here, since you want the row by position rather than by label.

If you only need positional indexing, prefer NumPy over pandas and convert the data and target columns to arrays:

X, y = mnist["data"].to_numpy(), mnist["target"].to_numpy()
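
As a rough sketch of what the conversion changes, the snippet below uses a small made-up frame (X_frame, 3 rows of 4 "pixels") in place of mnist["data"], so it runs without downloading anything:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mnist["data"]: 3 "images" of 4 pixels each.
X_frame = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4),
    columns=["pixel1", "pixel2", "pixel3", "pixel4"],
)

X = X_frame.to_numpy()  # plain 2-D ndarray; the column labels are dropped
some_digit = X[0]       # integer indexing now selects the first row

# Same row taken from the DataFrame without converting the whole frame.
same_digit = X_frame.iloc[0].to_numpy()
```

Recent scikit-learn versions also accept as_frame=False in fetch_openml, which should return NumPy arrays directly; check the documentation for your installed version.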
Answered By: Corralien