Key error when accessing pandas dataframe

Question:

I get an error when trying to access a single element in a pandas dataframe this way: test_df["LABEL"][0]. Here is a code snippet showing how I am loading the data:

print "reading test set"
test_set = pd.read_csv(data_path+"small_test_products.txt", header=0, delimiter="|")

print "shape of the test set", test_set.shape 
test_df = pd.DataFrame(test_set)
lengthOfTestSet = len(test_df["LABEL"])
print test_df["LABEL"][0]

Here is the error I am getting:

  File "code.py", line 80, in <module>
    print test_df["LABEL"][0]
   File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 521, in __getitem__
    result = self.index.get_value(self, key)
   File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 3562, in get_value
    loc = self.get_loc(k)
   File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 3619, in get_loc
    return super(Float64Index, self).get_loc(key, method=method)
   File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
   File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
   File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
   File "pandas/hashtable.pyx", line 541, in pandas.hashtable.Float64HashTable.get_item (pandas/hashtable.c:9914)
   File "pandas/hashtable.pyx", line 547, in pandas.hashtable.Float64HashTable.get_item (pandas/hashtable.c:9852)
 KeyError: 0.0

What am I missing?

Answers:

As EdChum said, 0 is probably not in your index.

Try: df.iloc[0] or df['LABEL'].iloc[0], which select by integer position rather than by index label.
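To illustrate the difference between label-based and position-based access, here is a minimal sketch. The data and the index labels (5, 6, 7) are made up for illustration; the frame in the question may look different.

```python
import pandas as pd

# Hypothetical data: the index labels start at 5, so no row is labelled 0.
df = pd.DataFrame({"LABEL": ["a", "b", "c"]}, index=[5, 6, 7])

try:
    df["LABEL"][0]            # label-based lookup: there is no label 0
    raised = False
except KeyError:
    raised = True

first_by_position = df["LABEL"].iloc[0]   # position-based lookup: first element
first_row = df.iloc[0]                    # first row as a Series
```

Here the plain [0] lookup fails because it searches the index labels, while .iloc[0] ignores the labels and returns whatever is stored first.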

To reset the index if you are having trouble with that, assign the result back, since reset_index returns a new dataframe: df = df.reset_index(drop=True)

Check out the pandas indexing documentation for more information.

Answered By: Seth

In the OP's case, the variable name test_df suggests that it was created by splitting a dataframe into train and test sets, so it’s quite possible that test_df has no row with index label 0. You can check it with

0 in test_df.index

and if it returns False, then there is no row with index label 0.
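As a rough sketch of how that can happen, assume the test set was sliced off the end of a larger frame. The split below is a guess for illustration, not something the question confirms, and full_df and train_df are made-up names.

```python
import pandas as pd

# Hypothetical full dataset with the default index 0..4.
full_df = pd.DataFrame({"LABEL": [10, 20, 30, 40, 50]})

train_df = full_df.iloc[:3]   # keeps index labels 0, 1, 2
test_df = full_df.iloc[3:]    # keeps index labels 3, 4

has_zero_label = 0 in test_df.index   # membership test on the index labels
```

The rows keep their original labels after the split, so test_df starts at label 3 and test_df["LABEL"][0] would raise a KeyError.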

Nevertheless, to access the first row, you can use test_df.iloc or test_df.take() (similar to numpy.take) or even loc:

test_df.take([0])
test_df.iloc[0]
test_df.loc[test_df.index[0]]

For a scalar value, you can even use iat:

test_df["LABEL"].iat[0]
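The options above differ in what they return, which a small sketch can show. The frame below is made up; only the method calls come from the answer.

```python
import pandas as pd

# Hypothetical frame whose index does not contain the label 0.
test_df = pd.DataFrame({"LABEL": [1.5, 2.5], "NAME": ["x", "y"]}, index=[7, 8])

row_frame = test_df.take([0])                  # DataFrame containing only the first row
row_series = test_df.iloc[0]                   # first row as a Series
row_by_label = test_df.loc[test_df.index[0]]   # same row, looked up by its label (7)
value = test_df["LABEL"].iat[0]                # single scalar from the LABEL column
```

take([0]) keeps the two-dimensional shape, iloc and loc give the row as a Series, and iat gives just the value.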

If the index is not important and you want to reset it to a range index, then as Seth suggests, reset the index; just make sure to assign the result back (so that the change is permanent).

test_df = test_df.reset_index()            # the old index becomes a column in the dataframe
test_df = test_df.reset_index(drop=True)   # the old index is thrown away
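A minimal sketch of the difference between the two calls, using made-up data and the made-up names kept and dropped:

```python
import pandas as pd

# Hypothetical frame with index labels 7 and 8.
test_df = pd.DataFrame({"LABEL": ["a", "b"]}, index=[7, 8])

kept = test_df.reset_index()               # old index is kept as a column named "index"
dropped = test_df.reset_index(drop=True)   # old index is discarded

first_label = dropped["LABEL"][0]          # works now: the new index starts at 0
```

Neither call modifies test_df itself, which is why the result has to be assigned back.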

You may also get a KeyError for columns if the dataframe doesn’t have a column with that exact name. A common culprit is leading or trailing whitespace, e.g. 'LABEL ' instead of 'LABEL'. The following must return True for you to be able to select the LABEL column.

'LABEL' in test_df.columns

If the above returns False, try

test_df.columns = test_df.columns.str.strip()

and try selecting via test_df['LABEL'] again.
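For example, with a made-up frame whose headers carry stray spaces (the column names here are invented for illustration):

```python
import pandas as pd

# Hypothetical headers with trailing/leading spaces.
test_df = pd.DataFrame({"LABEL ": [1, 2], " ID": [10, 20]})

found_before = "LABEL" in test_df.columns        # exact match fails because of the space
test_df.columns = test_df.columns.str.strip()    # strip whitespace from every column name
found_after = "LABEL" in test_df.columns

labels = test_df["LABEL"]                        # selection works after stripping
```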

Answered By: cottontail