KeyError when indexing Pandas dataframe
Question:
I am trying to read data from a CSV file into a pandas DataFrame and access the first column, ‘Date’:
import pandas as pd
df_ticks=pd.read_csv('values.csv', delimiter=',')
print(df_ticks.columns)
df_ticks['Date']
This produces the following result:
Index([u'Date', u'Open', u'High', u'Low', u'Close', u'Volume'], dtype='object')
KeyError: u'no item named Date'
If I try to access any other column, like ‘Open’ or ‘Volume’, it works as expected.
Answers:
You most likely have an extra character at the beginning of your file that is prepended to your first column name, 'Date'. Simply copying and pasting your output into a non-Unicode console produces:
Index([u'?Date', u'Open', u'High', u'Low', u'Close', u'Volume'], dtype='object')
As mentioned by alko, it is probably an extra character at the beginning of your file. When using read_csv, you can specify encoding to deal with both the file encoding and the leading character, known as a BOM (byte order mark):
df = pd.read_csv('values.csv', delimiter=',', encoding="utf-8-sig")
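To see what the BOM does to the header independently of pandas, here is a minimal sketch using only the standard library. The byte string is made-up sample data standing in for values.csv, and `header` is a hypothetical helper, not part of any library:

```python
import csv
import io

# Hypothetical file contents: a UTF-8 BOM (bytes EF BB BF) followed by the CSV data.
raw = b"\xef\xbb\xbfDate,Open,High,Low,Close,Volume\n2013-01-02,1.0,2.0,0.5,1.5,100\n"


def header(data: bytes, encoding: str) -> list:
    """Decode the bytes with the given encoding and return the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


plain = header(raw, "utf-8")      # the BOM stays attached to the first name
clean = header(raw, "utf-8-sig")  # the BOM is stripped while decoding

print(repr(plain[0]))  # '\ufeffDate'
print(repr(clean[0]))  # 'Date'
print("Date" in plain, "Date" in clean)  # False True
```

The first name only looks like 'Date' when printed, because the BOM is invisible in most consoles. That is why the lookup fails even though the column list appears correct.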
A related question on Stack Overflow:
Pandas seems to ignore first column name when reading tab-delimited data, gives KeyError
It is almost always one of these reasons:
- You spelled the column name wrong
- There are leading/trailing whitespaces
  - in this case, use df.columns = df.columns.str.strip() to remove them, or revisit your pd.read_csv (or other IO function) call to see if you can remove them while parsing input
- Your column is not actually a column, but an index level
  - you can check the index level names using df.index.names to see if it is there. Calling .reset_index() before selecting the column should fix it.
- Your DataFrame does not have the column, at all
  - it was all just a figment of your imagination. Please turn off your system and take a nap.
Regardless of the reason, the first step is to stop what you’re doing and run print(df.columns.tolist()) and eyeball the result to see which of these 4 possible reasons it could be.
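The checklist above can also be automated roughly. The sketch below is one possible way to do it, not an established pandas utility: `diagnose_missing_column` is a hypothetical helper that takes plain lists, so it can be fed df.columns.tolist() and df.index.names. The case-insensitive comparison is an assumed stand-in for "spelled wrong" and will not catch other typos:

```python
def diagnose_missing_column(name, columns, index_names=()):
    """Return a short hint about why `name` is not usable as a column label."""
    columns = list(columns)
    if name in columns:
        return "present"

    # Invisible or surrounding characters: BOM and whitespace.
    for col in columns:
        if isinstance(col, str) and col.strip("\ufeff \t\r\n") == name:
            return "hidden characters in %r" % col

    # Spelling that differs only by letter case.
    for col in columns:
        if isinstance(col, str) and col.lower() == name.lower():
            return "spelled differently: %r" % col

    # The label exists, but as an index level rather than a column.
    if name in index_names:
        return "index level, not a column"

    return "absent"


print(diagnose_missing_column("Date", [" Date", "Open"]))
print(diagnose_missing_column("Date", ["Open"], ["Date"]))
```

With a real DataFrame, the call would look like diagnose_missing_column("Date", df_ticks.columns.tolist(), df_ticks.index.names).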