KeyError: "None of [['', '']] are in the [columns]" pandas python
Question:
I would like to slice two columns in my data frame.
This is my code for doing this:
import pandas as pd
df = pd.read_csv('source.txt',header=0)
cidf = df.loc[:,['vocab','sumCI']]
This is a sample of data:
ID vocab sumCI sumnextCI new_diff
450 statu 3.0 0.0 3.0
391 provid 4.0 1.0 3.0
382 prescript 3.0 0.0 3.0
300 lymphoma 2.0 0.0 2.0
405 renew 2.0 0.0 2.0
Firstly I got this error:
KeyError: "None of [['', '']] are in the [columns]"
What I have tried:
- I tried putting a header with index 0 while reading the file,
- I tried to rename columns with this code:
df.rename(columns=df.iloc[0], inplace=True)
- I also tried this:
df.columns = df.iloc[1]
df = df.reindex(df.index.drop(0))
- Also tried comments in this link
None of the above resolved the issue.
Answers:
From the sample you posted, it looks like your file uses whitespace as the delimiter. pd.read_csv uses , as the default separator, so you have to state the delimiter explicitly:
pd.read_csv('source.txt',header=0, delim_whitespace=True)
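To see why the separator matters, here is a minimal sketch that parses the sample from the question out of an in-memory string (an assumption, so that it runs without source.txt). Note that on pandas 2.2 and later delim_whitespace is deprecated; sep=r"\s+" is the equivalent spelling and is what the sketch uses.

```python
import io
import pandas as pd

# Sample rows from the question, held in a string instead of source.txt
# (an assumption so the sketch runs on its own).
sample = """ID vocab sumCI sumnextCI new_diff
450 statu 3.0 0.0 3.0
391 provid 4.0 1.0 3.0
382 prescript 3.0 0.0 3.0
300 lymphoma 2.0 0.0 2.0
405 renew 2.0 0.0 2.0
"""

# Default separator is ",": the whole header line becomes one column name,
# so 'vocab' and 'sumCI' do not exist as columns.
bad = pd.read_csv(io.StringIO(sample), header=0)
print(list(bad.columns))

# Treat any run of whitespace as the delimiter instead.
df = pd.read_csv(io.StringIO(sample), header=0, sep=r"\s+")
print(list(df.columns))

# The slice from the question now works.
cidf = df.loc[:, ["vocab", "sumCI"]]
print(cidf)
```

Printing the column list right after reading is usually the quickest way to confirm whether the file was split the way you expected.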
Simply write code to create a new CSV file and use the new file:
import pandas as pd
# Read with whitespace as the delimiter and keep the result in df.
df = pd.read_csv('source.txt', header=0, delim_whitespace=True)
# Set the column names explicitly.
headers = ['ID','vocab','sumCI','sumnextCI','new_diff']
df.columns = headers
df.to_csv('newsource.txt')
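One thing to watch with this approach: to_csv writes the row index by default, so re-reading the new file gives an extra "Unnamed: 0" column. A small sketch of the round trip, assuming a temporary directory and a two-row sample in place of the real files:

```python
import io
import os
import tempfile
import pandas as pd

# Two rows from the question's sample, held in a string (an assumption
# so the sketch runs without source.txt).
sample = """ID vocab sumCI sumnextCI new_diff
450 statu 3.0 0.0 3.0
391 provid 4.0 1.0 3.0
"""
df = pd.read_csv(io.StringIO(sample), header=0, sep=r"\s+")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "newsource.txt")
    # index=False keeps the row index out of the file.
    df.to_csv(path, index=False)
    # The new file is comma-separated, so the default separator works.
    df2 = pd.read_csv(path)

cidf = df2.loc[:, ["vocab", "sumCI"]]
print(cidf)
```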
You can try doing this:
pd.read_csv('source.txt',header=0, delim_whitespace=True)
If the fields in the file are separated by whitespace rather than commas, the default separator will not split them and you will get an error, so delim_whitespace=True is included to treat runs of whitespace as the delimiter.
Maybe you have white space around your column names; double-check your CSV file.
If you get this (or a similar) error, check whether your dataframe contains these columns. The following should return True for the indexing to work.
cols = ['vocab', 'sumCI']
set(df.columns).issuperset(cols)
If the above returns False, then you'll need to process the columns.
A common culprit is leading/trailing space, so try
df.columns = df.columns.str.strip()
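As a rough illustration of the check and the fix together, the sketch below uses a made-up two-column frame whose names carry stray spaces (the frame and the variables missing_before and missing_after are hypothetical, not from the question). It lists exactly which requested names are missing, which is often more helpful than a bare True/False.

```python
import pandas as pd

# Hypothetical frame whose column names carry stray spaces.
df = pd.DataFrame({" vocab": ["statu", "provid"], "sumCI ": [3.0, 4.0]})
cols = ["vocab", "sumCI"]

# repr() makes leading/trailing spaces visible.
print([repr(c) for c in df.columns])

# List the requested names that are not in the frame.
missing_before = [c for c in cols if c not in df.columns]
print(missing_before)

# Strip the whitespace and check again.
df.columns = df.columns.str.strip()
missing_after = [c for c in cols if c not in df.columns]
print(missing_after)

# The slice now succeeds.
cidf = df.loc[:, cols]
```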
Other common problems are a double underscore, a double space, or an em dash (—) between words in otherwise legitimate column names. In that case you may try a regex to collapse surplus spaces and underscores and replace the em dash with a hyphen in the column names:
df.columns = df.columns.to_series().replace({r'\s+': ' ', r'_+': '_', r'—': '-'}, regex=True)
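A short sketch of that normalization on invented column names (the names below are hypothetical, chosen only to exercise each pattern):

```python
import pandas as pd

# Hypothetical messy column names: a double space, a double underscore,
# and an em dash.
df = pd.DataFrame([[1, 2, 3]], columns=["sum  CI", "new__diff", "next—CI"])

# Collapse runs of whitespace and underscores, and turn the em dash
# into a plain hyphen.
df.columns = df.columns.to_series().replace(
    {r"\s+": " ", r"_+": "_", "—": "-"}, regex=True
)
print(list(df.columns))
```

Whether you want a single space, an underscore, or a hyphen as the final separator depends on how the rest of your code spells the column names, so adjust the replacement values accordingly.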