Key error when selecting columns in pandas dataframe after read_csv

Question:

I’m trying to read a CSV file into a pandas dataframe and select a column, but I keep getting a KeyError.

The file reads in successfully and I can view the dataframe in an IPython notebook, but when I try to select any column other than the first one, it throws a KeyError.

I am using this code:

import pandas as pd

transactions = pd.read_csv('transactions.csv',low_memory=False, delimiter=',', header=0, encoding='ascii')
transactions['quarter']

This is the file I’m working on:
https://www.dropbox.com/s/81iwm4f2hsohsq3/transactions.csv?dl=0

Thank you!

Asked By: Harry M

Answers:

Use sep=r'\s*,\s*' so that the whitespace around the commas does not end up in the column names:

transactions = pd.read_csv('transactions.csv', sep=r'\s*,\s*',
                           header=0, encoding='ascii', engine='python')

Alternatively, make sure there are no unquoted spaces around the delimiters in your CSV file; your original command then works unchanged.

Proof:

print(transactions.columns.tolist())

Output:

['product_id', 'customer_id', 'store_id', 'promotion_id', 'month_of_year', 'quarter', 'the_year', 'store_sales', 'store_cost', 'unit_sales', 'fact_count']
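
The regex separator can be tried without the original file. The sketch below assumes a small made-up sample (not the linked transactions.csv) with spaces on both sides of each comma, read from an in-memory buffer:

```python
import io
import pandas as pd

# Hypothetical sample data with spaces around the commas
sample = "product_id , customer_id , quarter\n1 , 10 , Q1\n2 , 20 , Q2\n"

# A regex separator requires the Python parsing engine
df = pd.read_csv(io.StringIO(sample), sep=r"\s*,\s*", engine="python")

print(df.columns.tolist())
print(df["quarter"].tolist())
```

Because the whitespace is consumed as part of the separator, both the column names and the values come out without surrounding spaces.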

A KeyError is generally raised when the key does not exactly match any of the dataframe’s column names.
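
To illustrate the exact-match requirement, here is a minimal sketch on a made-up in-memory sample (an assumption for illustration, not the linked file). With the default settings, a space after each comma stays in the column name, so looking up the clean name fails while the name with the leading space succeeds:

```python
import io
import pandas as pd

# Hypothetical sample: note the space after each comma in the header
sample = "product_id, customer_id, quarter\n1, 10, Q1\n2, 20, Q2\n"

df = pd.read_csv(io.StringIO(sample))
print(df.columns.tolist())

lookup_failed = False
try:
    df["quarter"]  # no column has exactly this name
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)

# The name including the leading space does match
quarter = df[" quarter"]
```

This also explains why the first column can be selected: its name has no comma, and therefore no stray space, in front of it.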

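You could also strip the whitespace from the column names after reading the file:

import pandas as pd

filename = 'transactions.csv'  # path to your CSV file
with open(filename, "r") as file:
    df = pd.read_csv(file, delimiter=",")
    # Remove leading and trailing whitespace from every column name
    df.columns = df.columns.str.strip()
    print(df.columns)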
Answered By: beta

If you need to select multiple columns from a dataframe, use two pairs of square brackets, e.g.:

df[["product_id","customer_id","store_id"]]
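
A small sketch of the difference, using a made-up dataframe (the values are assumptions for illustration): the inner brackets build a list of names, which returns a dataframe, whereas names separated by a comma inside a single pair of brackets are treated as one tuple key and raise a KeyError on ordinary columns.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "product_id": [1, 2],
    "customer_id": [10, 20],
    "store_id": [5, 6],
})

subset = df[["product_id", "customer_id"]]  # list of names -> DataFrame
single = df["store_id"]                     # one name -> Series

tuple_key_failed = False
try:
    df["product_id", "customer_id"]  # interpreted as the tuple key ('product_id', 'customer_id')
except KeyError:
    tuple_key_failed = True
```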
Answered By: Aswin Babu

I ran into the same problem: a KeyError when selecting columns after reading from a CSV file.

Reason

The main cause is the extra leading whitespace in the CSV file (present in your uploaded file, e.g. , customer_id, store_id, promotion_id, month_of_year, ).

Proof

To confirm this, try print(list(df.columns)); the column names should come out as ['product_id', ' customer_id', ' store_id', ' promotion_id', ' month_of_year', ...].

Solution

The direct fix is to pass skipinitialspace=True to pd.read_csv(), for example:

pd.read_csv('transactions.csv',
            sep=',',
            skipinitialspace=True)

Reference: https://stackoverflow.com/a/32704818/16268870
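
As a quick check of the effect, the sketch below reads a made-up in-memory sample (an assumption, not the linked file) whose fields each have a space after the comma:

```python
import io
import pandas as pd

# Hypothetical sample with a space after every delimiter
sample = "product_id, customer_id, quarter\n1, 10, Q1\n2, 20, Q2\n"

# skipinitialspace=True drops the spaces that follow each delimiter
df = pd.read_csv(io.StringIO(sample), sep=",", skipinitialspace=True)

print(df.columns.tolist())
print(df["quarter"].tolist())
```

The option applies to the data rows as well as the header, so string values lose their leading space too. It does not remove spaces that come before a comma.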

Answered By: Hang Yan

Datasets split on ',' can end up with column names that begin with a space. Removing the space with a regex might help.

For the time being, I worked around it by including the leading space in the name:

label_name = ' Label'
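
The regex approach mentioned above could look roughly like this. It is a sketch on a made-up sample; the column names Id and Label are assumptions for illustration:

```python
import io
import pandas as pd

# Hypothetical sample whose second header field starts with a space
sample = "Id, Label\n1, a\n2, b\n"
df = pd.read_csv(io.StringIO(sample))

# Strip leading and trailing whitespace from each column name with a regex
df.columns = df.columns.str.replace(r"^\s+|\s+$", "", regex=True)

label_name = "Label"  # the clean name now works
labels = df[label_name]
```

This only renames the columns; the string values in the rows still carry their leading space.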

Answered By: Mudita Kohli

Give the full path of the CSV file in pd.read_csv(). This worked for me.

Answered By: vimal