How to read bad lines in csv files using Pandas in Python?

Question:

The csv file has the following structure:

a,b,c
a,b,c,d,e,f,g
a,b,c,d
a,b,c

If I use file = pd.read_csv('Desktop/export.csv', delimiter=','), it throws a tokenizing error like this:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 9 fields in line 3, saw 10

I do NOT want to skip bad lines. I want to read the csv with all columns and create a dataframe that looks like:

unnamed column1, unnamed column2, ....... unnamed column 7
a,b,c
a,b,c,d,e,f,g
a,b,c,d
a,b,c

How can I load the bad lines in the csv files?

Asked By: ichino


Answers:

You can load the csv file into a dataframe by telling read_csv() the maximum number of columns to expect, using the names parameter together with engine='python'.

Here is an example:

import pandas as pd

# names=range(7) forces 7 columns (labelled 0..6); shorter rows are padded with NaN
df = pd.read_csv('Desktop/export.csv', delimiter=',', names=range(7), engine='python')
print(df)

This will create a dataframe with 7 unnamed columns and will read all the lines of the csv file, including the lines that have a different number of columns. The values for the missing columns will be filled with NaN.
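With the four sample rows from the question, the printed dataframe should look roughly like this (the integer column labels come from range(7)):

   0  1  2    3    4    5    6
0  a  b  c  NaN  NaN  NaN  NaN
1  a  b  c    d    e    f    g
2  a  b  c    d  NaN  NaN  NaN
3  a  b  c  NaN  NaN  NaN  NaN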

Please note that by default pandas assumes the csv file has a header row; if you want to read the csv without a header, add the parameter header=None to the read_csv call.
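If the maximum number of columns is not known in advance, one option (a minimal sketch, not part of the original answer; it reuses the file path from the question) is to scan the file once to find the widest row and pass that count to names:

import csv
import pandas as pd

# find the widest row so the column count does not have to be hard-coded
with open('Desktop/export.csv', newline='') as f:
    max_cols = max(len(row) for row in csv.reader(f))

df = pd.read_csv('Desktop/export.csv', header=None, names=range(max_cols), engine='python')
print(df)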

Answered By: Etienne Plt

You can set error_bad_lines to False, which skips the lines that raise the tokenizing error.

import pandas as pd

# error_bad_lines=False drops any line with too many fields instead of raising
file = pd.read_csv('Desktop/export.csv', delimiter=',', error_bad_lines=False)
Answered By: iohans
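A note for newer pandas versions: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0, in favour of the on_bad_lines parameter. A rough equivalent of the snippet above would be:

import pandas as pd

# on_bad_lines='skip' replaces error_bad_lines=False in pandas 1.3+
file = pd.read_csv('Desktop/export.csv', delimiter=',', on_bad_lines='skip')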