"Line contains NULL byte" in CSV reader (Python)

Question

I’m trying to write a program that looks at a .CSV file (input.csv) and rewrites only the rows that begin with a certain element (corrected.csv), as listed in a text file (output.txt).

This is what my program looks like right now:

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)

Unfortunately, I keep getting this error, and I have no clue what it’s about.

Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
    for row in reader:
_csv.Error: line contains NULL byte

Credit to all the people here for even getting me to this point.

Asked By: James Roseman

Answer 1

I’m guessing you have a NUL byte in input.csv. You can test that with

if '\0' in open('input.csv').read():
    print("you have null bytes in your input file")
else:
    print("you don't")

If you do,

reader = csv.reader(x.replace('\0', '') for x in mycsv)

may get you around that. Or it may indicate you have utf16 or something ‘interesting’ in the .csv file.
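
To tell a stray NUL byte apart from a UTF-16 file, it may help to look at the raw bytes before the csv module sees them. The following is a minimal sketch for Python 3; the function name diagnose_nul_bytes and the 4096-byte sample size are assumptions made for illustration, and the byte order mark check is only a heuristic.

```python
import codecs


def diagnose_nul_bytes(path, sample_size=4096):
    """Return a short, best-effort description of NUL bytes at the start of a file."""
    # Binary mode: no decoding happens, so the bytes are seen exactly as stored.
    with open(path, 'rb') as f:
        sample = f.read(sample_size)
    if b'\x00' not in sample:
        return 'no NUL bytes in sample'
    # UTF-16 files usually start with a byte order mark (BOM).
    if sample.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'UTF-16 byte order mark found'
    return 'NUL bytes found, no UTF-16 byte order mark'
```

Calling diagnose_nul_bytes('input.csv') on the file from the question would show which of the two cases applies.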

Answered By: retracile

Answer 2

This will tell you what line is the problem.

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        try:
            for row in reader:
                if row[0] not in lines:
                    writer.writerow(row)
        except csv.Error:
            # reader.line_num is the number of source lines read so far
            print('csv choked on line %s' % reader.line_num)
            raise

Perhaps this from daniweb would be helpful:

I’m getting this error when reading from a csv file: “Runtime Error!
line contains NULL byte”. Any idea about the root cause of this error?

…

Ok, I got it and thought I’d post the solution. Simple, yet it caused me
grief… The file I used was saved in .xls format instead of .csv. I didn’t
catch this because the file name itself had the .csv extension while
the type was still .xls.

Answered By: Steven Rumbalski

Answer 3

I’ve solved a similar problem with an easier solution:

import codecs
import csv

# plain 'r' here: the 'U' mode flag was removed in Python 3.11
csvReader = csv.reader(codecs.open('file.csv', 'r', 'utf-16'))

The key was using the codecs module to open the file with the UTF-16 encoding. There are many more encodings; check the documentation.
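
On Python 3 the built-in open() accepts an encoding argument, so the same idea can probably be written without codecs. This is only a sketch: read_utf16_csv is a hypothetical helper name, and it assumes the file really is UTF-16.

```python
import csv


def read_utf16_csv(path):
    """Read every row of a UTF-16 encoded CSV file into a list of lists."""
    # newline='' lets the csv module handle line endings itself.
    with open(path, newline='', encoding='utf-16') as f:
        return list(csv.reader(f))
```

If the file is not actually UTF-16, decoding will likely fail or produce garbage, which is itself a useful hint about the real encoding.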

Answered By: K. David C.

Answer 4

You could just inline a generator to filter out the null values if you want to pretend they don’t exist. Of course this is assuming the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.

See the (line.replace('\0','') for line in mycsv) generator below. On Python 2 you’ll probably also want to open the file in mode rb; on Python 3 keep it in text mode, because csv.reader expects strings.

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    # 'rb' suits Python 2; on Python 3 use open('input.csv', 'r', newline='') so the lines are str
    with open('input.csv', 'rb') as mycsv:
        reader = csv.reader(line.replace('\0', '') for line in mycsv)
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)

Answered By: woot

Answer 5

I’ve recently fixed this issue and in my instance it was a file that was compressed that I was trying to read. Check the file format first. Then check that the contents are what the extension refers to.
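
A quick way to check whether a ".csv" file is really something else is to compare its first bytes against a few well-known file signatures. The sketch below covers only gzip, zip and the OLE2 container used by legacy .xls files; the names SIGNATURES and sniff_container are assumptions made for illustration.

```python
# Leading "magic" bytes of a few common non-CSV formats.
SIGNATURES = {
    b'\x1f\x8b': 'gzip',
    b'PK\x03\x04': 'zip archive (also used by .xlsx)',
    b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1': 'OLE2 compound file (legacy .xls)',
}


def sniff_container(path):
    """Return the name of a recognised binary format, or None if nothing matches."""
    with open(path, 'rb') as f:
        head = f.read(8)  # the longest signature above is 8 bytes
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None
```

A result of None does not prove the file is valid CSV; it only means none of these three signatures matched.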

Answered By: Daniel Lee

Answer 6

Turning my Linux environment into a clean, complete UTF-8 environment did the trick for me.
Try the following in your command line:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8

Answered By: Philippe Oger

Answer 7

A tricky way:

If you develop under Linux, you can use all the power of sed:

from subprocess import check_call, CalledProcessError

PATH_TO_FILE = '/home/user/some/path/to/file.csv'

try:
    check_call("sed -i -e 's|\\x0||g' {}".format(PATH_TO_FILE), shell=True)
except CalledProcessError as err:
    print(err)

The most efficient solution for huge files.

Checked for Python3, Kubuntu
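
Where sed is not available (the traceback in the question suggests Windows), a similar cleanup can be sketched in pure Python by copying the file in chunks. The function name strip_nul_bytes and the 1 MiB chunk size are assumptions made for illustration. Unlike sed -i, this writes to a separate file rather than editing in place, and like the sed command it would damage a file that is really UTF-16.

```python
def strip_nul_bytes(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path without NUL bytes; return how many were dropped."""
    removed = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            removed += chunk.count(b'\x00')
            dst.write(chunk.replace(b'\x00', b''))
    return removed
```

Reading fixed-size chunks keeps memory use flat regardless of file size, and the returned count shows whether anything was actually removed.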

Answered By: SergO

Answer 8

If you want to replace the nulls with something you can do this:

def fix_nulls(s):
    for line in s:
        yield line.replace('\0', ' ')

r = csv.reader(fix_nulls(open(...)))

Answered By: Claudiu

Answer 9

pandas.read_csv now handles the different UTF encodings when reading, so it can deal directly with a file whose null bytes come from a UTF-16 encoding:

import pandas as pd

data = pd.read_csv(file, encoding='utf-16')

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Answered By: Sébastien Wieckowski

Answer 10

This is long settled, but I ran across this answer because I was experiencing an unexpected error while reading a CSV to process as training data in Keras and TensorFlow.

In my case, the issue was much simpler, and is worth being conscious of. The data being produced into the CSV wasn’t consistent, resulting in some columns being completely missing, which seems to end up throwing this error as well.

The lesson: If you’re seeing this error, verify that your data looks the way that you think it does!
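
One quick way to verify that every row has the number of columns you expect is to count row lengths. This is only a sketch: column_count_histogram is a hypothetical helper, and UTF-8 is an assumed default encoding.

```python
import csv
from collections import Counter


def column_count_histogram(path, encoding='utf-8'):
    """Map each distinct row length in a CSV file to the number of rows with that length."""
    with open(path, newline='', encoding=encoding) as f:
        return Counter(len(row) for row in csv.reader(f))
```

A clean file should give a single key; more than one key points to rows with missing or extra columns.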

Answered By: David Hoelzer

Answer 11

It is very simple.

Don’t create the CSV file with "create new excel" or by saving as ".csv" from Windows.

Simply import the csv module, write a dummy CSV file with it, and then paste your data into that.

A CSV made by the Python csv module itself will no longer show you the encoding or blank-line errors.

Answered By: nitish gupta

Answer 12

To skip the rows that contain NULL bytes:

import csv

with open('sample.csv', newline='') as csv_file:
    reader = csv.reader(csv_file)
    while True:
        try:
            row = next(reader)
            print(row)
        except csv.Error:
            continue
        except StopIteration:
            break

Answered By: shrhawk

Answer 13

    import csv

    def fix_nulls(s):
        for line in s:
            yield line.replace('\0', '')

    # csv_file is the path to your CSV file
    with open(csv_file, 'r', encoding="utf-8") as f:
        reader = csv.reader(fix_nulls(f))
        for line in reader:
            pass  # do something

This works for me.

Answered By: masterlk

Answer 14

The information above is great. I had this same error, and my fix was easy: it was just user error on my part. Simply save the file as a CSV and not as an Excel file.

Answered By: JQTs