\ufeff appears when reading a CSV with the unicodecsv module

Question:

I have the following code:

import unicodecsv
CSV_PARAMS = dict(delimiter=",", quotechar='"', lineterminator='\n')
unireader = unicodecsv.reader(open('sample.csv', 'rb'), **CSV_PARAMS)
for line in unireader:
    print(line)

and it prints

['\ufeff"003', 'word one"']
['003,word two']
['003,word three']

The CSV looks like this

"003,word one"
"003,word two"
"003,word three"

I am unable to figure out why the first row has \ufeff (which I believe is a file marker). Moreover, there is a stray " at the beginning of the first row.

The CSV file comes from a client, so I can’t dictate how they save it. I am looking to fix my code so that it handles the encoding.

Note: I have already tried passing encoding='utf8' in CSV_PARAMS, and it didn’t solve the problem.

Asked By: Em Ae


Answers:

encoding='utf-8-sig' will remove the UTF-8-encoded BOM (byte order mark) used as a UTF-8 signature in some files:

import unicodecsv

with open('sample.csv','rb') as f:
    r = unicodecsv.reader(f, encoding='utf-8-sig')
    for line in r:
        print(line)

Output:

['003,word one']
['003,word two']
['003,word three']
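
As for why the first row also kept its quote characters: with plain UTF-8 decoding the BOM is decoded as the character U+FEFF, so the first field no longer starts with the quote character, and the parser treats the quotes as ordinary text and splits on the comma. The sketch below reproduces this with the built-in csv module on in-memory data. The `raw` bytes and the `read_rows` helper are made up for illustration and stand in for the real file.

```python
import csv
import io

# Hypothetical stand-in for sample.csv: a UTF-8 BOM followed by two quoted rows.
raw = b'\xef\xbb\xbf"003,word one"\r\n"003,word two"\r\n'


def read_rows(data, encoding):
    """Decode the bytes with the given encoding and parse them as CSV."""
    text = io.StringIO(data.decode(encoding), newline='')
    return list(csv.reader(text))


# Plain UTF-8 keeps the BOM as U+FEFF, so the first field does not start
# with a quote and the quoting on that row is not recognized.
rows_plain = read_rows(raw, 'utf-8')

# utf-8-sig drops the BOM, so every row is parsed as one quoted field.
rows_sig = read_rows(raw, 'utf-8-sig')

print(rows_plain)
print(rows_sig)
```

Only the first row is affected, because the BOM occurs once at the very start of the file.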

But why are you using the third-party unicodecsv with Python 3? The built-in csv module handles Unicode correctly:

import csv

# Note, newline='' is a documented requirement for the csv module
# for reading and writing CSV files.
with open('sample.csv', encoding='utf-8-sig', newline='') as f:
    r = csv.reader(f)
    for line in r:
        print(line)
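
Since the files come from a client and may or may not carry a BOM, it may help to know that utf-8-sig also decodes files without one, so it can be used unconditionally. A minimal sketch, assuming the input is always UTF-8; `read_csv_rows` and the temporary files are hypothetical and exist only for the demonstration:

```python
import csv
import os
import tempfile


def read_csv_rows(path):
    """Hypothetical helper: read all rows, skipping a UTF-8 BOM if present."""
    with open(path, encoding='utf-8-sig', newline='') as f:
        return list(csv.reader(f))


def write_temp(data):
    """Write bytes to a temporary .csv file and return its path."""
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    return path


body = b'"003,word one"\r\n"003,word two"\r\n'
with_bom = write_temp(b'\xef\xbb\xbf' + body)   # file saved with a BOM
without_bom = write_temp(body)                  # same content, no BOM

rows_with_bom = read_csv_rows(with_bom)
rows_without_bom = read_csv_rows(without_bom)

os.remove(with_bom)
os.remove(without_bom)

# Both files yield the same rows.
print(rows_with_bom == rows_without_bom)
```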
Answered By: Mark Tolonen