Python CSV: nested double quotes

Question:

I have a test.csv file as follows:

"N";"INFO"
"1";"<a href="www.google.it">www.google.it</a>"

I use the following program to print the contents of the CSV file:

import csv
with open('test.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=';')
    for p in reader:
        print("%s %s" % (p['N'], p['INFO']))

The output is

1 <a href=www.google.it">www.google.it</a>"

The cause is probably that the CSV file contains "nested" double quotes. However, the separator is ";", so I would like the library to simply remove the double quote " at the beginning and end of the INFO field and keep the rest of the string intact.

In other words, I would like the output of the program to be

1 <a href="www.google.it">www.google.it</a>

How can I fix that, without modifying the test.csv file?

Asked By: francesco


Answers:

One possibility is to use the csv module with csv.QUOTE_NONE, then handle the removal of the quotes (on both the fieldnames and the values) manually:

import csv

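def strip_outer_quotes(s):
    """ Strip an outer pair of quotes (only) from a string. If not quoted,
    string is returned unchanged. """
    # The length check avoids an IndexError on empty fields and leaves a
    # lone '"' untouched.
    if len(s) >= 2 and s[0] == s[-1] == '"':
        return s[1:-1]
    else:
        return s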

def my_csv_reader(fh):
    """ Thin wrapper around csv.DictReader to handle fields which are
    quoted but contain unquoted " characters. """
    reader = csv.DictReader(fh, delimiter=';', quoting=csv.QUOTE_NONE)
    reader.fieldnames = [strip_outer_quotes(fn) for fn in reader.fieldnames]
    for row in reader:
        yield {k: strip_outer_quotes(v) for k, v in row.items()}

with open('test.csv', newline='') as csvfile:
    reader = my_csv_reader(csvfile)
    for p in reader:
        print("%s %s" % (p['N'], p['INFO']))

Note: rather than my_csv_reader, it is probably better to name the function after the source of this particular CSV variant, e.g. acme_csv_reader.
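
To check the approach without touching test.csv, here is a minimal sketch that runs the same idea against an in-memory copy of the data. The names SAMPLE, strip_outer_quotes_safe and read_quoted_rows are hypothetical, and the two extra rows (an empty field and a short row) are assumptions added only to exercise edge cases:

```python
import csv
import io

# In-memory stand-in for test.csv. Rows "2" and "3" are not in the original
# file; they are assumed here to exercise an empty field and a short row.
SAMPLE = (
    '"N";"INFO"\n'
    '"1";"<a href="www.google.it">www.google.it</a>"\n'
    '"2";\n'
    '"3"\n'
)


def strip_outer_quotes_safe(s):
    """Strip one outer pair of double quotes; return anything else unchanged."""
    # DictReader fills missing fields with None, so guard against that and
    # against strings too short to hold a pair of quotes.
    if s is not None and len(s) >= 2 and s[0] == s[-1] == '"':
        return s[1:-1]
    return s


def read_quoted_rows(fh):
    """Yield dict rows, treating '"' as an ordinary character while parsing."""
    reader = csv.DictReader(fh, delimiter=';', quoting=csv.QUOTE_NONE)
    # Reading fieldnames consumes the header line; assigning replaces the keys.
    reader.fieldnames = [strip_outer_quotes_safe(fn) for fn in reader.fieldnames]
    for row in reader:
        yield {k: strip_outer_quotes_safe(v) for k, v in row.items()}


rows = list(read_quoted_rows(io.StringIO(SAMPLE, newline='')))
print(rows[0]['N'], rows[0]['INFO'])
# prints: 1 <a href="www.google.it">www.google.it</a>
```

One limitation to keep in mind: with csv.QUOTE_NONE the quotes no longer protect the delimiter, so a ";" inside a field would split that field in two.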

Answered By: Jiří Baum