Python parse CSV ignoring comma with double-quotes

Question:

I have a CSV file with lines like this:

"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ" 

and so on …

I don't want to split on commas inside double quotes, i.e. my expected result should be

AAA
BBB
Test, Test
CCC

My code:

import csv

# Python 3: open in text mode with newline='' (Python 2 code used 'rb')
with open('values.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

I tried using the csv module, but no luck: the parser splits on every comma, including the ones inside the quotes.

Please let me know if I'm missing something.

Asked By: Abhi


Answers:

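You have spaces before the quote characters in your input. The csv module only treats a field as quoted when the quote character is the very first character of the field, so ' "BBB"' (with its leading space) is read as an unquoted field, and the comma inside ' "Test, Test"' splits it in two. Set skipinitialspace to True to skip any whitespace following a delimiter: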

When True, whitespace immediately following the delimiter is ignored. The default is False.

>>> import csv
>>> lines = '''\
... "AAA", "BBB", "Test, Test", "CCC"
... "111", "222, 333", "XXX", "YYY, ZZZ" 
... '''
>>> reader = csv.reader(lines.splitlines())
>>> next(reader)
['AAA', ' "BBB"', ' "Test', ' Test"', ' "CCC"']
>>> reader = csv.reader(lines.splitlines(), skipinitialspace=True)
>>> next(reader)
['AAA', 'BBB', 'Test, Test', 'CCC']
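Applied to the file-reading loop from the question, a minimal Python 3 sketch might look like the following. It is an illustration, not a drop-in for any particular setup: it writes the question's sample lines to a temporary file (a generated name, not the question's values.csv) so it runs on its own, and opens the file in text mode with newline='', which is what the csv docs recommend in Python 3 instead of 'rb'.

```python
import csv
import os
import tempfile

# Sample data copied from the question (note the space after each comma).
SAMPLE = '"AAA", "BBB", "Test, Test", "CCC"\n"111", "222, 333", "XXX", "YYY, ZZZ"\n'


def read_rows(path):
    """Read a CSV file whose fields are separated by ', ' (comma + space)."""
    # Python 3: text mode with newline='' as the csv docs recommend.
    with open(path, newline='') as f:
        # skipinitialspace=True drops the blank after each delimiter, so the
        # opening double quote becomes the first character of the field again.
        reader = csv.reader(f, skipinitialspace=True)
        return list(reader)


if __name__ == '__main__':
    # Write the sample to a temporary file so the sketch is self-contained.
    with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
        tmp.write(SAMPLE)
    try:
        for row in read_rows(tmp.name):
            print(row)
    finally:
        os.remove(tmp.name)
```

The only change from the question's code that affects the parsing is skipinitialspace=True; the rest is the usual Python 3 file handling.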
Answered By: Martijn Pieters

This should do:

import csv

lines = '''"AAA", "BBB", "Test, Test", "CCC"
           "111", "222, 333", "XXX", "YYY, ZZZ"'''.splitlines()
# skipinitialspace=True skips the space after each comma (and at line start)
for row in csv.reader(lines, quotechar='"', delimiter=',',
                      quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(row)

# Output:
# ['AAA', 'BBB', 'Test, Test', 'CCC']
# ['111', '222, 333', 'XXX', 'YYY, ZZZ']
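In the call above, only skipinitialspace=True changes the result: quotechar='"' and delimiter=',' are already the csv module's defaults, and the Python docs describe QUOTE_ALL as an instruction to writer objects. A small check, under that reading of the docs, that the shorter call returns the same rows:

```python
import csv

# The two lines from the answer, including the indentation on the second one.
LINES = '''"AAA", "BBB", "Test, Test", "CCC"
           "111", "222, 333", "XXX", "YYY, ZZZ"'''.splitlines()

# Call with every option spelled out.
explicit = list(csv.reader(LINES, quotechar='"', delimiter=',',
                           quoting=csv.QUOTE_ALL, skipinitialspace=True))

# Same call keeping only the non-default option.
minimal = list(csv.reader(LINES, skipinitialspace=True))

for row in minimal:
    print(row)
```

Spelling out the defaults is harmless and can document intent; it just isn't what fixes the parsing.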
Answered By: Michael

[Post edited to be clearer.]
If you don't want to split on commas inside double quotes, so that your output keeps the commas inside the columns, here is another way of doing it. It is elegant and lets you store your CSV file in cloud buckets. The key is to use smart_open as a drop-in replacement for the standard open().

Also, I am using DictReader instead of reader.

import csv
import json
from smart_open import open

with open('./temp.csv') as csvFileObj:
    reader = csv.DictReader(csvFileObj, delimiter=',', quotechar='"')
    # csv.reader requires bytestring input in python2, unicode input in python3
    for record in reader:
        # record is a dictionary of the csv record
        print(f'Record as json shows proper reading of file:\n {json.dumps(record, indent=4)}')
        print(f'You can reference an individual field too: {record["field3"]}')
        print(f'                                           {record["field4"]}')

Note that I passed two parameters to DictReader:
delimiter=',', quotechar='"'
Comma is the default delimiter, but I included it in case someone needs to change it. The double quote is also the default quotechar, so that argument is optional as well; it just makes the quoting rule explicit.
Real output from code:

Record as json shows proper reading of file:
 {
    "field1": "AAA",
    "field2": "BBB",
    "field3": "Test, Test",
    "field4": "CCC"
}
You can reference an individual field too: Test, Test
                                           CCC
Record as json shows proper reading of file:
 {
    "field1": "111",
    "field2": "222, 333",
    "field3": "XXX",
    "field4": "YYY, ZZZ"
}
You can reference an individual field too: XXX
                                           YYY, ZZZ

Input file:

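Input data file (I added a header record for clarity. If you don't have a header record, DictReader will use the first record as the field names; pass fieldnames=[...] to supply the column names instead. This file also has no space after each comma; for data like the question's, with ", " between fields, add skipinitialspace=True as well.)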

"field1","field2","field3","field4"
"AAA","BBB","Test, Test","CCC"
"111","222, 333","XXX","YYY, ZZZ"
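For the question's original data (no header record, a space after every comma), a minimal DictReader sketch might combine both options. It uses io.StringIO in place of smart_open's open() so it runs with the standard library only, and the column names are hypothetical, borrowed from the header above.

```python
import csv
import io

# The question's data: no header record and a space after every comma.
DATA = '"AAA", "BBB", "Test, Test", "CCC"\n"111", "222, 333", "XXX", "YYY, ZZZ"\n'

# Hypothetical column names (the question's file has no header).
FIELDS = ['field1', 'field2', 'field3', 'field4']


def read_records(text):
    # io.StringIO stands in for the file object returned by (smart_)open.
    f = io.StringIO(text, newline='')
    # fieldnames=... stops DictReader from consuming the first record as the
    # header; skipinitialspace=True is forwarded to the underlying csv.reader
    # and handles the ", " separators.
    reader = csv.DictReader(f, fieldnames=FIELDS, skipinitialspace=True)
    return list(reader)


for record in read_records(DATA):
    print(record['field3'], '|', record['field4'])
```

Any extra keyword arguments to DictReader, such as skipinitialspace, are passed through to csv.reader, so the same formatting options work for both.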
Answered By: G. Casey