Find number of columns in csv file

Question:

My program needs to read csv files which may have 1, 2 or 3 columns, and it needs to modify its behaviour accordingly. Is there a simple way to check the number of columns without “consuming” a row before the iterator runs? The following code is the most elegant I could manage, but I would prefer to run the check before the for loop starts:

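import csv

class CSVError(Exception):
    """Raised when the input file has an unexpected layout."""

filename = 'testfile.csv'
d = '\t'  # tab-delimited input

with open(filename, newline='') as f:
    reader = csv.reader(f, delimiter=d)
    for row in reader:
        # The first row fixes the expected number of fields
        if reader.line_num == 1:
            fields = len(row)
        if len(row) != fields:
            raise CSVError("Number of fields should be %s: %s" % (fields, str(row)))
        if fields == 1:
            pass
        elif fields == 2:
            pass
        elif fields == 3:
            pass
        else:
            raise CSVError("Too many columns in input file.")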

Edit: I should have included more information about my data. If there is only one field, it must contain a name in scientific notation. If there are two fields, the first must contain a name, and the second a linking code. If there are three fields, the additional field contains a flag which specifies whether the name is currently valid. Therefore if any row has 1, 2 or 3 columns, all must have the same.

Asked By: rudivonstaden

Answers:

You can use itertools.tee:

itertools.tee(iterable[, n=2])
Return n independent iterators from a single iterable.

e.g.

reader1, reader2 = itertools.tee(csv.reader(f, delimiter=d))
columns = len(next(reader1))
del reader1
for row in reader2:
    ...

Note that it’s important to delete the reference to reader1 when you are finished with it; otherwise tee will have to store all the rows in memory in case you ever call next(reader1) again.
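As a self-contained illustration of the same idea, here is a minimal sketch that runs tee over an in-memory file. The sample data and the names peek, rows and all_rows are made up for the example; a real program would pass an open file object instead of the StringIO.

```python
import csv
import io
import itertools

# Hypothetical tab-delimited sample standing in for testfile.csv
sample = io.StringIO("Homo sapiens\tH001\tY\nPan troglodytes\tP002\tN\n")

peek, rows = itertools.tee(csv.reader(sample, delimiter="\t"))
columns = len(next(peek))  # consumes the first row from peek only
del peek                   # release peek so tee does not buffer rows for it

all_rows = list(rows)      # rows still starts at the first row
```

Note that next(peek) raises StopIteration on an empty file, so a real program would probably want to handle that case.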

Answered By: John La Rooy

What happens if the user provides you with a CSV file with fewer columns? Are default values used instead?

If so, why not extend the row with null values instead?

reader = csv.reader(f,delimiter=d)
for row in reader:
    row += [None] * (3 - len(row))
    try:
        foo, bar, baz = row
    except ValueError:
        # Too many values to unpack: too many columns in the CSV
        raise CSVError("Too many columns in input file.")

Now bar and baz will at least be None and the exception handler will take care of any rows longer than 3 items.
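The padding idea can be tried out on in-memory data. This is only a sketch: pad_rows is a hypothetical helper, the sample rows are invented, and ValueError stands in for the CSVError used above.

```python
import csv
import io

def pad_rows(lines, delimiter="\t", width=3):
    """Yield each row padded with None up to `width` fields (hypothetical helper)."""
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) > width:
            raise ValueError("Too many columns in input file: %r" % (row,))
        yield row + [None] * (width - len(row))

# Hypothetical two-column, tab-delimited sample
sample = io.StringIO("Homo sapiens\tH001\nPan troglodytes\tP002\n")
padded = list(pad_rows(sample))
```

Padding on its own does not check that every row has the same number of fields, so a file that mixes 2-column and 3-column rows would pass through silently.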

Answered By: Martijn Pieters

This seems to work as well:

import csv

datafilename = 'testfile.csv'
d = '\t'

with open(datafilename, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=d)
    ncol = len(next(reader))  # Read first line and count columns
    f.seek(0)                 # go back to beginning of file
    for row in reader:
        pass  # do stuff

Answered By: mgilson

I would rebuild it as follows (if the file is not too big):

import csv

class CSVError(Exception):
    pass

d = '\t'

# Read every row up front; this holds the whole file in memory
with open('testfile.csv', newline='') as f:
    rows = list(csv.reader(f, delimiter=d))

fields = len(rows[0])  # raises IndexError if the file is empty
for row in rows:
    if fields == 1:
        pass
    elif fields == 2:
        pass
    elif fields == 3:
        pass
    else:
        raise CSVError("Too many columns in input file.")

Answered By: Marco de Wit

I would suggest a simple way like this:

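# Use a name other than "csv" so the csv module is not shadowed
with open('./testfile.csv', 'r') as csv_file:
    first_line = csv_file.readline()
    your_data = csv_file.readlines()

# Assumes comma-delimited data with no commas inside quoted fields
ncol = first_line.count(',') + 1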
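
Counting delimiters gives the wrong answer when a field contains a quoted comma. One possible variation, sketched here with a hypothetical helper count_columns and an invented sample line, is to let the csv module parse just the first line:

```python
import csv
import io

def count_columns(first_line, delimiter=","):
    """Count fields in one line using csv's own parsing (hypothetical helper)."""
    return len(next(csv.reader([first_line], delimiter=delimiter)))

line = 'Smith,"Portland, OR",42\n'  # made-up line with a quoted comma
naive = line.count(",") + 1         # counts the comma inside the quotes too
parsed = count_columns(line)        # treats the quoted field as one column
```

This sketch assumes the first record fits on a single line; a quoted field containing a newline would need the reader to see the following lines as well.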
Answered By: Ashkan Mirzaee