Find number of columns in csv file
Question:
My program needs to read CSV files which may have 1, 2 or 3 columns, and it needs to modify its behaviour accordingly. Is there a simple way to check the number of columns without “consuming” a row before the iterator runs? The following code is the most elegant I could manage, but I would prefer to run the check before the for loop starts:
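import csv
class CSVError(Exception):
    pass
filename = 'testfile.csv'
d = '\t'
# csv.reader needs an open file (or other iterable of lines), not a filename
with open(filename, newline='') as f:
    reader = csv.reader(f, delimiter=d)
    for row in reader:
        # Remember the width of the first row, then compare every row to it
        if reader.line_num == 1:
            fields = len(row)
        if len(row) != fields:
            raise CSVError("Number of fields should be %s: %s" % (fields, str(row)))
        if fields == 1:
            pass
        elif fields == 2:
            pass
        elif fields == 3:
            pass
        else:
            raise CSVError("Too many columns in input file.")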
Edit: I should have included more information about my data. If there is only one field, it must contain a name in scientific notation. If there are two fields, the first must contain a name, and the second a linking code. If there are three fields, the additional field contains a flag which specifies whether the name is currently valid. Therefore, whether the file has 1, 2 or 3 columns, every row must have the same number.
Answers:
You can use itertools.tee:
itertools.tee(iterable[, n=2])
Return n independent iterators from a single iterable.
E.g.:
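import csv
import itertools
# f is an open file object and d is the delimiter, as in the question
reader1, reader2 = itertools.tee(csv.reader(f, delimiter=d))
columns = len(next(reader1))  # peek at the first row via reader1 only
del reader1
for row in reader2:  # reader2 still starts at the first row
    ...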
Note that it’s important to delete the reference to reader1 when you are finished with it; otherwise tee will have to store all the rows in memory in case you ever call next(reader1) again.
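The tee approach can be tried without a real file. The following is a minimal sketch, assuming an in-memory io.StringIO stands in for the open file; the helper name peek_column_count and the sample rows are illustrative and not part of the answer above.

```python
import csv
import io
import itertools


def peek_column_count(rows):
    """Return (column_count, iterator) without losing the first row.

    `rows` is any iterator of rows, e.g. a csv.reader. The returned
    iterator still yields every row, including the first one.
    """
    peek, rest = itertools.tee(rows)
    first = next(peek, None)  # None if the input has no rows
    del peek                  # drop the extra iterator so tee need not buffer for it
    count = len(first) if first is not None else 0
    return count, rest


# Hypothetical tab-delimited sample standing in for testfile.csv
sample = io.StringIO("Homo sapiens\tH1\nPan troglodytes\tP1\n")
ncol, reader = peek_column_count(csv.reader(sample, delimiter="\t"))
rows = list(reader)
```

Here ncol is known before the loop over reader starts, and reader still begins at the first row.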
What happens if the user provides you with a CSV file with fewer columns? Are default values used instead?
If so, why not extend the row with null values instead?
reader = csv.reader(f, delimiter=d)
for row in reader:
    row += [None] * (3 - len(row))
    try:
        foo, bar, baz = row
    except ValueError:
        # Too many values to unpack: too many columns in the CSV
        raise CSVError("Too many columns in input file.")
Now bar and baz will at least be None, and the exception handler will take care of any rows longer than 3 items.
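A runnable sketch of the padding idea, assuming an in-memory sample instead of a file. The CSVError class and the pad_row helper are hypothetical names introduced here for illustration; the snippets above use CSVError without defining it.

```python
import csv
import io


class CSVError(Exception):
    """Hypothetical error type; the original snippets never define it."""


def pad_row(row, width=3):
    """Pad `row` with None up to `width` fields; reject longer rows."""
    if len(row) > width:
        raise CSVError("Too many columns in input file.")
    return row + [None] * (width - len(row))


# Hypothetical tab-delimited sample with rows of different lengths
sample = io.StringIO("Homo sapiens\tH1\nPan troglodytes\n")
padded = [pad_row(row) for row in csv.reader(sample, delimiter="\t")]
```

Padding accepts rows of differing lengths in the same file, which the question's edit says should be treated as an error, so a separate consistency check would still be needed in that case.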
This seems to work as well:
import csv
datafilename = 'testfile.csv'
d = '\t'
with open(datafilename, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=d)
    ncol = len(next(reader))  # Read first line and count columns
    f.seek(0)  # go back to beginning of file
    for row in reader:
        pass  # do stuff
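The rewind approach can be wrapped in a function. This is a sketch under stated assumptions: the name count_then_read is hypothetical, the sample file is written to a temporary location only so the example runs on its own, and a fresh reader is created after the seek so that its line counter starts again.

```python
import csv
import os
import tempfile


def count_then_read(path, delimiter="\t"):
    """Return (ncol, rows): count columns in the first row, rewind, read all rows."""
    with open(path, newline="") as f:
        first = next(csv.reader(f, delimiter=delimiter), [])
        f.seek(0)  # rewind so the second reader starts at the first row
        rows = list(csv.reader(f, delimiter=delimiter))
    return len(first), rows


# Hypothetical sample file written to a temporary location
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("a\tb\tc\nd\te\tf\n")
ncol, rows = count_then_read(tmp.name)
os.remove(tmp.name)
```

This only works for seekable input such as a regular file; data arriving on a pipe cannot be rewound this way.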
I would rebuild it as follows (if the file is not too big):
import csv
filename = 'testfile.csv'
d = '\t'
# CSVError is assumed to be a user-defined Exception subclass
with open(filename, newline='') as f:
    rows = list(csv.reader(f, delimiter=d))  # reads the whole file into memory
fields = len(rows[0])
for row in rows:
    if fields == 1:
        pass
    elif fields == 2:
        pass
    elif fields == 3:
        pass
    else:
        raise CSVError("Too many columns in input file.")
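I would suggest a simple way like this:
# Named csvfile rather than csv so the csv module is not shadowed
with open('./testfile.csv', 'r') as csvfile:
    first_line = csvfile.readline()
    your_data = csvfile.readlines()
# Assumes a comma delimiter and no quoted commas in the first line
ncol = first_line.count(',') + 1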
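Counting delimiter characters miscounts when a field contains a quoted delimiter. A sketch of a more careful count, assuming the first record fits on one line; the helper name count_columns and the sample line are illustrative.

```python
import csv


def count_columns(first_line, delimiter=","):
    """Count fields in one line of CSV text, honouring quoted delimiters."""
    # csv.reader accepts any iterable of strings, so a one-item list is enough
    return len(next(csv.reader([first_line], delimiter=delimiter)))


line = '"Smith, John",42\n'
naive = line.count(",") + 1   # also counts the comma inside the quotes
parsed = count_columns(line)  # lets the csv module split the line
```

For a tab-delimited file like the one in the question, pass delimiter="\t".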