Don't understand Python's csv.reader object

Question:

I’ve come across a behavior in Python’s built-in csv module that I’ve never noticed before. Typically, when I read in a CSV file, I follow the docs pretty much verbatim, using ‘with’ to open the file and then looping over the reader object with a ‘for’ loop. However, I recently tried iterating over the csv.reader object twice in a row, only to find that the second ‘for’ loop did nothing.

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')

    for line in readit:
        print line

    for line in readit:
        print 'foo'

Console Output:

Austins-iMac:Desktop austin$ python -i amy.py 
['Amy', 'James', 'Nathan', 'Sara', 'Kayley', 'Alexis']
['James', 'Nathan', 'Tristan', 'Miles', 'Amy', 'Dave']
['Nathan', 'Amy', 'James', 'Tristan', 'Will', 'Zoey']
['Kayley', 'Amy', 'Alexis', 'Mikey', 'Sara', 'Baxter']
>>>
>>> readit
<_csv.reader object at 0x1023fa3d0>
>>> 

So the second ‘for’ loop basically does nothing. One thought I had is that the csv.reader object is released from memory after being read once. This isn’t the case, though, since it still retains its memory address. I found a post that mentions a similar problem. The reason they gave is that once the object is read, the pointer stays at the end of the memory address, ready to write data to the object. Is this correct? Could someone go into greater detail as to what is going on here? Is there a way to push the pointer back to the beginning of the memory address to reread it? I know it’s bad coding practice to do that, but I’m mainly just curious and want to learn more about what goes on under Python’s hood.

Thanks!

Asked By: Austin A


Answers:

If it’s not too much data, you can always read it into a list:

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')
    csvdata = list(readit)

    for line in csvdata :
        print line

    for line in csvdata :
        print 'foo'
Answered By: monkut

Iterating over a csvreader simply wraps iterating over the lines in the underlying file object.
On each iteration the reader gets the next line from the file, converts and returns it.

So iterating over a csvreader follows the same conventions as iterating over files.
That is, once the file has reached its end, you have to seek back to the start before iterating a second time.

The below should do, though I haven’t tested it:

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')

    for line in readit:
        print line

    # go back to the start of the file
    csvfile.seek(0)

    for line in readit:
        print 'foo'
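
A quick way to check this behavior without a file on disk is to use an in-memory file. The following is only a minimal sketch in Python 3 syntax, assuming io.StringIO stands in for the opened CSV file; the sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for smallfriends.csv
csvfile = io.StringIO("Amy,James,Nathan\nJames,Nathan,Tristan\n")
readit = csv.reader(csvfile, delimiter=',')

first_pass = [row for row in readit]    # consumes the underlying file object
exhausted = [row for row in readit]     # nothing left to read, so this is empty

csvfile.seek(0)                         # rewind the underlying file object
second_pass = [row for row in readit]   # the same reader yields the rows again

print(first_pass)
print(exhausted)
print(second_pass)
```

The reader itself is never reset here; only the position of the underlying file object changes.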
Answered By: sebastian

I’ll try to answer your other questions about what the reader is doing and why reset() or seek(0) might help. In the most basic form, the csv reader might look something like this:

def csv_reader(it):
    for line in it:
        yield line.strip().split(',')

That is, it takes any iterator producing strings and gives you a generator. All it does is take an item from your iterator, process it and return the result. When your iterator is exhausted, csv_reader stops. The reader has no idea where the iterator came from or how to properly make a fresh one, so it doesn’t even try to reset itself. That is left to the programmer.
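
The exhaustion can be seen with the simplified csv_reader itself. This is a small sketch in Python 3 syntax; the sample list `lines` is invented for illustration, and the real csv.reader does considerably more (quoting, dialects and so on).

```python
def csv_reader(it):
    # Simplified stand-in for csv.reader, as described in this answer
    for line in it:
        yield line.strip().split(',')

lines = ['1,2,3', '4,5,6']      # hypothetical sample data

reader = csv_reader(lines)
first_pass = list(reader)       # all rows, parsed into lists of strings
second_pass = list(reader)      # empty: the generator has already finished

# A fresh generator over the same list starts from the beginning again
fresh_pass = list(csv_reader(lines))

print(first_pass)
print(second_pass)
print(fresh_pass)
```

One difference worth keeping in mind: a finished generator like this one stays finished, whereas the real csv.reader asks its underlying iterator for the next line on every call, which is why rewinding the file with seek(0) lets it produce rows again.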

We can either modify the iterator in place without the reader knowing or just make a new reader. Here are some examples to demonstrate my point.

import csv

data = open('data.csv', 'r')
reader = csv.reader(data)

print(next(reader))               # Parse the first line
[next(data) for _ in range(5)]    # Skip the next 5 lines on the underlying iterator
print(next(reader))               # This will be the 7th line in data
print(reader.line_num)            # reader thinks this is the 2nd line
data.seek(0)                      # Go back to the beginning of the file
print(next(reader))               # gives first line again
data.close()                      # done with the file

data = ['1,2,3', '4,5,6', '7,8,9']
reader = csv.reader(data)         # works fine on lists of strings too
print(next(reader))               # ['1', '2', '3']

In general, if you need a second pass, it’s best to close and reopen your file and use a new csv reader. It’s clean and ensures nice bookkeeping.
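
One way to sketch that approach is a small helper that opens the file and builds a new reader on every call. The helper name `iter_rows` and the sample file are hypothetical, not part of the csv module; this is Python 3 syntax, where the csv documentation recommends opening files with newline=''.

```python
import csv
import os
import tempfile

def iter_rows(path):
    # Hypothetical helper: every call opens the file anew and creates a new reader
    with open(path, newline='') as csvfile:
        for row in csv.reader(csvfile):
            yield row

# Write a tiny sample file into a temporary directory so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample_friends.csv')
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows([['Amy', 'James'], ['Nathan', 'Sara']])

    first_pass = list(iter_rows(path))
    second_pass = list(iter_rows(path))   # independent second pass over the file

print(first_pass)
print(second_pass)
```

Each pass gets its own file handle and its own reader, so no state is shared between them.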

Answered By: kalhartt