Trying to understand python csv .next()

Question:

I have the following code, which is part of a tutorial:

import csv as csv
import numpy as np

csv_file_object = csv.reader(open("train.csv", 'rb'))
header = csv_file_object.next()

data = []
for row in csv_file_object:
    data.append(row)
data = np.array(data)

The code works as it is supposed to, but it is not clear to me why calling .next() on the reader and assigning the result to the variable header works. Isn’t csv_file_object still the entire file? How does the program know to skip the header row when for row in csv_file_object is run, since the variable header is never referenced again once defined?

Asked By: davidheller


Answers:

The header row is “skipped” as a result of calling next(). That’s how iterators work.

When you loop over an iterator, its next() method is called each time. Each call advances the iterator. When the for loop starts, the iterator is already at the second row, and it goes from there on.
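The effect can be reproduced without a file on disk. The following is a minimal sketch, assuming Python 3 (where the built-in next() replaces the .next() method) and using a made-up in-memory string in place of train.csv:

```python
import csv
import io

# In-memory stand-in for train.csv (hypothetical contents).
sample = io.StringIO("id,name\n1,Ann\n2,Bob\n")
reader = csv.reader(sample)

header = next(reader)            # consumes the first row
rows = [row for row in reader]   # the loop resumes at the second row

print(header)  # ['id', 'name']
print(rows)    # [['1', 'Ann'], ['2', 'Bob']]
```

The loop never sees the header because the reader has already moved past it, not because the variable header is consulted.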

Here’s the documentation on the next() method (here’s another piece).

What’s important is that csv.reader objects are iterators, just like the file object returned by open(). You can iterate over them, but they don’t contain all of the lines (or any of the lines) at any given moment.

Answered By: Lev Levitsky

csv.reader is an iterator. It reads a line from the CSV file every time .next() is called. Here’s the documentation: http://docs.python.org/2/library/csv.html. An iterator can return values from a source that is too big to read all at once. Using a for loop with an iterator effectively calls .next() once per pass through the loop.
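To illustrate the point about sources that are too big to read at once, here is a small sketch, assuming Python 3. The generator numbered_lines is hypothetical and stands in for an endless input; it is not part of the csv module.

```python
import itertools

def numbered_lines():
    """Hypothetical endless source: yields one line at a time, forever."""
    for n in itertools.count(1):
        yield "line %d" % n

source = numbered_lines()

first = next(source)                             # only one value is produced
next_three = list(itertools.islice(source, 3))   # continues from the second line

print(first)       # line 1
print(next_three)  # ['line 2', 'line 3', 'line 4']
```

Nothing is computed until a value is requested, which is why the whole source never has to fit in memory.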

Answered By: Peter Wooster

The csv.reader object is an iterator. An iterator is an object with a next() method that returns the next value available, or raises StopIteration if no value is available. The csv.reader returns values line by line.

Iterator objects are how Python implements the for loop. At the beginning of the loop, the __iter__ method of the object being looped over is called. It must return an iterator. Then the next method of that iterator is called repeatedly, and each value is stored in the loop variable, until the next method raises a StopIteration exception.
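Roughly, a for loop can be written out by hand as follows. This is a sketch of the protocol described above, assuming Python 3, and not the interpreter's actual implementation. The function name manual_for is made up for illustration.

```python
def manual_for(iterable):
    """Collect the values a for loop would see, using the iterator protocol directly."""
    collected = []
    iterator = iter(iterable)           # calls iterable.__iter__()
    while True:
        try:
            value = next(iterator)      # calls iterator.__next__() (.next() in Python 2)
        except StopIteration:
            break                       # the loop ends when the iterator is exhausted
        collected.append(value)         # stands in for the loop body
    return collected

result = manual_for([0, 1, 2])
print(result)  # [0, 1, 2]
```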

In your example, by adding a call to next before using the variable in the for loop construction, you are removing the first value from the stream of values returned by the iterator.

You can see the same effect with simpler iterators:

iterator = [0, 1, 2, 3, 4, 5].__iter__()
value = iterator.next()
for v in iterator:
    print v,
1 2 3 4 5
print value
0
Answered By: Sylvain Defresne

The csv.reader is an iterator. Calling .next() will obtain the next value as it iterates through the file.

In the code below, the for loop calls .next() on the iterator on each pass and assigns the result to the variable row.

for row in csv_file_object:
    data.append(row)
Answered By: Matt Alcock

There is more to the behavior of next() than that. Everything explained above is correct, but one thing is missing: by calling next() you also determine the line from which iteration will start. This can be a problem. Say I want a value that is on line 3 without looping through all the lines: I can call next() until I reach it and get the value. But if I then need to iterate starting from the first line, I can’t, because the iterator still continues from line 3. There may be a way to start again from line 1, but I haven’t found it yet.
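One possible way around this, sketched here under the assumption of Python 3 and a seekable file, is to rewind the underlying file and build a fresh reader. The in-memory four-line sample is hypothetical.

```python
import csv
import io
import itertools

sample = io.StringIO("a\nb\nc\nd\n")  # hypothetical four-line file

# Reach line 3 without writing a loop. The first two rows are still
# read and discarded; the reader cannot jump over them.
reader = csv.reader(sample)
third = next(itertools.islice(reader, 2, None))

# The reader cannot go backwards, but the underlying file can be rewound.
sample.seek(0)
reader = csv.reader(sample)
first = next(reader)

print(third)  # ['c']
print(first)  # ['a']
```

Creating a new reader after seek(0) also resets reader.line_num, which a reused reader would not do.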

Answered By: Alain Abrahan

Although the original question has been answered correctly in the accepted answer by @Lev, among others, there is an error in the way next() is used in the OP’s code that hasn’t been pointed out in any of the answers.

header = csv_file_object.next()

Calling next() as a method on the reader only works in Python 2. There it does return the row, so the assignment above works. In Python 3 the method was renamed to __next__(), and csv_file_object.next() raises AttributeError.
If you only have to skip the header, call the built-in next() function and discard the result:

    next(csv_file_object)

To save the header data in a variable, call next() as a function with the reader as the argument and assign the result:

    header = next(csv_file_object)
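As a small sketch, assuming Python 3: the built-in next() also accepts an optional default as its second argument, which is returned instead of raising StopIteration when the iterator is already exhausted. That is useful if the file may be empty. The in-memory inputs here are made up.

```python
import csv
import io

# Reader over a hypothetical file with one header row and one data row.
reader = csv.reader(io.StringIO("id,name\n1,Ann\n"))
header = next(reader, None)          # ['id', 'name']

# Reader over an empty file: the default is returned, no exception is raised.
empty_reader = csv.reader(io.StringIO(""))
missing_header = next(empty_reader, None)

# In Python 3 the reader has __next__() but no .next() method.
has_next_method = hasattr(reader, "next")

print(header)
print(missing_header)
print(has_next_method)
```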

@Lev has linked to documentation but missed pointing out this error in @davidheller’s code.

Answered By: azzam