How do I read two lines from a file at a time using python
Question:
I am coding a python script that parses a text file. The format of this text file is such that each element in the file uses two lines and for convenience I would like to read both lines before parsing. Can this be done in Python?
I would like to some something like:
f = open(filename, "r")
for line in f:
line1 = line
line2 = f.readline()
f.close
But this breaks saying that:
ValueError: Mixing iteration and read methods would lose data
Related:
Answers:
Similar question here. You can’t mix iteration and readline so you need to use one or the other.
while True:
line1 = f.readline()
line2 = f.readline()
if not line2: break # EOF
...
use next()
, eg
with open("file") as f:
for line in f:
print(line)
nextline = next(f)
print("next line", nextline)
....
Works for even and odd-length files. It just ignores the unmatched last line.
f=file("file")
lines = f.readlines()
for even, odd in zip(lines[0::2], lines[1::2]):
print "even : ", even
print "odd : ", odd
print "end cycle"
f.close()
If you have large files, this is not the correct approach. You are loading all the file in memory with readlines(). I once wrote a class that read the file saving the fseek position of each start of line. This allows you to get specific lines without having all the file in memory, and you can also go forward and backward.
I paste it here. License is Public domain, meaning, do what you want with it. Please note that this class has been written 6 years ago and I haven’t touched or checked it since. I think it’s not even file compliant. Caveat emptor. Also, note that this is overkill for your problem. I’m not claiming you should definitely go this way, but I had this code and I enjoy sharing it if you need more complex access.
import string
import re
class FileReader:
"""
Similar to file class, but allows to access smoothly the lines
as when using readlines(), with no memory payload, going back and forth,
finding regexps and so on.
"""
def __init__(self,filename): # fold>>
self.__file=file(filename,"r")
self.__currentPos=-1
# get file length
self.__file.seek(0,0)
counter=0
line=self.__file.readline()
while line != '':
counter = counter + 1
line=self.__file.readline()
self.__length = counter
# collect an index of filedescriptor positions against
# the line number, to enhance search
self.__file.seek(0,0)
self.__lineToFseek = []
while True:
cur=self.__file.tell()
line=self.__file.readline()
# if it's not null the cur is valid for
# identifying a line, so store
self.__lineToFseek.append(cur)
if line == '':
break
# <<fold
def __len__(self): # fold>>
"""
member function for the operator len()
returns the file length
FIXME: better get it once when opening file
"""
return self.__length
# <<fold
def __getitem__(self,key): # fold>>
"""
gives the "key" line. The syntax is
import FileReader
f=FileReader.FileReader("a_file")
line=f[2]
to get the second line from the file. The internal
pointer is set to the key line
"""
mylen = self.__len__()
if key < 0:
self.__currentPos = -1
return ''
elif key > mylen:
self.__currentPos = mylen
return ''
self.__file.seek(self.__lineToFseek[key],0)
counter=0
line = self.__file.readline()
self.__currentPos = key
return line
# <<fold
def next(self): # fold>>
if self.isAtEOF():
raise StopIteration
return self.readline()
# <<fold
def __iter__(self): # fold>>
return self
# <<fold
def readline(self): # fold>>
"""
read a line forward from the current cursor position.
returns the line or an empty string when at EOF
"""
return self.__getitem__(self.__currentPos+1)
# <<fold
def readbackline(self): # fold>>
"""
read a line backward from the current cursor position.
returns the line or an empty string when at Beginning of
file.
"""
return self.__getitem__(self.__currentPos-1)
# <<fold
def currentLine(self): # fold>>
"""
gives the line at the current cursor position
"""
return self.__getitem__(self.__currentPos)
# <<fold
def currentPos(self): # fold>>
"""
return the current position (line) in the file
or -1 if the cursor is at the beginning of the file
or len(self) if it's at the end of file
"""
return self.__currentPos
# <<fold
def toBOF(self): # fold>>
"""
go to beginning of file
"""
self.__getitem__(-1)
# <<fold
def toEOF(self): # fold>>
"""
go to end of file
"""
self.__getitem__(self.__len__())
# <<fold
def toPos(self,key): # fold>>
"""
go to the specified line
"""
self.__getitem__(key)
# <<fold
def isAtEOF(self): # fold>>
return self.__currentPos == self.__len__()
# <<fold
def isAtBOF(self): # fold>>
return self.__currentPos == -1
# <<fold
def isAtPos(self,key): # fold>>
return self.__currentPos == key
# <<fold
def findString(self, thestring, count=1, backward=0): # fold>>
"""
find the count occurrence of the string str in the file
and return the line catched. The internal cursor is placed
at the same line.
backward is the searching flow.
For example, to search for the first occurrence of "hello
starting from the beginning of the file do:
import FileReader
f=FileReader.FileReader("a_file")
f.toBOF()
f.findString("hello",1,0)
To search the second occurrence string from the end of the
file in backward movement do:
f.toEOF()
f.findString("hello",2,1)
to search the first occurrence from a given (or current) position
say line 150, going forward in the file
f.toPos(150)
f.findString("hello",1,0)
return the string where the occurrence is found, or an empty string
if nothing is found. The internal counter is placed at the corresponding
line number, if the string was found. In other case, it's set at BOF
if the search was backward, and at EOF if the search was forward.
NB: the current line is never evaluated. This is a feature, since
we can so traverse occurrences with a
line=f.findString("hello")
while line == '':
line.findString("hello")
instead of playing with a readline every time to skip the current
line.
"""
internalcounter=1
if count < 1:
count = 1
while 1:
if backward == 0:
line=self.readline()
else:
line=self.readbackline()
if line == '':
return ''
if string.find(line,thestring) != -1 :
if count == internalcounter:
return line
else:
internalcounter = internalcounter + 1
# <<fold
def findRegexp(self, theregexp, count=1, backward=0): # fold>>
"""
find the count occurrence of the regexp in the file
and return the line catched. The internal cursor is placed
at the same line.
backward is the searching flow.
You need to pass a regexp string as theregexp.
returns a tuple. The fist element is the matched line. The subsequent elements
contains the matched groups, if any.
If no match returns None
"""
rx=re.compile(theregexp)
internalcounter=1
if count < 1:
count = 1
while 1:
if backward == 0:
line=self.readline()
else:
line=self.readbackline()
if line == '':
return None
m=rx.search(line)
if m != None :
if count == internalcounter:
return (line,)+m.groups()
else:
internalcounter = internalcounter + 1
# <<fold
def skipLines(self,key): # fold>>
"""
skip a given number of lines. Key can be negative to skip
backward. Return the last line read.
Please note that skipLines(1) is equivalent to readline()
skipLines(-1) is equivalent to readbackline() and skipLines(0)
is equivalent to currentLine()
"""
return self.__getitem__(self.__currentPos+key)
# <<fold
def occurrences(self,thestring,backward=0): # fold>>
"""
count how many occurrences of str are found from the current
position (current line excluded... see skipLines()) to the
begin (or end) of file.
returns a list of positions where each occurrence is found,
in the same order found reading the file.
Leaves unaltered the cursor position.
"""
curpos=self.currentPos()
list = []
line = self.findString(thestring,1,backward)
while line != '':
list.append(self.currentPos())
line = self.findString(thestring,1,backward)
self.toPos(curpos)
return list
# <<fold
def close(self): # fold>>
self.__file.close()
# <<fold
file_name = 'your_file_name'
file_open = open(file_name, 'r')
def handler(line_one, line_two):
print(line_one, line_two)
while file_open:
try:
one = file_open.next()
two = file_open.next()
handler(one, two)
except(StopIteration):
file_open.close()
break
My idea is to create a generator that reads two lines from the file at a time, and returns this as a 2-tuple, This means you can then iterate over the results.
from cStringIO import StringIO
def read_2_lines(src):
while True:
line1 = src.readline()
if not line1: break
line2 = src.readline()
if not line2: break
yield (line1, line2)
data = StringIO("line1nline2nline3nline4n")
for read in read_2_lines(data):
print read
If you have an odd number of lines, it won’t work perfectly, but this should give you a good outline.
This Python code will print the first two lines:
import linecache
filename = "ooxx.txt"
print(linecache.getline(filename,2))
import itertools
with open('a') as f:
for line1,line2 in itertools.zip_longest(*[f]*2):
print(line1,line2)
itertools.zip_longest()
returns an iterator, so it’ll work well even if the file is billions of lines long.
If there are an odd number of lines, then line2
is set to None
on the last iteration.
On Python2 you need to use izip_longest
instead.
In the comments, it has been asked if this solution reads the whole file first, and then iterates over the file a second time.
I believe that it does not. The with open('a') as f
line opens a file handle, but does not read the file. f
is an iterator, so its contents are not read until requested. zip_longest
takes iterators as arguments, and returns an iterator.
zip_longest
is indeed fed the same iterator, f, twice. But what ends up happening is that next(f)
is called on the first argument and then on the second argument. Since next()
is being called on the same underlying iterator, successive lines are yielded. This is very different than reading in the whole file. Indeed the purpose of using iterators is precisely to avoid reading in the whole file.
I therefore believe the solution works as desired — the file is only read once by the for-loop.
To corroborate this, I ran the zip_longest solution versus a solution using f.readlines()
. I put a input()
at the end to pause the scripts, and ran ps axuw
on each:
% ps axuw | grep zip_longest_method.py
unutbu 11119 2.2 0.2
4520 2712 pts/0 S+ 21:14 0:00 python /home/unutbu/pybin/zip_longest_method.py bigfile
% ps axuw | grep readlines_method.py
unutbu 11317 6.5 8.8
93908 91680 pts/0 S+ 21:16 0:00 python /home/unutbu/pybin/readlines_method.py bigfile
The readlines
clearly reads in the whole file at once. Since the zip_longest_method
uses much less memory, I think it is safe to conclude it is not reading in the whole file at once.
def readnumlines(file, num=2):
f = iter(file)
while True:
lines = [None] * num
for i in range(num):
try:
lines[i] = f.next()
except StopIteration: # EOF or not enough lines available
return
yield lines
# use like this
f = open("thefile.txt", "r")
for line1, line2 in readnumlines(f):
# do something with line1 and line2
# or
for line1, line2, line3, ..., lineN in readnumlines(f, N):
# do something with N lines
I would proceed in a similar way as ghostdog74, only with the try outside and a few modifications:
try:
with open(filename) as f:
for line1 in f:
line2 = f.next()
# process line1 and line2 here
except StopIteration:
print "(End)" # do whatever you need to do with line1 alone
This keeps the code simple and yet robust. Using the with
closes the file if something else happens, or just closes the resources once you have exhausted it and exit the loop.
Note that with
needs 2.6, or 2.5 with the with_statement
feature enabled.
I have worked on a similar problem last month. I tried a while loop with f.readline() as well as f.readlines().
My data file is not huge, so I finally chose f.readlines(), which gives me more control of the index, otherwise
I have to use f.seek() to move back and forth the file pointer.
My case is more complicated than OP. Because my data file is more flexible on how many lines to be parsed each time, so
I have to check a few conditions before I can parse the data.
Another problem I found out about f.seek() is that it doesn’t handle utf-8 very well when I use codecs.open(”, ‘r’, ‘utf-8’), (not exactly sure about the culprit, eventually I gave up this approach.)
Simple little reader. It will pull lines in pairs of two and return them as a tuple as you iterate over the object. You can close it manually or it will close itself when it falls out of scope.
class doublereader:
def __init__(self,filename):
self.f = open(filename, 'r')
def __iter__(self):
return self
def next(self):
return self.f.next(), self.f.next()
def close(self):
if not self.f.closed:
self.f.close()
def __del__(self):
self.close()
#example usage one
r = doublereader(r"C:file.txt")
for a, h in r:
print "x:%sny:%s" % (a,h)
r.close()
#example usage two
for x,y in doublereader(r"C:file.txt"):
print "x:%sny:%s" % (x,y)
#closes itself as soon as the loop goes out of scope
how about this one, anybody seeing a problem with it
with open('file_name') as f:
for line1, line2 in zip(f, f):
print(line1, line2)
f = open(filename, "r")
for line in f:
line1 = line
f.next()
f.close
Right now, you can read file every two line. If you like you can also check the f status before f.next()
If the file is of reasonable size, another approach that uses list-comprehension to read the entire file into a list of 2-tuples, is this:
filaname = '/path/to/file/name'
with open(filename, 'r') as f:
list_of_2tuples = [ (line,f.readline()) for line in f ]
for (line1,line2) in list_of_2tuples: # Work with them in pairs.
print('%s :: %s', (line1,line2))
I am coding a python script that parses a text file. The format of this text file is such that each element in the file uses two lines and for convenience I would like to read both lines before parsing. Can this be done in Python?
I would like to some something like:
f = open(filename, "r")
for line in f:
line1 = line
line2 = f.readline()
f.close
But this breaks saying that:
ValueError: Mixing iteration and read methods would lose data
Related:
Similar question here. You can’t mix iteration and readline so you need to use one or the other.
while True:
line1 = f.readline()
line2 = f.readline()
if not line2: break # EOF
...
use next()
, eg
with open("file") as f:
for line in f:
print(line)
nextline = next(f)
print("next line", nextline)
....
Works for even and odd-length files. It just ignores the unmatched last line.
f=file("file")
lines = f.readlines()
for even, odd in zip(lines[0::2], lines[1::2]):
print "even : ", even
print "odd : ", odd
print "end cycle"
f.close()
If you have large files, this is not the correct approach. You are loading all the file in memory with readlines(). I once wrote a class that read the file saving the fseek position of each start of line. This allows you to get specific lines without having all the file in memory, and you can also go forward and backward.
I paste it here. License is Public domain, meaning, do what you want with it. Please note that this class has been written 6 years ago and I haven’t touched or checked it since. I think it’s not even file compliant. Caveat emptor. Also, note that this is overkill for your problem. I’m not claiming you should definitely go this way, but I had this code and I enjoy sharing it if you need more complex access.
import string
import re
class FileReader:
"""
Similar to file class, but allows to access smoothly the lines
as when using readlines(), with no memory payload, going back and forth,
finding regexps and so on.
"""
def __init__(self,filename): # fold>>
self.__file=file(filename,"r")
self.__currentPos=-1
# get file length
self.__file.seek(0,0)
counter=0
line=self.__file.readline()
while line != '':
counter = counter + 1
line=self.__file.readline()
self.__length = counter
# collect an index of filedescriptor positions against
# the line number, to enhance search
self.__file.seek(0,0)
self.__lineToFseek = []
while True:
cur=self.__file.tell()
line=self.__file.readline()
# if it's not null the cur is valid for
# identifying a line, so store
self.__lineToFseek.append(cur)
if line == '':
break
# <<fold
def __len__(self): # fold>>
"""
member function for the operator len()
returns the file length
FIXME: better get it once when opening file
"""
return self.__length
# <<fold
def __getitem__(self,key): # fold>>
"""
gives the "key" line. The syntax is
import FileReader
f=FileReader.FileReader("a_file")
line=f[2]
to get the second line from the file. The internal
pointer is set to the key line
"""
mylen = self.__len__()
if key < 0:
self.__currentPos = -1
return ''
elif key > mylen:
self.__currentPos = mylen
return ''
self.__file.seek(self.__lineToFseek[key],0)
counter=0
line = self.__file.readline()
self.__currentPos = key
return line
# <<fold
def next(self): # fold>>
if self.isAtEOF():
raise StopIteration
return self.readline()
# <<fold
def __iter__(self): # fold>>
return self
# <<fold
def readline(self): # fold>>
"""
read a line forward from the current cursor position.
returns the line or an empty string when at EOF
"""
return self.__getitem__(self.__currentPos+1)
# <<fold
def readbackline(self): # fold>>
"""
read a line backward from the current cursor position.
returns the line or an empty string when at Beginning of
file.
"""
return self.__getitem__(self.__currentPos-1)
# <<fold
def currentLine(self): # fold>>
"""
gives the line at the current cursor position
"""
return self.__getitem__(self.__currentPos)
# <<fold
def currentPos(self): # fold>>
"""
return the current position (line) in the file
or -1 if the cursor is at the beginning of the file
or len(self) if it's at the end of file
"""
return self.__currentPos
# <<fold
def toBOF(self): # fold>>
"""
go to beginning of file
"""
self.__getitem__(-1)
# <<fold
def toEOF(self): # fold>>
"""
go to end of file
"""
self.__getitem__(self.__len__())
# <<fold
def toPos(self,key): # fold>>
"""
go to the specified line
"""
self.__getitem__(key)
# <<fold
def isAtEOF(self): # fold>>
return self.__currentPos == self.__len__()
# <<fold
def isAtBOF(self): # fold>>
return self.__currentPos == -1
# <<fold
def isAtPos(self,key): # fold>>
return self.__currentPos == key
# <<fold
def findString(self, thestring, count=1, backward=0): # fold>>
"""
find the count occurrence of the string str in the file
and return the line catched. The internal cursor is placed
at the same line.
backward is the searching flow.
For example, to search for the first occurrence of "hello
starting from the beginning of the file do:
import FileReader
f=FileReader.FileReader("a_file")
f.toBOF()
f.findString("hello",1,0)
To search the second occurrence string from the end of the
file in backward movement do:
f.toEOF()
f.findString("hello",2,1)
to search the first occurrence from a given (or current) position
say line 150, going forward in the file
f.toPos(150)
f.findString("hello",1,0)
return the string where the occurrence is found, or an empty string
if nothing is found. The internal counter is placed at the corresponding
line number, if the string was found. In other case, it's set at BOF
if the search was backward, and at EOF if the search was forward.
NB: the current line is never evaluated. This is a feature, since
we can so traverse occurrences with a
line=f.findString("hello")
while line == '':
line.findString("hello")
instead of playing with a readline every time to skip the current
line.
"""
internalcounter=1
if count < 1:
count = 1
while 1:
if backward == 0:
line=self.readline()
else:
line=self.readbackline()
if line == '':
return ''
if string.find(line,thestring) != -1 :
if count == internalcounter:
return line
else:
internalcounter = internalcounter + 1
# <<fold
def findRegexp(self, theregexp, count=1, backward=0): # fold>>
"""
find the count occurrence of the regexp in the file
and return the line catched. The internal cursor is placed
at the same line.
backward is the searching flow.
You need to pass a regexp string as theregexp.
returns a tuple. The fist element is the matched line. The subsequent elements
contains the matched groups, if any.
If no match returns None
"""
rx=re.compile(theregexp)
internalcounter=1
if count < 1:
count = 1
while 1:
if backward == 0:
line=self.readline()
else:
line=self.readbackline()
if line == '':
return None
m=rx.search(line)
if m != None :
if count == internalcounter:
return (line,)+m.groups()
else:
internalcounter = internalcounter + 1
# <<fold
def skipLines(self,key): # fold>>
"""
skip a given number of lines. Key can be negative to skip
backward. Return the last line read.
Please note that skipLines(1) is equivalent to readline()
skipLines(-1) is equivalent to readbackline() and skipLines(0)
is equivalent to currentLine()
"""
return self.__getitem__(self.__currentPos+key)
# <<fold
def occurrences(self,thestring,backward=0): # fold>>
"""
count how many occurrences of str are found from the current
position (current line excluded... see skipLines()) to the
begin (or end) of file.
returns a list of positions where each occurrence is found,
in the same order found reading the file.
Leaves unaltered the cursor position.
"""
curpos=self.currentPos()
list = []
line = self.findString(thestring,1,backward)
while line != '':
list.append(self.currentPos())
line = self.findString(thestring,1,backward)
self.toPos(curpos)
return list
# <<fold
def close(self): # fold>>
self.__file.close()
# <<fold
file_name = 'your_file_name' file_open = open(file_name, 'r') def handler(line_one, line_two): print(line_one, line_two) while file_open: try: one = file_open.next() two = file_open.next() handler(one, two) except(StopIteration): file_open.close() break
My idea is to create a generator that reads two lines from the file at a time, and returns this as a 2-tuple, This means you can then iterate over the results.
from cStringIO import StringIO
def read_2_lines(src):
while True:
line1 = src.readline()
if not line1: break
line2 = src.readline()
if not line2: break
yield (line1, line2)
data = StringIO("line1nline2nline3nline4n")
for read in read_2_lines(data):
print read
If you have an odd number of lines, it won’t work perfectly, but this should give you a good outline.
This Python code will print the first two lines:
import linecache
filename = "ooxx.txt"
print(linecache.getline(filename,2))
import itertools
with open('a') as f:
for line1,line2 in itertools.zip_longest(*[f]*2):
print(line1,line2)
itertools.zip_longest()
returns an iterator, so it’ll work well even if the file is billions of lines long.
If there are an odd number of lines, then line2
is set to None
on the last iteration.
On Python2 you need to use izip_longest
instead.
In the comments, it has been asked if this solution reads the whole file first, and then iterates over the file a second time.
I believe that it does not. The with open('a') as f
line opens a file handle, but does not read the file. f
is an iterator, so its contents are not read until requested. zip_longest
takes iterators as arguments, and returns an iterator.
zip_longest
is indeed fed the same iterator, f, twice. But what ends up happening is that next(f)
is called on the first argument and then on the second argument. Since next()
is being called on the same underlying iterator, successive lines are yielded. This is very different than reading in the whole file. Indeed the purpose of using iterators is precisely to avoid reading in the whole file.
I therefore believe the solution works as desired — the file is only read once by the for-loop.
To corroborate this, I ran the zip_longest solution versus a solution using f.readlines()
. I put a input()
at the end to pause the scripts, and ran ps axuw
on each:
% ps axuw | grep zip_longest_method.py
unutbu 11119 2.2 0.2
4520 2712 pts/0 S+ 21:14 0:00 python /home/unutbu/pybin/zip_longest_method.py bigfile
% ps axuw | grep readlines_method.py
unutbu 11317 6.5 8.8
93908 91680 pts/0 S+ 21:16 0:00 python /home/unutbu/pybin/readlines_method.py bigfile
The readlines
clearly reads in the whole file at once. Since the zip_longest_method
uses much less memory, I think it is safe to conclude it is not reading in the whole file at once.
def readnumlines(file, num=2):
f = iter(file)
while True:
lines = [None] * num
for i in range(num):
try:
lines[i] = f.next()
except StopIteration: # EOF or not enough lines available
return
yield lines
# use like this
f = open("thefile.txt", "r")
for line1, line2 in readnumlines(f):
# do something with line1 and line2
# or
for line1, line2, line3, ..., lineN in readnumlines(f, N):
# do something with N lines
I would proceed in a similar way as ghostdog74, only with the try outside and a few modifications:
try:
with open(filename) as f:
for line1 in f:
line2 = f.next()
# process line1 and line2 here
except StopIteration:
print "(End)" # do whatever you need to do with line1 alone
This keeps the code simple and yet robust. Using the with
closes the file if something else happens, or just closes the resources once you have exhausted it and exit the loop.
Note that with
needs 2.6, or 2.5 with the with_statement
feature enabled.
I have worked on a similar problem last month. I tried a while loop with f.readline() as well as f.readlines().
My data file is not huge, so I finally chose f.readlines(), which gives me more control of the index, otherwise
I have to use f.seek() to move back and forth the file pointer.
My case is more complicated than OP. Because my data file is more flexible on how many lines to be parsed each time, so
I have to check a few conditions before I can parse the data.
Another problem I found out about f.seek() is that it doesn’t handle utf-8 very well when I use codecs.open(”, ‘r’, ‘utf-8’), (not exactly sure about the culprit, eventually I gave up this approach.)
Simple little reader. It will pull lines in pairs of two and return them as a tuple as you iterate over the object. You can close it manually or it will close itself when it falls out of scope.
class doublereader:
def __init__(self,filename):
self.f = open(filename, 'r')
def __iter__(self):
return self
def next(self):
return self.f.next(), self.f.next()
def close(self):
if not self.f.closed:
self.f.close()
def __del__(self):
self.close()
#example usage one
r = doublereader(r"C:file.txt")
for a, h in r:
print "x:%sny:%s" % (a,h)
r.close()
#example usage two
for x,y in doublereader(r"C:file.txt"):
print "x:%sny:%s" % (x,y)
#closes itself as soon as the loop goes out of scope
how about this one, anybody seeing a problem with it
with open('file_name') as f:
for line1, line2 in zip(f, f):
print(line1, line2)
f = open(filename, "r")
for line in f:
line1 = line
f.next()
f.close
Right now, you can read file every two line. If you like you can also check the f status before f.next()
If the file is of reasonable size, another approach that uses list-comprehension to read the entire file into a list of 2-tuples, is this:
filaname = '/path/to/file/name'
with open(filename, 'r') as f:
list_of_2tuples = [ (line,f.readline()) for line in f ]
for (line1,line2) in list_of_2tuples: # Work with them in pairs.
print('%s :: %s', (line1,line2))