Compare the lines of two files with Python

Question:

This might sound a little bit stupid but I have been having a hard time figuring it out. I have two text files and all I want to do is to compare each line of the first file with all of the lines of the second file. So far I just wanted to test a small part of my code which is:

for line1 in file1:
    print line1
    for line2 in file2:
        print line2

I thought this small code would give me a line from first file followed by all the lines from the second file. But the way it works is totally different. It gives me this:

in file 1 line 1
in file 2 line 1
in file 2 line 2
in file 2 line 3
in file 1 line 2

What I expect to see:

in file 1 line 1
in file 2 line 1
in file 2 line 2
in file 2 line 3

in file 1 line 2
in file 2 line 1
in file 2 line 2
in file 2 line 3

Any idea of what I might be doing wrong here?

PLEASE NOTE: I don’t want to simply check whether whole lines are identical. I need to do some string operations on them first, so zip and the like won’t help me.

Thanks in advance

Asked By: ahajib


Answers:

zip may be your friend here.

For example,

for line_a, line_b in zip(file_1, file_2):
  # do something with your strings
  pass

Sample terminal code:

>>> file_1 = ['a', 'b', 'c', 'd']
>>> file_2 = ['a', 'one', 'c', 'd', 'e']
>>> for a, b in zip(file_1, file_2):
...   if a == b:
...     print('equal!')
...   else:
...     print('nope!')
... 
equal!
nope!
equal!
equal!
>>> for a, b in zip(file_2, file_1):
...   print(a, b)
... 
a a
one b
c c
d d

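Notice anything strange? The trailing 'e' from file_2 never shows up in either loop: zip stops as soon as the shorter input runs out.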

As the Python docs put it: “zip() should only be used with unequal length inputs when you don’t care about trailing, unmatched values from the longer iterables. If those values are important, use itertools.zip_longest() instead.”
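To make the difference concrete, here is a small sketch that reuses the two sample lists from the session above with itertools.zip_longest. The choice of None as the fillvalue is only an assumption for illustration; any placeholder would do.

```python
from itertools import zip_longest

file_1 = ['a', 'b', 'c', 'd']
file_2 = ['a', 'one', 'c', 'd', 'e']

# fillvalue stands in for the missing entries of the shorter input,
# so the trailing 'e' from file_2 is no longer dropped.
pairs = list(zip_longest(file_1, file_2, fillvalue=None))
for a, b in pairs:
    print(a, b)
```

The last pair printed is "None e", which shows where file_1 ran out. Note that this still pairs lines by position; it does not compare every line with every other line.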

Answered By: Douglas Denhartog

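What has happened here is that a file object is an iterator, and the inner loop exhausts it the first time through. On the second pass of the outer loop, file2 has nothing left to yield, so the inner loop body never runs. You can see this by trying to loop over the same file twice: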

>>> f2=open("CLI.md")
>>> for i in f2:
...     print(i)
... 
The CLI
(file contents...)
>>> for i in f2:
...     print(i)
... 
>>>

A simple way of handling that here is to first read the file used in the inner loop into a list before looping (this holds the whole file in memory, which is fine unless the file is very large):

file2_lines = list(file2)  # read every line of file2 once, up front
for line1 in file1:
    print(line1)
    for line2 in file2_lines:  # a list can be iterated any number of times
        print(line2)
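If the second file is too large to keep in a list, another option is to rewind it with seek(0) before each inner loop. The following is only a sketch under a few assumptions: compare_all and normalize are hypothetical names, str.strip stands in for whatever string operations you actually need, and io.StringIO objects stand in for real files opened with open(...).

```python
import io

def compare_all(file1, file2, normalize=str.strip):
    """Return (line1, line2) for every pairing whose normalized forms match."""
    matches = []
    for line1 in file1:
        file2.seek(0)  # rewind so the inner loop starts from the first line again
        for line2 in file2:
            # normalize is a placeholder for your own string operations
            if normalize(line1) == normalize(line2):
                matches.append((line1, line2))
    return matches

# io.StringIO stands in for real files here (an assumption for the demo)
file1 = io.StringIO("apple\nbanana\n")
file2 = io.StringIO("banana \ncherry\napple\n")
matches = compare_all(file1, file2)
print(matches)
```

Rewinding re-reads the second file once per line of the first, so it trades memory for extra I/O. For small files the list approach above is usually simpler.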

Also see: exhausted iterators – what to do about them?

Answered By: matsjoyce