In Python why is my "for entry in csv_compare:" loop iterating only once and getting stuck on the last input

Question

I’m trying to compare two CSV files and write the entries they have in common to a third CSV file. The outer loop, for row in csv_input, iterates over every row, but the inner loop, for entry in csv_compare, runs through only once and then stays stuck on the last entry. I want to compare every row in csv_input with every entry in csv_compare.

import csv
finalCSV = {}
with open('input.csv', newline='') as csvfile, open('compare.csv', newline='') as keyCSVFile, open('output.csv', 'w' ,newline='') as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_compare = csv.reader(keyCSVFile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))

    for row in csv_input:
        for entry in csv_compare:
            print(row[0] + ' ' + entry[0])
            if row[0] == entry[0]:
                csv_output.writerow(row)
                break
    
print('wait...')

Asked By: Zac Borders

Answer 1

I suggest reading the first column of csv_compare into a list or a set and then using a single for loop:

import csv

finalCSV = {}
with open("input.csv", newline="") as csvfile, open(
    "compare.csv", newline=""
) as keyCSVFile, open("output.csv", "w", newline="") as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_compare = csv.reader(keyCSVFile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))

    compare = {entry[0] for entry in csv_compare}   # <--- read csv_compare to a set

    for row in csv_input:
        if row[0] in compare:     # <--- use `in` operator
            csv_output.writerow(row)

Answered By: Andrej Kesely

Answer 2

When you break the inner loop and start the next iteration of the outer loop, csv_compare doesn’t reset to the beginning. It picks up where you left off. Once you have exhausted the iterator, that’s it.
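
The behaviour can be seen in isolation with a minimal sketch. It is not part of the original answer, and it assumes in-memory io.StringIO objects with made-up contents in place of the real compare.csv:

```python
import csv
import io

# Hypothetical stand-in for compare.csv: three one-column rows.
csv_compare = csv.reader(io.StringIO("a\nb\nc\n"))

# The first pass consumes every row from the reader.
first_pass = [entry[0] for entry in csv_compare]

# The second pass finds nothing left: the reader does not rewind by itself.
second_pass = [entry[0] for entry in csv_compare]

print(first_pass)   # ['a', 'b', 'c']
print(second_pass)  # []

# Breaking out early does not rewind either: the next loop resumes after "a".
reader = csv.reader(io.StringIO("a\nb\nc\n"))
for entry in reader:
    if entry[0] == "a":
        break
rest = [entry[0] for entry in reader]

print(rest)  # ['b', 'c']
```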

You would need to reset the iterator at the top of each iteration of the outer loop, which is most easily done by simply opening the file there.

import csv

with open('input.csv', newline='') as csvfile, open('output.csv', 'w', newline='') as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))  # copy the header row

    for row in csv_input:
        # Reopen compare.csv for every row so the inner loop starts from the top
        with open('compare.csv', newline='') as keyCSVFile:
            csv_compare = csv.reader(keyCSVFile)
            for entry in csv_compare:
                if row[0] == entry[0]:
                    csv_output.writerow(row)
                    break
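
A possible variation, sketched here as an assumption rather than as part of the original answer, is to keep the compare file open and rewind it with seek(0) before each inner loop instead of reopening it. The helper name matching_rows and the sample data are hypothetical, and io.StringIO stands in for the real files:

```python
import csv
import io


def matching_rows(input_file, compare_file):
    """Return the header plus every input row whose first column
    also appears in the first column of compare_file."""
    csv_input = csv.reader(input_file)
    matches = [next(csv_input)]  # keep the header row
    for row in csv_input:
        compare_file.seek(0)  # rewind so the inner loop starts from the first line
        for entry in csv.reader(compare_file):  # fresh reader over the rewound file
            if row[0] == entry[0]:
                matches.append(row)
                break
    return matches


# Hypothetical sample data in place of input.csv and compare.csv.
input_file = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
compare_file = io.StringIO("3\n1\n")

result = matching_rows(input_file, compare_file)
print(result)  # [['id', 'name'], ['1', 'a'], ['3', 'c']]
```

Either way, compare.csv is rescanned for every input row, so the work grows with the product of the two file sizes.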

Answered By: chepner

Answer 3

You could skip the inner loop completely. You add rows from input.csv when the first column matches any of the first column values in compare.csv. So put those values in a set for easy lookup.

import csv

with open('compare.csv', newline='') as keyCSVFile:
    key_set = {row[0] for row in csv.reader(keyCSVFile)}

with open('input.csv', newline='') as csvfile, open('output.csv', 'w' ,newline='') as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))
    csv_output.writerows(row for row in csv_input if row[0] in key_set)

del key_set
print('wait...')

Answered By: tdelaney
