How to compare 2 different csv files and output the differences

Question:

I have 2 CSVs which are New.csv and Old.csv that are have around 1K rows and 10 columns that has a structure like this:

enter image description here

If there is a longName (first column) in in the new.csv that is not in the old.csv, I would like that entire new.csv row to be appended to the changes.csv.

I started off by doing this but it does not work well at all:

def deltaFileMaker():
    with open('Old.csv', 'r', encoding='utf-8') as t1, open('New.csv', 'r', encoding='utf-8') as t2:
        fileone = t1.readlines()
        filetwo = t2.readlines()

    with open('changes.csv', 'w', encoding='utf-8') as outFile:
        for line in filetwo:
            if line not in fileone:
                outFile.write(line)



deltaFileMaker()

I also tried to use csv-diff but I could not find a way to convert its output to a csv file

Update

def deltaFileMaker():
    from csv_diff import load_csv, compare
    diff = compare(
        load_csv(open("old.csv",encoding="utf8"), key="longName"),
        load_csv(open("new.csv",encoding="utf8"), key="longName")
    )

        with open('changes.csv', 'w',encoding="utf8") as f:  
        w = csv.DictWriter(f, diff.keys())
        w.writeheader()
        w.writerow(diff)


deltaFileMaker()


Doing this:
enter image description here

Asked By: joe_sanders

||

Answers:

Have you looked at csv-diff? Their website has an example that might be suitable:

from csv_diff import load_csv, compare
diff = compare(
    load_csv(open("one.csv"), key="id"),
    load_csv(open("two.csv"), key="id")
)

This should return a dict object, which you can parse into a CSV file. To parse that dict into rows, this is an example. Note: getting the changes to write correctly is difficult, but this is more of a proof-of-concept – modify as you wish

from csv_diff import load_csv, compare
from csv import DictWriter

# Get all the row headers across all the changes
headers = set({'change type'})
for key, vals in diff.items():
    for val in vals: # Multiple of the same difference 'type'
        headers = headers.union(set(val.keys()))

# Write changes to file
with open('changes.csv', 'w', encoding='utf-8') as fh:
    w = DictWriter(fh, headers)
    w.writeheader()
    for key, changes in diff.items():
        for val in changes: # Add each instance of this type of change
            val.update({'change type': key}) # Add 'change type' data
            w.writerow(val)

For the file one.csv:

id,     name, age
 1,     Cleo,   4
 2, Pancakes,   2

and two.csv:

id,   name, age
 1,   Cleo,   5
 3, Bailey,   1
4,  Elliot,  10

Running this produces:

change type,     name, id,               changes, age, key
      added,   Bailey,  3,                      ,   1,
      added,   Elliot,  4,                      ,  10,
    removed, Pancakes,  2,                      ,   2,
    changed,         ,   , "{'age': ['4', '5']}",    ,   1

So not great for all changes, but works really well for the added/removed rows.

Answered By: Collin Heist
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.