Python CSVkit compare CSV files

Question

I have two CSV files that look like this..

CSV 1

reference  |  name  |  house
----------------------------
2348A      |  john  |  37
5648R      |  bill  |  3
RT48       |  kate  |  88
76A        |  harry |  433

CSV2

reference
---------
2348A
76A

Using Python and CSVkit I am trying to create an output CSV of the rows in CSV1 by comparing it to CSV2. Does anybody have an example they can point me in the direction of?

Asked By: fightstarr20

||

Source

Answer 1

I would recommended to use pandas to achieve what you are looking for:

And here is how simple it would be using pandas, consider your two csv files are like this:

CSV1

reference,name,house
2348A,john,37
5648R,bill,3
RT48,kate,88
76A,harry ,433

CSV2

reference
2348A
76A

Code

import pandas as pd
df1 = pd.read_csv(r'd:tempdata1.csv')
df2 = pd.read_csv(r'd:tempdata2.csv')
df3 = pd.merge(df1,df2, on= 'reference', how='inner')
df3.to_csv('outpt.csv')

output.csv

,reference,name,house
0,2348A,john,37
1,76A,harry ,433

Answered By: Abbas

Answer 2

I’d recommend using a tool like;

csvjoin from csvkit or
Pandas

Solution 1:

pip install csvkit
$ csvjoin --help
usage: csvjoin [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
               [-p ESCAPECHAR] [-z MAXFIELDSIZE] [-e ENCODING] [-S] [-v] [-l]
               [--zero] [-c COLUMNS] [--outer] [--left] [--right]
               [FILE [FILE ...]]

Example:
Left Join on column of [reference]

csvjoin --columns "reference" --left CSV1.csv CSV2.csv

Solution 2:

You can also use the pandas library which is easier to manipulate Dataframe and filter by column.

import pandas as pd

df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')

df = df1[df1['reference'].isin(df2['reference'])]
df.to_csv('output.csv', index=False)

Answered By: H.Elci

Python CSVkit compare CSV files

Question:

Answers: