Python CSVkit compare CSV files

Question:

I have two CSV files that look like this..

CSV 1

reference  |  name  |  house
----------------------------
2348A      |  john  |  37
5648R      |  bill  |  3
RT48       |  kate  |  88
76A        |  harry |  433

CSV2

reference
---------
2348A
76A

Using Python and CSVkit I am trying to create an output CSV of the rows in CSV1 by comparing it to CSV2. Does anybody have an example they can point me in the direction of?

Asked By: fightstarr20

||

Answers:

I would recommended to use pandas to achieve what you are looking for:

And here is how simple it would be using pandas, consider your two csv files are like this:

CSV1

reference,name,house
2348A,john,37
5648R,bill,3
RT48,kate,88
76A,harry ,433

CSV2

reference
2348A
76A

Code

import pandas as pd
df1 = pd.read_csv(r'd:tempdata1.csv')
df2 = pd.read_csv(r'd:tempdata2.csv')
df3 = pd.merge(df1,df2, on= 'reference', how='inner')
df3.to_csv('outpt.csv')

output.csv

,reference,name,house
0,2348A,john,37
1,76A,harry ,433
Answered By: Abbas

I’d recommend using a tool like;

  • csvjoin from csvkit or
  • Pandas

Solution 1:

pip install csvkit
$ csvjoin --help
usage: csvjoin [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
               [-p ESCAPECHAR] [-z MAXFIELDSIZE] [-e ENCODING] [-S] [-v] [-l]
               [--zero] [-c COLUMNS] [--outer] [--left] [--right]
               [FILE [FILE ...]]

Example:
Left Join on column of [reference]

csvjoin --columns "reference" --left CSV1.csv CSV2.csv

Solution 2:

You can also use the pandas library which is easier to manipulate Dataframe and filter by column.

import pandas as pd

df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')

df = df1[df1['reference'].isin(df2['reference'])]
df.to_csv('output.csv', index=False)
Answered By: H.Elci
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.