Merge two CSV files based on a data from the first column

Question:

I have two csv files like below that I’d like to merge – more or less using the first column ID_ as the unique identifier, and append the AMT column to a new row in the final file.

CSV1

ID_ CUSTOMER_ID_    EMAIL_ADDRESS_
1090    1   [email protected]
1106    2   [email protected]
1145    3   [email protected]
1206    4   [email protected]
1247    5   [email protected]
1254    6   [email protected]
1260    7   [email protected]
1361    8   [email protected]
1376    9   [email protected]

CSV2


ID_ AMT
1090    5
1106    5
1145    5
1206    5
1247    5
1254    65
1260    5
1361    10
1376    5

Here’s what I’m looking for in a final file:

ID_ CUSTOMER_ID_    EMAIL_ADDRESS_  AMT
1090    1   [email protected]    5
1106    2   [email protected]    5
1145    3   [email protected]    5
1206    4   [email protected]    5
1247    5   [email protected]    5
1254    6   [email protected]    65
1260    7   [email protected]    5
1361    8   [email protected]    10
1376    9   [email protected]    5

I’ve tried modifying a this below as much as possible, but not able to get what I’m looking for. Really stuck on this – not sure what else I can do. Really appreciate any and all help!

join -t, File1.csv File2.csv

Data shows in this example contains tabs, but my actual files are CSVs as mentioned and will contain commas as a separator.

Asked By: Sean Donahue

||

Answers:

This can be easily done using Pandas library. Here is my code to do this:

'''
This program reads two csv files and merges them based on a common key column.
'''
# import the pandas library
# you can install using the following command: pip install pandas

import pandas as pd

# Read the files into two dataframes.
df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')

# Merge the two dataframes, using _ID column as key
df3 = pd.merge(df1, df2, on = 'ID_')
df3.set_index('ID_', inplace = True)

# Write it to a new CSV file
df3.to_csv('CSV3.csv')

You can find a short tutorial on pandas here:
https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html

Answered By: Iman
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.