Merge multiple csv files into one
Question:
I have roughly 20 csv files (all with headers) that I would like to merge into one csv file.
Looking online, one way I found was to use the terminal command:
cat *.csv > file.csv
This worked just fine, but the problem is that since every csv file comes with a header, all of those headers also end up in the merged file.
Is there a terminal command or Python script with which I can merge all those csv files into one and keep only one header?
Thank you so much
Answers:
You can do this with awk:
awk '(NR == 1) || (FNR > 1)' *.csv > file.csv
FNR refers to the record number (typically the line number) in the current file, and NR refers to the total record number across all files. So the first line of the first file is accepted, and the first line of each subsequent file is ignored.
This does assume that all your csv files have the same number of columns in the same order.
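If you'd rather stay in Python, the standard library's fileinput module tracks the same two counters the awk one-liner relies on. Here is a minimal sketch of the same logic; the sample files and their contents are invented for illustration:

```python
import fileinput

# Two tiny sample files so the sketch runs standalone
with open("a.csv", "w") as f:
    f.write("id,name\n1,alice\n")
with open("b.csv", "w") as f:
    f.write("id,name\n2,bob\n")

# fileinput mirrors awk's bookkeeping: isfirstline() is FNR == 1 and
# lineno() is NR, so this keeps only the very first header line
with fileinput.input(["a.csv", "b.csv"]) as fin, open("file.csv", "w") as out:
    for line in fin:
        if fin.lineno() == 1 or not fin.isfirstline():
            out.write(line)

print(open("file.csv").read())
# id,name
# 1,alice
# 2,bob
```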
This command should work for you:
tail -qn +2 *.csv > file.csv
Although, do note, each file needs to end with a newline; otherwise the last row of one file and the first row of the next get concatenated onto a single line, e.g. you end up with 1, 12, 2 on one row instead of 1, 1 in row 1 and 2, 2 in row 2.
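Since a missing trailing newline is easy to overlook, here is a small stdlib-only sketch you could run first to spot offending files before concatenating:

```python
import glob
import os

# Flag csv files whose last byte isn't a newline; plain concatenation
# would glue their last row onto the next file's first row
for name in glob.glob("*.csv"):
    if os.path.getsize(name) == 0:
        continue  # empty files can't be seeked from the end
    with open(name, "rb") as f:
        f.seek(-1, os.SEEK_END)
        if f.read(1) != b"\n":
            print(name, "has no trailing newline")
```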
My vote goes to the Awk solution, but since this question explicitly asks about Python, here is a solution for that.
import csv
import sys

writer = csv.writer(sys.stdout)
firstfile = True
for file in sys.argv[1:]:
    with open(file, 'r', newline='') as rawfile:
        reader = csv.reader(rawfile)
        for idx, row in enumerate(reader):
            # enumerate() is zero-based by default; row 0 is the header
            if idx == 0 and not firstfile:
                continue
            writer.writerow(row)
    firstfile = False
Usage: python script.py first.csv second.csv etc.csv >final.csv
This simple script doesn't really benefit from any Python features, but if you need to count the number of fields in non-trivial CSV files (i.e. with quoted fields which might contain a comma that isn't a separator), that's hard in Awk and trivial in Python, because the csv library already knows exactly how to handle that.
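For instance, here is the kind of row that trips up a naive comma split but parses cleanly with the csv module (the sample data is invented for illustration):

```python
import csv
import io

line = '"Smith, John",42,"123 Main St, Apt 4"'
naive = line.split(",")                        # 5 pieces: the quoted commas break it
parsed = next(csv.reader(io.StringIO(line)))   # 3 fields, as intended
print(len(naive), len(parsed))  # 5 3
print(parsed[0])                # Smith, John
```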
The code below is what worked for me.
import csv
import glob
from datetime import datetime

timestamp = datetime.now().strftime("%Y%B%d_%H%M")

inputFiles = glob.glob("*.csv")  # every csv in the current directory
print(inputFiles)

outputFile = "combined" + timestamp + ".csv"
# 'x' mode raises FileExistsError instead of silently overwriting,
# and newline='' is what the csv module expects for output files
with open(outputFile, "x", newline="") as g:
    writer = csv.writer(g)
    for i, file in enumerate(inputFiles):
        with open(file, "r", newline="") as f:
            reader = csv.reader(f)
            for idx, row in enumerate(reader):
                # keep the header only from the first input file
                if idx == 0 and i > 0:
                    continue
                writer.writerow(row)
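None of the answers above mention it, but if pandas is available, a short script handles this too. Unlike the cat/awk approaches, concat aligns columns by header name, so files with the same columns in a different order still merge correctly. This is a hedged sketch, not from the original answers, and the file names are assumptions:

```python
import glob

import pandas as pd


def combine(pattern="*.csv", out="combined.csv"):
    # concat aligns on column names, so files whose columns appear in a
    # different order still line up under the right headers
    frames = [pd.read_csv(name) for name in sorted(glob.glob(pattern))]
    if frames:  # skip quietly when no files match
        pd.concat(frames, ignore_index=True).to_csv(out, index=False)


combine()
```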