Batch copying TSVs to Postgres with the COPY command

Question:

I wrote this script that uploads the contents of a folder of TSVs to my Postgres database.

It works, but it reads the files line by line, which takes a long time.

Is there a way to modify this so it runs the COPY command instead of an INSERT command?

I left my previous attempt at a COPY in the code below (but commented out). The problem with that code is that it copied the file headers into the rows of my Postgres table.

import csv
import os
import pathlib

# conn and cur are an open psycopg2 connection and cursor, defined elsewhere

def main():

    # MAKE SURE THIS IS THE RIGHT FILE TYPE
    for file in pathlib.Path().rglob('*.tsv'):
        print(os.path.abspath(file))

        # MAKE SURE THIS IS THE RIGHT TABLE
        cur.execute(create_table_agent)

        with open(file, 'r') as file_in:
            reader = csv.reader(file_in, delimiter='\t')
            next(reader)  # skip the header row
            for row in reader:
                print(row)
                cur.execute("INSERT INTO mls_agent_1_line VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", row)
            # cur.copy_from(file_in, 'mls_appraisal_world', sep='\t', null='\\N')
        conn.commit()

    conn.close()

if __name__ == '__main__':
    main()
Asked By: reallymemorable


Answers:

The Postgres COPY command can skip a header line only in CSV format. Per the documentation:

HEADER

Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.

If your files can be imported correctly by the COPY command with the csv format option, use psycopg2's copy_expert(sql, file, size=8192) function:

with open(file, 'r') as file_in:
    cur.copy_expert("copy table_name from stdin with csv header delimiter E'\\t'", file_in)
Answered By: klin