Psycopg2 copy_from() is inserting data with double quotes when whitespace is present in csv file

Question:

I am trying to import the following table to my Postgres server using cursor.copy_from() in psycopg2 because the file is too large.

id mail name
1 [email protected] John Stokes
2 [email protected] Emily Ray

Here is my code:

import psycopg2
import os

conn = psycopg2.connect(
    dbname = name,
    user = username,
    password = pwd,
    host = hst,
    port = 5432
)

cur = conn.cursor()

path = os.path.join(os.getcwd(), 'users.csv')

file = open(path, 'r')

cur.copy_from(file, table_name, sep=',')
conn.commit()

conn.close()

This inserts the data into the table, but the third column ends up with double quotes, like below.

id mail name
1 [email protected] "John Stokes"
2 [email protected] "Emily Ray"

Later I found out that the quotes are already there when the file is read, because if I print the first line with file.readline(), I get:

1,[email protected],"John Stokes"

I don’t want these double quotes in my table. I tried using cursor.execute() with a COPY FROM query, but it fails saying I am not a superuser, even though I am.

Asked By: noob_coder

Answers:

Use copy_expert(). Then the file is read by the client (your Python process) rather than by the server user, so no superuser privileges are needed. You can also use WITH CSV, which takes care of the quoting. copy_from() and copy_to() work with the text format, as described in the Postgres COPY documentation.

cat test.csv                                                                                                                                                            
1,[email protected],"John Stokes"
2,[email protected],"Emily Ray"

create table test_csv (id integer, mail varchar, name varchar);

import psycopg2

con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
cur = con.cursor()


with open('test.csv') as f:
    cur.copy_expert('COPY test_csv FROM stdin WITH CSV', f)

con.commit()

select * from test_csv ;
 id |        mail        |    name     
----+--------------------+-------------
  1 | [email protected]  | John Stokes
  2 | [email protected] | Emily Ray
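If the file still contains the id mail name header row shown in the question, COPY can be told to skip it with the HEADER option. A hedged sketch, assuming the same test_csv table and connection as above (the COPY call itself is left commented because it needs a live server):

```python
# CSV format with a header row: the first line of the file is skipped.
copy_sql = "COPY test_csv FROM STDIN WITH (FORMAT csv, HEADER)"

# Assumes `cur` and `con` from the psycopg2 example above:
# with open('users.csv') as f:
#     cur.copy_expert(copy_sql, f)
# con.commit()
print(copy_sql)
```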

FYI, in psycopg3 (the psycopg package) this behavior has changed substantially; see the psycopg3 COPY documentation for how to handle it there.

UPDATE

Using psycopg3, the answer for Python 3.8+ (where the walrus operator is available) would be:

import psycopg

# Connection and cursor as above, but created with psycopg.connect()
con = psycopg.connect(dbname="test", host='localhost', user='postgres', port=5432)
cur = con.cursor()

with open('test.csv') as f:
    with cur.copy("COPY test_csv FROM STDIN WITH CSV") as copy:
        while data := f.read(1000):
            copy.write(data)
con.commit()

Or, for Python 3.7 and earlier, something like:

# Function copied from here https://www.iditect.com/guide/python/python_howto_read_big_file_in_chunks.html
def read_in_chunks(file, chunk_size=1024*10):  # Default chunk size: 10k.
    while True:
        chunk = file.read(chunk_size)
        if chunk:
            yield chunk
        else: # The chunk was empty, which means we're at the end of the file
            return

with open('test.csv') as f:
    with cur.copy("COPY test_csv FROM STDIN WITH CSV") as copy:
        for chunk in read_in_chunks(f):
            copy.write(chunk)

con.commit()
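A row-by-row alternative in psycopg3 is to parse the file with Python's csv module, which strips the quotes client-side, and feed tuples to copy.write_row(). A minimal sketch, assuming the same test_csv table; the database part is commented out because it needs a live server:

```python
import csv
import io

# Sample data matching the question's file; csv.reader strips the quotes.
data = io.StringIO('1,[email protected],"John Stokes"\n'
                   '2,[email protected],"Emily Ray"\n')
rows = list(csv.reader(data))
print(rows[0])  # the quotes are gone from the name column

# Hypothetical connection, as in the psycopg3 example above:
# import psycopg
# con = psycopg.connect(dbname="test")
# cur = con.cursor()
# with cur.copy("COPY test_csv FROM STDIN") as copy:
#     for rid, mail, name in rows:
#         copy.write_row((int(rid), mail, name))
# con.commit()
```

With write_row(), psycopg handles the quoting and escaping itself, so no CSV option is needed in the COPY statement.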

Answered By: Adrian Klaver