Psycopg2 copy_from() is inserting data with double quotes when whitespace is present in csv file
Question:
I am trying to import the following table to my Postgres server using cursor.copy_from() in psycopg2 because the file is too large.
id | mail | name
---|------|-----
1 | [email protected] | John Stokes
2 | [email protected] | Emily Ray
Here is my code:
import psycopg2
import os

conn = psycopg2.connect(
    dbname=name,
    user=username,
    password=pwd,
    host=hst,
    port=5432
)
cur = conn.cursor()
path = os.path.join(os.getcwd(), 'users.csv')  # os.path.join avoids a missing path separator
file = open(path, 'r')
cur.copy_from(file, table_name, sep=',')
conn.commit()
conn.close()
This inserts the data into the table, but there are double quotes in the third column, like below:
id | mail | name
---|------|-----
1 | [email protected] | "John Stokes"
2 | [email protected] | "Emily Ray"
Later I found out that the problem lies in open() itself, because if I print the first line with file.readline(), I get:
1,[email protected],"John Stokes"
I don’t want these double quotes in my table. I tried using cursor.execute() with a COPY FROM query, but it says that I am not a superuser, even though I am.
Answers:
Use copy_expert. Then you are working as the client user rather than the server user, so no superuser privilege is needed. You can also use WITH CSV, which takes care of the quoting. copy_from and copy_to work using PostgreSQL's text format, as described in the COPY documentation.
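To see why the quotes ended up in the table: in the text format every character is literal data, so readline() hands copy_from the quotes as-is, while a CSV-aware parser strips them. A quick sketch using only the standard library:

```python
import csv
import io

raw = '1,[email protected],"John Stokes"\n'

# readline() returns the raw CSV text, quotes included:
line = io.StringIO(raw).readline()

# csv.reader applies CSV quoting rules and strips the quotes:
row = next(csv.reader(io.StringIO(raw)))

print(line.rstrip())  # the quotes are still there
print(row)            # the quotes are gone
```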
cat test.csv
1,[email protected],"John Stokes"
2,[email protected],"Emily Ray"
create table test_csv (id integer, mail varchar, name varchar);
import psycopg2

con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
cur = con.cursor()

with open('test.csv') as f:
    cur.copy_expert('COPY test_csv FROM stdin WITH CSV', f)
con.commit()
select * from test_csv ;
id | mail | name
----+--------------------+-------------
1 | [email protected] | John Stokes
2 | [email protected] | Emily Ray
FYI, in psycopg3 (psycopg) this behavior has changed substantially; see the psycopg3 COPY documentation for how to handle it there.
UPDATE
Using psycopg3, the answer for Python 3.8+ (where the walrus operator is available) would be:
import psycopg

con = psycopg.connect(dbname="test", host='localhost', user='postgres', port=5432)
cur = con.cursor()

with open('test.csv') as f:
    with cur.copy("COPY test_csv FROM STDIN WITH CSV") as copy:
        while data := f.read(1000):
            copy.write(data)
con.commit()
Or, for Python 3.7 and earlier, something like:
# Function copied from https://www.iditect.com/guide/python/python_howto_read_big_file_in_chunks.html
def read_in_chunks(file, chunk_size=1024 * 10):  # Default chunk size: 10k.
    while True:
        chunk = file.read(chunk_size)
        if chunk:
            yield chunk
        else:  # The chunk was empty, which means we're at the end of the file.
            return

with open('test.csv') as f:
    with cur.copy("COPY test_csv FROM STDIN WITH CSV") as copy:
        for chunk in read_in_chunks(f):
            copy.write(chunk)
con.commit()