Using a buffer to write a psycopg3 copy result through pandas
Question:
Using psycopg2, I could write large results as CSV using copy_expert and a BytesIO buffer like this with pandas:
copy_sql = "COPY (SELECT * FROM big_table) TO STDOUT CSV"
buffer = BytesIO()
cursor.copy_expert(copy_sql, buffer, size=8192)
buffer.seek(0)
pd.read_csv(buffer, engine="c").to_excel(self.output_file)
However, I can't figure out how to replace the buffer in copy_expert with psycopg3's new copy command. Has anyone figured out a way to do this?
Answers:
Data and table setup:
cat test.csv
1,[email protected],"John Stokes"
2,[email protected],"Emily Ray"
create table test_csv (id integer, mail varchar, name varchar);
import psycopg

# Assumed connection details; adjust the DSN for your database
con = psycopg.connect("dbname=test")
cur = con.cursor()

with open('test.csv') as f:
    with cur.copy("COPY test_csv FROM STDIN WITH CSV") as copy:
        while data := f.read(1000):
            copy.write(data)
con.commit()
Using a buffer:
from io import BytesIO

import pandas as pd

buffer = BytesIO()
# CSV with a HEADER row so read_csv picks up the column names;
# plain "to stdout" would emit tab-separated text instead
with cur.copy('COPY test_csv TO STDOUT WITH (FORMAT csv, HEADER)') as copy:
    for data in copy:
        buffer.write(data)
buffer.seek(0)
pd.read_csv(buffer, engine="c").to_excel('test_csv.xlsx')
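The write/seek/read round-trip at the heart of this answer is independent of psycopg; a minimal sketch that simulates the byte chunks the copy object yields (the rows are made-up sample data):

```python
from io import BytesIO

import pandas as pd

# Simulated chunks, as iterating psycopg3's copy object would yield them
chunks = [b"id,mail,name\n", b"1,jstokes,John Stokes\n", b"2,eray,Emily Ray\n"]

buffer = BytesIO()
for data in chunks:
    buffer.write(data)
buffer.seek(0)  # rewind before handing the buffer to pandas

df = pd.read_csv(buffer, engine="c")
print(df.shape)  # (2, 3)
```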
Since you are using Pandas there is also:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:@localhost:5432/test')
pd.read_sql('select * from test_csv', engine).to_excel('test_csv.xlsx')
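For results too large to hold in one DataFrame, read_sql also accepts a chunksize argument and then returns an iterator of frames instead of a single one. A minimal sketch against an in-memory SQLite table standing in for the Postgres one:

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the Postgres test_csv table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_csv (id INTEGER, mail TEXT, name TEXT)")
con.executemany(
    "INSERT INTO test_csv VALUES (?, ?, ?)",
    [(1, "jstokes", "John Stokes"), (2, "eray", "Emily Ray")],
)

# With chunksize, read_sql yields DataFrames of at most that many rows,
# so the full result set is never materialized at once
n_rows = 0
for chunk in pd.read_sql("SELECT * FROM test_csv", con, chunksize=1):
    n_rows += len(chunk)
print(n_rows)  # 2
```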
The key to writing a large query to a file through psycopg3 in this fashion is to use a SpooledTemporaryFile, which limits the amount of memory used in Python (see max_size). Then, after the CSV is written to disk, convert it with pandas.
from tempfile import SpooledTemporaryFile
from pandas import read_csv
from psycopg import connect
cursor = connect([connection]).cursor()

# FORMAT csv with HEADER so read_csv parses the columns correctly
copy_sql = "COPY (SELECT * FROM stocks WHERE price > %s) TO STDOUT WITH (FORMAT csv, HEADER)"
price = 100

# mode "w+b" (not "wb") so the spooled file can be read back after writing,
# even once it has rolled over to disk
with SpooledTemporaryFile(
    mode="w+b",
    max_size=65546,
    buffering=8192,
) as tmpfile:
    with cursor.copy(copy_sql, (price,)) as copy:
        for data in copy:
            tmpfile.write(data)
    tmpfile.seek(0)
    read_csv(tmpfile, engine="c").to_excel("my_spreadsheet.xlsx")
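The spill-to-disk behavior can be seen without a database: with a tiny max_size the buffer starts in memory and rolls over to a real temporary file once exceeded. (The check uses CPython's private _rolled flag purely for illustration.)

```python
from tempfile import SpooledTemporaryFile

with SpooledTemporaryFile(mode="w+b", max_size=16) as tmpfile:
    tmpfile.write(b"small")             # 5 bytes, still under max_size
    in_memory = not tmpfile._rolled     # True: data lives in a BytesIO
    tmpfile.write(b"x" * 100)           # exceeds max_size -> rollover to disk
    on_disk = tmpfile._rolled           # True: now backed by a real temp file
    tmpfile.seek(0)
    data = tmpfile.read()               # reading back works either way

print(in_memory, on_disk, len(data))  # True True 105
```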