Python: Export table from Postgres and import to another Postgres using psycopg2

Question:

I have two Postgres databases that share the same schema. I am writing a Python script whose goal is to export data partially from one of the tables and import the result into the same table in the other database (e.g. SELECT * FROM A WHERE f = 123). The schema is large: it has many columns of different types, some nullable and some not, including date columns and string columns that can contain sentences, pseudo-queries, and file names. A table can contain thousands of rows.

My approach was to export the data from the table to a CSV file, then import the CSV into the corresponding table in the second database.

I use the psycopg2 library to work with Postgres from Python, along with the csv module to read and write CSV files.

I implemented a first version and ran into a problem. Some columns in a row are empty; when I read the table data in Python, an empty field is None where the column is nullable and "" (the empty string) where it is not. When exported to CSV, both None and "" become empty fields, so a row looks like 1234,,,,,1,,. When I then import the file into the Postgres table, every empty CSV value is interpreted as NULL and inserted as such, which fails because the NOT NULL columns reject it. Below is my code, and after it the improvement I made to avoid this problem.

import psycopg2
import csv

def export_table(filename, tablename):
    conn = psycopg2.connect(....)

    cur = conn.cursor()

    cur.execute(f'SELECT * FROM {tablename} where f=123')

    rows = cur.fetchall()

    # None (NULL) and '' both end up as empty CSV fields here,
    # which is the root of the problem described above.
    with open(filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for row in rows:
            writer.writerow(row)

    cur.close()
    conn.close()


def import_table(filename, tablename):

    conn = psycopg2.connect(..second db data)

    cur = conn.cursor()
    with open(filename, 'r') as csvfile:
        # In CSV format, COPY reads an unquoted empty field as NULL,
        # so the NOT NULL columns reject the exported file.
        cur.copy_expert(
            f"COPY {tablename} FROM STDIN WITH (FORMAT CSV)",
            csvfile
        )

    conn.commit()
    cur.close()
    conn.close()

I tried adding csv.QUOTE_MINIMAL and csv.QUOTE_NONNUMERIC; neither helped.
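
As a sanity check, a minimal example shows why no quoting option can fix this on its own: the csv module writes None as an empty string, so a NULL and an empty string serialize identically to each other whatever the quoting setting.

import csv
import io

buf = io.StringIO()
csv.writer(buf).writerow([1234, None, '', 'text'])
print(repr(buf.getvalue()))
# '1234,,,text\r\n' -- the None and the '' are indistinguishable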

Since I was not able to import the data with this code, I tried one more thing.

I added a manual function for quoting:

def quote_field(field):
    if isinstance(field, str):
        if field == '':
            # Quote empty strings so COPY reads them as '' rather than NULL.
            return '""'
        elif any(c in field for c in (',', '"', '\n')):
            return '"' + field.replace('"', '""') + '"'

    return field

And updated the export part this way:

with open(filename, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_NONE, quotechar='', escapechar='\\')
    for row in rows:
        writer.writerow([quote_field(field) for field in row])

Running this code, empty strings are written to the CSV as "" and None values as plain empty fields, so a row in the CSV looks like 1234,,,"","",,,,,"",,,,. In some cases this worked and the data was imported correctly. But sometimes, for some reason, the generated CSV is not imported at all, or only partially. To check it, I used DataGrip to import the data from the CSV file manually; for some data it also imported only part of the rows (say, 20 out of 1000), and for some it imported nothing. I checked the CSVs for validity; they were valid.
I think there is a bug in the import part, but I don't know where it is or why it behaves this way. I need help with this.

Asked By: komron


Answers:

Create tables:

create table csv_null(id integer not null, fld1 varchar);
insert into csv_null values (1, 'test'), (2, ''), (3, null), (4, 'cat');
create table csv_null_2 as select * from csv_null limit 0;
\pset null NULL
Null display is "NULL".
select * from csv_null;
 id | fld1 
----+------
  1 | test
  2 | 
  3 | NULL
  4 | cat

Python code:

import io
import psycopg2

con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
buffer = io.StringIO()
cur = con.cursor()
cur.copy_expert('copy (select * from csv_null) TO STDOUT WITH CSV HEADER', buffer)

buffer.seek(0)
cur.copy_expert('copy csv_null_2 from STDIN WITH CSV HEADER', buffer)
con.commit()

cur.execute("select * from csv_null_2")
cur.fetchall()
# [(1, 'test'), (2, ''), (3, None), (4, 'cat')]

In psql:

select * from csv_null_2 ;
 id | fld1 
----+------
  1 | test
  2 | 
  3 | NULL
  4 | cat


The cur.copy_expert('copy (select * from csv_null) TO STDOUT WITH CSV HEADER', buffer) call produces output that cur.copy_expert('copy csv_null_2 from STDIN WITH CSV HEADER', buffer) consumes correctly, because you are no longer jumping between two contexts, the Python csv module and Postgres COPY.
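
Applied to the functions in the question, a minimal sketch might look like this (connection details are left as a conninfo placeholder argument, and the f = 123 filter is carried over from the question):

import psycopg2

def export_table(conninfo, filename, tablename):
    # COPY ... TO STDOUT WITH CSV writes NULL as an unquoted empty field
    # and an empty string as "", so the two stay distinguishable.
    conn = psycopg2.connect(conninfo)
    cur = conn.cursor()
    with open(filename, 'w') as csvfile:
        cur.copy_expert(
            f'COPY (SELECT * FROM {tablename} WHERE f = 123) TO STDOUT WITH CSV HEADER',
            csvfile
        )
    cur.close()
    conn.close()

def import_table(conninfo, filename, tablename):
    # COPY ... FROM STDIN WITH CSV parses with the same conventions,
    # so NULL and '' round-trip correctly.
    conn = psycopg2.connect(conninfo)
    cur = conn.cursor()
    with open(filename, 'r') as csvfile:
        cur.copy_expert(f'COPY {tablename} FROM STDIN WITH CSV HEADER', csvfile)
    conn.commit()
    cur.close()
    conn.close()

As in the question, the table name is interpolated directly into the SQL string; for untrusted table names, psycopg2.sql.Identifier would be safer.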

UPDATE

Table altered to have a NOT NULL column:

alter table csv_null add column fld2 varchar not null default '';
update csv_null set fld2 = 'not null' where id in (1,4);
alter table csv_null_2 add column fld2 varchar not null default '';
truncate csv_null_2;

Python code:

buffer = io.StringIO()
cur.copy_expert('copy (select * from csv_null) TO STDOUT WITH CSV HEADER', buffer)
buffer.seek(0)
cur.copy_expert('copy csv_null_2 from STDIN WITH CSV HEADER', buffer)
con.commit()

cur.execute("select * from csv_null_2")
cur.fetchall()
# [(2, '', ''), (3, None, ''), (1, 'test', 'not null'), (4, 'cat', 'not null')]

In psql:

select * from csv_null_2 ;
 id | fld1 |   fld2   
----+------+----------
  2 |      | 
  3 | NULL | 
  1 | test | not null
  4 | cat  | not null
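
For reference, peeking at the buffer shows how COPY's CSV format keeps the two cases apart: an empty string is written quoted ("") while NULL is written as a bare empty field. A rough sketch of the expected contents (reconstructed from Postgres' documented CSV conventions; row order follows the fetchall() above):

buffer.seek(0)
print(buffer.read())
# id,fld1,fld2
# 2,"",""          <- empty strings are quoted
# 3,,""            <- NULL is an unquoted empty field
# 1,test,not null
# 4,cat,not null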


Answered By: Adrian Klaver