SQLAlchemy – bulk insert ignore: "Duplicate entry"

Question:

I have a table named user_data with a unique key on the columns id and user_id. I want to import some historical data into this table. I use the bulk_insert_mappings method to batch insert the data, but I get the error below:

IntegrityError: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '1-1234' for key 'idx_on_id_and_user_id'")

How can I ignore this error and discard the duplicate rows when batch inserting?
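For reference, a minimal sketch of the failing call might look like this (the UserData model and session names are assumptions, not taken from the question):

    session.bulk_insert_mappings(
        UserData,                        # mapped class for the user_data table (assumed name)
        [
            {"id": 1, "user_id": 1234},  # if this (id, user_id) pair already exists,
            {"id": 2, "user_id": 5678},  # the whole batch fails with IntegrityError 1062
        ],
    )
    session.commit()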

Asked By: pangpang


Answers:

You should handle every error, but if you really want to just ignore them all, you can't do a true bulk insert. Sometimes there are integrity errors in the actual data you are importing, so you have to insert the rows one by one and ignore the failures. I would only use this in one-off scripts.

for item in dict_list:
    try:
        # orm is the mapped class for the target table;
        # merge() inserts the row, or updates it if the key already exists
        session.merge(orm(**item))
        session.commit()
    except Exception as e:
        # this row failed (e.g. an integrity error): undo it and keep going
        session.rollback()
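Note that merge() updates an existing row with the incoming values rather than discarding it. If you want duplicates dropped outright, a sketch along these lines (assuming a UserData mapped class) catches the integrity error on a plain add instead:

    from sqlalchemy.exc import IntegrityError

    for item in dict_list:
        try:
            session.add(UserData(**item))  # UserData is an assumed mapped class name
            session.commit()
        except IntegrityError:
            session.rollback()             # duplicate key: drop this row and continue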
Answered By: Philliproso

You can use the SQLAlchemy Core API to do large bulk inserts, but the duplicate-handling syntax is database specific. For those using Postgres, it looks like this:

from sqlalchemy.dialects.postgresql import insert

rows = [
    {"id": 1, "name": "foo"},
    {"id": 1, "name": "foo-dupe"},
]

# insert() here is the Postgres-dialect construct, which supports ON CONFLICT
stmt = insert(YourTableName).values(rows)
stmt = stmt.on_conflict_do_nothing(
    index_elements=["id_or_column_causing_integrity_error", "colB"]
)
db.session.execute(stmt)
db.session.commit()

Note that index_elements takes a list of one or more column names that make up the unique index whose conflicts you want to ignore.
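Since the question's error comes from pymysql (MySQL rather than Postgres), a comparable sketch there is to emit INSERT IGNORE by calling prefix_with() on a Core insert. The user_data Table, engine, and rows names below are assumptions for illustration:

    from sqlalchemy import MetaData, Table, insert

    metadata = MetaData()
    # reflect the target table; assumed to be named user_data
    user_data = Table("user_data", metadata, autoload_with=engine)

    # INSERT IGNORE tells MySQL to silently skip rows that hit the unique key
    stmt = insert(user_data).prefix_with("IGNORE").values(rows)

    with engine.begin() as conn:
        conn.execute(stmt)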

Answered By: Hartley Brody