SQLAlchemy – bulk insert ignore: "Duplicate entry"
Question:
I have a table named user_data
, the column id
and user_id
as the unique key. I want to import some history data to this table. I use bulk_insert_mappings method to batch insert data. But there are errors as below:
IntegrityError: (pymysql.err.IntegrityError) (1062, u”Duplicate entry ‘1-1234’ for key ‘idx_on_id_and_user_id'”)
How to ignore this error and discard duplicate data when batch insert?
Answers:
You should handle every error. But if you really want to just ignore all errors, you can’t really do a bulk insert. Sometimes there will be integrity errors in the actual data you are importing. You have to insert one by one and ignore. I would only use this in once off scripts.
for item in dict_list:
try:
session.merge(orm(**item))
session.commit()
except Exception as e:
session.rollback()
You can use the sqlalchemy core API to do large bulk inserts, but it gets database specific. For those using postgres, it looks like this:
rows = [
{"id": 1, "name": "foo"},
{"id": 1, "name": "foo-dupe"},
]
stmt = insert(YourTableName).values(rows)
stmt = stmt.on_conflict_do_nothing(
index_elements=["id_or_column_causing_integrity_error", "colB"]
)
db.session.execute(stmt)
db.session.commit()
Note that the value of index_elements
takes a list with one or more names of the columns that make up the unique index you want to ignore.
I have a table named user_data
, the column id
and user_id
as the unique key. I want to import some history data to this table. I use bulk_insert_mappings method to batch insert data. But there are errors as below:
IntegrityError: (pymysql.err.IntegrityError) (1062, u”Duplicate entry ‘1-1234’ for key ‘idx_on_id_and_user_id'”)
How to ignore this error and discard duplicate data when batch insert?
You should handle every error. But if you really want to just ignore all errors, you can’t really do a bulk insert. Sometimes there will be integrity errors in the actual data you are importing. You have to insert one by one and ignore. I would only use this in once off scripts.
for item in dict_list:
try:
session.merge(orm(**item))
session.commit()
except Exception as e:
session.rollback()
You can use the sqlalchemy core API to do large bulk inserts, but it gets database specific. For those using postgres, it looks like this:
rows = [
{"id": 1, "name": "foo"},
{"id": 1, "name": "foo-dupe"},
]
stmt = insert(YourTableName).values(rows)
stmt = stmt.on_conflict_do_nothing(
index_elements=["id_or_column_causing_integrity_error", "colB"]
)
db.session.execute(stmt)
db.session.commit()
Note that the value of index_elements
takes a list with one or more names of the columns that make up the unique index you want to ignore.