Bulk inserts with Flask-SQLAlchemy

Question:

I’m using Flask-SQLAlchemy to do a rather large bulk insert of 60k rows. I also have a many-to-many relationship on this table, so I can’t use db.engine.execute for this. Before inserting, I need to find similar items in the database, and change the insert to an update if a duplicate item is found.

I could do this check beforehand, and then do a bulk insert via db.engine.execute, but I need the primary key of each row upon insertion.

Currently, I am doing a db.session.add() and db.session.commit() on each insert, and I get a measly 3-4 inserts per second.

I ran a profiler to see where the bottleneck is, and it seems that the db.session.commit() is taking 60% of the time.

Is there some way to make this operation faster, perhaps by grouping commits, while still giving me the primary keys back?

This is what my models look like:

from sqlalchemy.ext.associationproxy import association_proxy

# 'tags' is the db.Table association table for the Item<->Tag
# many-to-many relationship (its definition is omitted here)

class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(1024), nullable=True)
    created = db.Column(db.DateTime())
    tags_relationship = db.relationship('Tag', secondary=tags, backref=db.backref('items', lazy='dynamic'))
    tags = association_proxy('tags_relationship', 'text')

class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.String(255))

My insert operation is:

for item in items:
    if duplicate:
        update_existing_item
    else:
        x = Item()
        x.title = "string"
        x.created = datetime.datetime.utcnow()
        for tag in tags:
            if not tag_already_exists:
                y = Tag()
                y.text = "tagtext"
                x.tags_relationship.append(y)
                db.session.add(y)
                db.session.commit()
            else:
                x.tags_relationship.append(existing_tag)
    db.session.add(x)
    db.session.commit()
Asked By: Nik Barres


Answers:

Perhaps you should try db.session.flush() to send the data to the server, which means any primary keys will be generated. At the end you can db.session.commit() to actually commit the transaction.
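
A rough sketch of what that could look like with the models from the question (incoming_items and its fields are hypothetical placeholders, and the duplicate/tag handling from the question is left out):

import datetime

for item_data in incoming_items:
    x = Item()
    x.title = item_data["title"]
    x.created = datetime.datetime.utcnow()
    db.session.add(x)
    # flush() emits the pending INSERTs within the open transaction,
    # so x.id is populated here without committing yet
    db.session.flush()
    print(x.id)  # the generated primary key is already available

# one commit for the whole batch instead of one per row
db.session.commit()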

Answered By: wouter bolsterlee

I use the following code to quickly read the content of a pandas DataFrame into SQLite. Note that it circumvents the ORM features of SQLAlchemy. myClass in this context is a db.Model-derived class that has a __tablename__ assigned to it. As the comment in the snippet mentions, I adapted it from the gist referenced there.

# df is a pandas DataFrame whose columns match myClass's table columns
records = df.to_dict('records')
# bulk save the dictionaries, circumventing the slow ORM interface
# c.f. https://gist.github.com/shrayasr/5df96d5bc287f3a2faa4
connection.engine.execute(
    myClass.__table__.insert(),
    records
)
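
For reference, a more self-contained sketch of the same idea against the Item model from the question, assuming a Flask-SQLAlchemy db instance importable from app (the import paths and column values are made up, and Engine.execute() is the SQLAlchemy 1.x API):

import datetime

import pandas as pd

from app import db            # hypothetical location of the Flask-SQLAlchemy instance
from app.models import Item   # hypothetical location of the model

# build a DataFrame whose columns match the item table's columns
df = pd.DataFrame([
    {"title": "first item", "created": datetime.datetime.utcnow()},
    {"title": "second item", "created": datetime.datetime.utcnow()},
])

records = df.to_dict('records')
# one Core INSERT with many parameter sets; fast, but it bypasses the ORM,
# so relationships are not handled and no primary keys come back
db.engine.execute(Item.__table__.insert(), records)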
Answered By: Henning

from app import db

data = [{"attribute": "value"}, {...}, {...}, ... ]
db.engine.execute(YourModel.__table__.insert(), data)

For more information, refer to https://gist.github.com/shrayasr/5df96d5bc287f3a2faa4

Answered By: Akhil S