Extremely slow Postgres inserts after switching to Hetzner servers

Question:

I’ve got a Python script that fetches data from an API, processes it, and stores it in Postgres using SQLAlchemy.

  • On my local machine, I can insert 120 rows in a few milliseconds using vanilla Postgres 14.

  • On my current web host, insertion performance is similar. This instance of Postgres runs in a Docker container and has TimescaleDB installed. The Python script is also hosted there.

But due to cost, I’m looking for cheaper DB solutions. I tried Hetzner managed Postgres, and insertion times immediately went to 10-20 seconds for 120 rows. (!)

The Python script is hosted in the USA and the managed Postgres is in Germany. So there’s some distance to cover, but I don’t think that should cripple insertion times like this. My first guess was that the managed Postgres instance just didn’t have enough resources.

To sanity check, I spun up a cloud server on Hetzner in the USA, installed a fresh copy of Postgres 14, restored the DB, and connected the script to that. Insertion times are better, but still in the 5-10 second range for 120 rows. This server has 4 vCPUs and 8 GB of memory, similar to my current web host.

In all these cases, the only thing I’m changing is the Postgres connection string in SQLAlchemy. The Python code is identical, and the databases on Hetzner were created using pg_restore, so the indexes/constraints/etc. should all be identical to my local and production tables.

While I was expecting some hit to performance, this is way slower than it should be.

Is there anything I can try to troubleshoot this? Seems strange that the cloud server instance with dedicated resources is also extremely slow. I must be missing something.

EDIT: Here’s the Python function I wrote to manage the insertions. I was using SQLAlchemy 1.4.43 and recently upgraded to v2.0b3 to see if that would improve insertion speed, but it didn’t make a measurable difference.

async def saveSQL(session: AsyncSession, objs: list) -> None:
    '''Params: async DB session, list of SQLAlchemy ORM models'''
    try:
        session.add_all(objs)
        await session.commit()

    except Exception as e:
        logger.exception(f"Exception while saving models to SQL: {e}")

    finally:
        await session.close()

EDIT 2: For anyone looking to speed up ORM insertion performance, using this syntax was a LOT faster than add_all().

async def saveSQL(session: AsyncSession, model: Base, objs: list) -> None:
    '''Params: async DB session, the SQLAlchemy table model, and a list of SQLAlchemy ORM models'''
    try:
        # Convert the ORM objects to plain dicts so the rows can be sent
        # as one batched multi-row INSERT instead of a flush per object.
        rows = [dict(obj) for obj in objs]
        await session.scalars(insert(model).returning(model), rows)
        await session.commit()

    except Exception as e:
        logger.exception(f"Exception while saving models to SQL: {e}")

    finally:
        await session.close()
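
For reference, here is a minimal stand-alone sketch of the same bulk-insert pattern; the Reading model, column names, and connection string are hypothetical placeholders, not my actual schema:

# Stand-alone sketch of the bulk-insert pattern above; the Reading model,
# column names and DSN are hypothetical placeholders.
import asyncio
from sqlalchemy import insert
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Reading(Base):
    __tablename__ = "readings"
    id: Mapped[int] = mapped_column(primary_key=True)
    value: Mapped[float]

async def main() -> None:
    engine = create_async_engine("postgresql+asyncpg://user:pass@host/db")
    Session = async_sessionmaker(engine, expire_on_commit=False)
    rows = [{"id": i, "value": float(i)} for i in range(120)]
    async with Session() as session:
        # One batched INSERT ... RETURNING instead of a flush per ORM object.
        inserted = (await session.scalars(insert(Reading).returning(Reading), rows)).all()
        await session.commit()
        print(f"inserted {len(inserted)} rows")
    await engine.dispose()

asyncio.run(main())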
Asked By: BCC


Answers:

That’s network latency. Each INSERT requires a client-server round trip, so it costs at least twice the one-way network latency. See this article for a description and a proof of concept.
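
To confirm this, a quick sanity check (a sketch, assuming psycopg 3 and a placeholder connection string) is to time a trivial round trip to each server; 120 single-row INSERTs cannot complete faster than 120 times that figure:

# Rough round-trip timer; the DSN is a placeholder.
import time
import psycopg

def avg_round_trip(dsn: str, n: int = 20) -> float:
    with psycopg.connect(dsn) as conn, conn.cursor() as cur:
        start = time.perf_counter()
        for _ in range(n):
            cur.execute("SELECT 1")
            cur.fetchone()
        return (time.perf_counter() - start) / n

print(f"{avg_round_trip('postgresql://user:pass@host/db') * 1000:.1f} ms per round trip")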

You have three options:

  1. Use fewer statements. For example, you could insert all 120 rows with a single COPY statement (see the first sketch after this list).

  2. Use pipelining so that you don’t have to wait for the result of the previous statement before you send the next one (see the second sketch after this list).

  3. Place the database client and server close to each other so that the network latency is as low as possible. That is the most promising approach.
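
A minimal COPY sketch, assuming psycopg 3 (no driver is named above); the DSN, table, and columns are placeholders:

# COPY streams all rows over a single statement, paying the network
# latency once rather than once per row. Placeholder DSN/table/columns.
import psycopg

rows = [(1, "a"), (2, "b")]  # illustrative data

with psycopg.connect("postgresql://user:pass@host/db") as conn:
    with conn.cursor() as cur:
        with cur.copy("COPY items (id, name) FROM STDIN") as copy:
            for row in rows:
                copy.write_row(row)
    conn.commit()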
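
And a minimal pipelining sketch, again assuming psycopg 3 (version 3.1 or later, which added pipeline mode); the connection string and table are placeholders:

# In pipeline mode the client queues statements without waiting for each
# result, so the total cost is a few round trips instead of one per row.
import psycopg

rows = [(1, "a"), (2, "b")]  # illustrative data

with psycopg.connect("postgresql://user:pass@host/db") as conn:
    with conn.pipeline():
        with conn.cursor() as cur:
            for row in rows:
                cur.execute("INSERT INTO items (id, name) VALUES (%s, %s)", row)
    conn.commit()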

Answered By: Laurenz Albe