SQLAlchemy is slow when executing a query for the first time

Question:

I’m using SQLAlchemy (2.0.3) with Python 3.10, and after a fresh container boot it takes ~2.2s to execute a specific query; all subsequent calls of the same query take ~70ms. I’m using PostgreSQL, and the raw query takes 40-70ms to execute in DataGrip.
Here is the code:

self._Session = async_sessionmaker(self._engine, expire_on_commit=False)

...

@property
def session(self):
    return self._Session

...

async with PostgreSQL().session.begin() as session:
    total_functions = aliased(db_models.Function)
    finished_functions = aliased(db_models.Function)
    failed_functions = aliased(db_models.Function)
    stmt = (
        select(
            db_models.Job,
            func.count(distinct(total_functions.id)).label("total"),
            func.count(distinct(finished_functions.id)).label("finished"),
            func.count(distinct(failed_functions.id)).label("failed")
        )
        .where(db_models.Job.project_id == project_id)
        .outerjoin(db_models.Job.packages)
        .outerjoin(db_models.Package.modules)
        .outerjoin(db_models.Module.functions.of_type(total_functions))
        .outerjoin(finished_functions, and_(
            finished_functions.module_id == db_models.Module.id,
            finished_functions.progress == db_models.FunctionProgress.FINISHED,
        ))
        .outerjoin(failed_functions, and_(
            failed_functions.module_id == db_models.Module.id,
            or_(
                failed_functions.state == db_models.FunctionState.FAILED,
                failed_functions.state == db_models.FunctionState.TERMINATED,
            ),
        ))
        .group_by(db_models.Job.id)
    )
    start = time.time()
    yappi.set_clock_type("WALL")
    with yappi.run():
        job_infos = await session.execute(stmt)
    yappi.get_func_stats().print_all()
    end = time.time()

Things I have tried and discovered:

  • The problem is not related to connecting to or querying the database. On service boot I establish the connection and run some other queries.
  • The problem is most likely not related to the cache. I have disabled the cache with query_cache_size=0 (see the sketch after this list for where that is set), however I’m not 100% sure that it worked, since the documentation says:

ORM functions related to unit-of-work persistence as well as some attribute loading strategies will make use of individual per-mapper caches outside of the main cache.

  • The profiler didn’t show anything that caught my attention:
..urrency_py3k.py:130 greenlet_spawn  2/1    0.000000  2.324807  1.162403  
..rm/session.py:2168 Session.execute  1      0.000028  2.324757  2.324757  
..0 _UnixSelectorEventLoop._run_once  11     0.000171  2.318555  0.210778  
..syncpg_cursor._prepare_and_execute  1      0.000054  2.318187  2.318187  
..cAdapt_asyncpg_connection._prepare  1      0.000020  2.316333  2.316333  
..nnection.py:533 Connection.prepare  1      0.000003  2.316154  2.316154  
..nection.py:573 Connection._prepare  1      0.000017  2.316151  2.316151  
..n.py:359 Connection._get_statement  2/1    0.001033  2.316122  1.158061  
..ectors.py:452 EpollSelector.select  11     0.000094  2.315352  0.210487  
..y:457 Connection._introspect_types  1      0.000025  2.314904  2.314904  
..ction.py:1669 Connection.__execute  1      0.000027  2.314879  2.314879  
..ion.py:1699 Connection._do_execute  1      2.314095  2.314849  2.314849  
...py:2011 Session._execute_internal  1      0.000034  0.006174  0.006174
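
For reference, a minimal sketch of where query_cache_size=0 is set; the engine creation is not shown in the snippets above, so the connection URL here is a placeholder:

from sqlalchemy.ext.asyncio import create_async_engine

# query_cache_size=0 disables SQLAlchemy's statement compilation cache for
# this engine; per the docs quoted above, some per-mapper caches are not affected.
engine = create_async_engine(
    "postgresql+asyncpg://user:password@localhost/tmp",
    query_cache_size=0,
)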

I have also seen that one may disable the compiled cache per connection:

with engine.connect().execution_options(compiled_cache=None) as conn:
    conn.execute(table.select())

However, I’m working with the ORM layer and I’m not sure how to apply this in my case.
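
A sketch of how this might be applied from the ORM layer (unverified; it relies on Session.connection() accepting execution_options, which are forwarded to the underlying connection):

async with PostgreSQL().session.begin() as session:
    # Procure the session's connection with the compiled cache disabled
    # before executing the ORM statement built above.
    await session.connection(execution_options={"compiled_cache": None})
    job_infos = await session.execute(stmt)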

Any ideas where this delay might come from?

Asked By: JuicyKitty


Answers:

After hours of googling I found this post. In short, the problem is related to missing dependencies (in some Alpine Docker images) that are required by the JIT compilation Postgres uses. For details I really recommend reading the post and the real-life impact the author describes.
The actual solution on the SQLAlchemy side is to switch JIT off:

engine = create_async_engine(
    "postgresql+asyncpg://user:password@localhost/tmp",
    connect_args={"server_settings": {"jit": "off"}},
)

Reference to docs.
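
As a quick way to confirm the setting took effect, a sketch assuming the engine created above (SHOW jit simply reports the server-side value for the current connection):

from sqlalchemy import text

async with engine.connect() as conn:
    # Should print "off" when the connect_args above are in place.
    print((await conn.execute(text("SHOW jit"))).scalar())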

Answered By: JuicyKitty