SQLAlchemy Hash Join

Question:

I have a query that I want to join a subquery onto. This works fine using an outerjoin function call, but the emitted query runs in 2-3s, versus almost instantly when I explicitly specify the join method as LEFT HASH JOIN in the SQL Server console.

Is there a way to force SQLAlchemy to emit a LEFT HASH JOIN statement, along the lines of

query = query.outerjoin(sub_query, join_conditions, method='hash')

? I’ve tried searching both on StackOverflow and elsewhere on the internet, but haven’t been able to find anything relevant. I’m using Microsoft SQL Server if that has any bearing on the results.

If it’s a case of performance being sacrificed for the simplicity of using an ORM, then it’s a trade-off I’m happy to make, but obviously I’d prefer not to!

Asked By: Jonathan Windridge


Answers:

After further reading, as Ilija mentioned in their comment, this isn’t something that is currently possible within SQLAlchemy.

However, that same reading also indicated that specifying join hints on SQL queries should typically be only the very last step in a query optimisation plan, e.g. here.

Instead of trying to monkey-patch the functionality I was after into SQLAlchemy, I went back and re-examined the contents of join_conditions I had abbreviated in my question and found that by tweaking the joining logic, I could get SQL Server to generate a much faster execution plan.
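For completeness, a statement-level hint is a different mechanism from a per-join hint, and SQLAlchemy does expose one. The following is a minimal sketch, assuming SQLAlchemy 1.4 or later, its with_statement_hint() method, and SQL Server's OPTION (HASH JOIN) query hint, which applies to every join in the statement rather than to a single one. The orders and customers tables and their columns are hypothetical, and the last-resort caveat about hints still applies.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import mssql

# Hypothetical lightweight tables, used only to render SQL (no database needed).
orders = sa.table("orders", sa.column("id"), sa.column("customer_id"))
customers = sa.table("customers", sa.column("id"), sa.column("name"))

stmt = (
    sa.select(orders.c.id, customers.c.name)
    .select_from(
        orders.outerjoin(customers, orders.c.customer_id == customers.c.id)
    )
    # Statement-level hint; rendered only when compiling for SQL Server.
    .with_statement_hint("OPTION (HASH JOIN)", dialect_name="mssql")
)

# Compile against the SQL Server dialect to inspect the emitted SQL.
compiled_sql = str(stmt.compile(dialect=mssql.dialect()))
print(compiled_sql)
```

The ORM Query object has a with_statement_hint() method as well, so the same idea should carry over to a query built with outerjoin, though that is untested against the query in the question.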

Answered By: Jonathan Windridge

You can update the SQLCompiler within your Dialect to support the necessary functionality as shown below.

I doubt this would be a reasonable solution for most practical use-cases, but it does carry some educational value, I believe.

import itertools

from sqlalchemy.dialects.sqlite import base


class MySQLiteCompiler(base.SQLiteCompiler):
    # Update the visit_join method to support a hypothetical
    # MAGIC keyword.
    #
    # For that we copy the code over from
    # https://github.com/sqlalchemy/sqlalchemy/blob/e7aabd54c4defe237cecfa80863f0d7fa5a48035/lib/sqlalchemy/sql/compiler.py#L4987
    # and add a small modification (marked below).
    #
    # The borrowed piece of code is subject to:
    # Copyright (C) 2005-2023 the SQLAlchemy authors and contributors.
    # Released under the MIT license.
    def visit_join(self, join, asfrom=False, from_linter=None, **kwargs):
        if from_linter:
            from_linter.edges.update(
                itertools.product(
                    join.left._from_objects, join.right._from_objects
                )
            )
        if join.full:
            join_type = " FULL OUTER JOIN "
        elif join.isouter:
            join_type = " LEFT OUTER JOIN "
            # --- Start of modification ---
            # Use getattr so ordinary joins, which lack the custom
            # ismagic attribute, still compile.
            if getattr(join, "ismagic", False):
                join_type = " LEFT MAGIC JOIN "
            # --- End of modification ---
        else:
            join_type = " JOIN "
        return (
            join.left._compiler_dispatch(
                self, asfrom=True, from_linter=from_linter, **kwargs
            )
            + join_type
            + join.right._compiler_dispatch(
                self, asfrom=True, from_linter=from_linter, **kwargs
            )
            + " ON "
            # TODO: likely need asfrom=True here?
            + join.onclause._compiler_dispatch(
                self, from_linter=from_linter, **kwargs
            )
        )


class MyDialect(base.SQLiteDialect):
    statement_compiler = MySQLiteCompiler

Example usage:

import sqlalchemy as sa

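# Lightweight table constructs, used only to render SQL.
a = sa.table("a", sa.column("id"))
b = sa.table("b", sa.column("id"))

my_join = sa.join(a, b, a.c.id == b.c.id, isouter=True)
# Custom flag read by MySQLiteCompiler.visit_join.
my_join.ismagic = True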

print(my_join.compile(dialect=MyDialect()))
Answered By: KT.