SQLAlchemy + SQL Injection

Question:

What are the best practices for mitigating SQL injection attacks when using SQLAlchemy?

Asked By: Mike

||

Answers:

If you have any “special” characters (such as semicolons or
apostrophes) in your data, they will
be automatically quoted for you by the
SQLEngine object, so you don’t have to
worry about quoting. This also means
that unless you deliberately bypass
SQLAlchemy’s quoting mechanisms,
SQL-injection attacks are basically
impossible.

[per http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html]

Answered By: Andreas Jung

tldr: Avoid raw SQL as much as possible.

The accepted answer is lazy and incorrect. The filter method accepts raw SQL, and if used in that way, is fully susceptible to SQL injection attacks. For instance, if you were to accept a value from a url and combine it with raw sql in the filter, you are open to attack:

session.query(MyClass).filter("foo={}".format(getArgs['val']))

using the above code and the below url, you would be injecting SQL in to your filter statement. The code above would return all rows in your database.

URL encoded:

https://example.com/?val=2%20or%201%20=%201

Easier to understand (URL decoded):

https://example.com/?val=2 or 1 = 1
Answered By: Tendrid

To add to @Tendrid answer. I did a little investigation using quiet naive approach. filter method has *criterion as its argument, several other ORM Query methods have similar argument.

In case of filter method *criterion argument ends up passed into _literal_as_text, which in case of string – marks it as safe sql (please correct me if I’m wrong). Therefore it makes it unsafe.

Here is outcome of ORM Query class method investigation with *criterion argument:

filter   - uses _literal_as_text (NOT SAFE)
having   - uses _literal_as_text (NOT SAFE)

distinct - uses _literal_as_label_reference (NOT SAFE)
group_by - uses _literal_as_label_reference (NOT SAFE)
order_by - uses _literal_as_label_reference (NOT SAFE)

join     - uses model attributes to resolve relation (SAFE)

Examples of possible method missuses (to keep it simple, string formatting is skipped):

db.session.query(User.login).group_by('login').having('count(id) > 4; select name from roles').all()
db.session.query(User.login).distinct('name) name from roles /*').order_by('*/').all()
db.session.query(User.login).order_by('users_login; select name from roles').all()
db.session.query(User.login).group_by('login union select name from roles').all()

Note that these methods are only unsafe if string literal is passed.

Answered By: aisbaa

I’m inclined to agree with @Tendrid’s answer.

If you write this:

session.query(MyClass).filter("foo={}".format(getArgs['val']))

… you are creating an injection vulnerability.

SqlAlchemy’s approach is to use bound parameters to avoid these injection attacks. The way you’re meant to use filter() is to write:

session.query(MyClass).filter(MyClass.foo == getArgs['va'])

As SqlAlchemy has overloaded python’s operators like == to escape the SQL correctly (and avoid injection). See here

There is a warning about this buried in the documentation of SqlAlchemy here that says:

Always use bound parameters

As mentioned at the beginning of this section, textual SQL is not the
usual way we work with SQLAlchemy. However, when using textual SQL, a
Python literal value, even non-strings like integers or dates, should
never be stringified into SQL string directly; a parameter should
always be used. This is most famously known as how to avoid SQL
injection attacks when the data is untrusted. However it also allows
the SQLAlchemy dialects and/or DBAPI to correctly handle the incoming
input for the backend. Outside of plain textual SQL use cases,
SQLAlchemy’s Core Expression API otherwise ensures that Python literal
values are passed as bound parameters where appropriate.

And there’s a section in the glossary on bound parameters here

It says:

Bound parameters are the primary means in which data is passed to the
DBAPI database driver. While the operation to be invoked is based on
the SQL statement string, the data values themselves are passed
separately, where the driver contains logic that will safely process
these strings and pass them to the backend database server, which may
either involve formatting the parameters into the SQL string itself,
or passing them to the database using separate protocols.

The specific system by which the database driver does this should not
matter to the caller; the point is that on the outside, data should
always be passed separately and not as part of the SQL string itself.
This is integral both to having adequate security against SQL
injections as well as allowing the driver to have the best
performance.

Basically that means this:

session.query(MyClass).filter("foo={}".format(getArgs['val']))

is broken because you’re passing the data to the filter() together with the SQL statement foo=<data>.

You’re meant to always keep the statement and data separate, i.e.:

session.query(MyClass).filter(MyClass.foo == getArgs['va'])

or

session.query(MyClass).filter_by(foo=getArgs['va'])

As then SqlAlchemy can work it’s magic and do the escaping with bound parameters.

Answered By: Donal
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.