SQLAlchemy – subquery in a WHERE clause

Question

I’ve just recently started using SQLAlchemy and am still having trouble wrapping my head around some of the concepts.

Boiled down to the essential elements, I have two tables like this (this is through Flask-SQLAlchemy):

class User(db.Model):
    __tablename__ = 'users'
    user_id = db.Column(db.Integer, primary_key=True)

class Posts(db.Model):
    __tablename__ = 'posts'
    post_id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('users.user_id'))
    post_time = db.Column(db.DateTime)

    user = db.relationship('User', backref='posts')

How would I go about querying for a list of users and their newest post (excluding users with no posts)? If I were using SQL, I would do:

SELECT [whatever]
FROM posts AS p
    LEFT JOIN users AS u ON u.user_id = p.user_id
WHERE p.post_time = (SELECT MAX(post_time) FROM posts WHERE user_id = u.user_id)

So I know exactly the “desired” SQL to get the effect I want, but no idea how to express it “properly” in SQLAlchemy.

Edit: in case it’s important, I’m on SQLAlchemy 0.6.6.

Asked By: Chad Birch

Answer 1

This should work (different SQL, same result):

from sqlalchemy import and_, func

# Subquery: newest post_time per user
t = Session.query(
    Posts.user_id,
    func.max(Posts.post_time).label('max_post_time'),
).group_by(Posts.user_id).subquery('t')

# Join users and posts to the subquery on user_id and the newest post_time
query = Session.query(User, Posts).filter(and_(
    User.user_id == Posts.user_id,
    User.user_id == t.c.user_id,
    Posts.post_time == t.c.max_post_time,
))

for user, post in query:
    print(user.user_id, post.post_id)

Here c stands for ‘columns’: t.c is the collection of columns exposed by the subquery t.
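
To illustrate the "different SQL, same result" claim, here is a minimal sketch that runs both SQL shapes (the correlated subquery from the question and a grouped derived table like the one this answer builds) against an in-memory SQLite database using Python's standard sqlite3 module. The sample rows are made up for illustration, and the second statement is hand-written SQL approximating what the ORM query would emit, not its actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        post_time TEXT
    );
    -- made-up sample data: user 3 has no posts
    INSERT INTO users (user_id) VALUES (1), (2), (3);
    INSERT INTO posts (post_id, user_id, post_time) VALUES
        (10, 1, '2011-01-01 09:00:00'),
        (11, 1, '2011-03-01 09:00:00'),
        (12, 2, '2011-02-01 09:00:00');
""")

# Shape 1: correlated scalar subquery in the WHERE clause (the question's SQL).
correlated = conn.execute("""
    SELECT u.user_id, p.post_id
    FROM posts AS p
        LEFT JOIN users AS u ON u.user_id = p.user_id
    WHERE p.post_time = (SELECT MAX(post_time) FROM posts WHERE user_id = u.user_id)
    ORDER BY u.user_id
""").fetchall()

# Shape 2: grouped derived table "t" joined back to users and posts.
grouped = conn.execute("""
    SELECT u.user_id, p.post_id
    FROM users AS u, posts AS p,
        (SELECT user_id, MAX(post_time) AS max_post_time
         FROM posts GROUP BY user_id) AS t
    WHERE u.user_id = p.user_id
        AND u.user_id = t.user_id
        AND p.post_time = t.max_post_time
    ORDER BY u.user_id
""").fetchall()

print(correlated)
print(grouped)
```

Both statements return one row per user that has posts, and the user without posts is dropped in both cases.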

Answered By: sayap

Answer 2

The previous answer works, but the exact SQL you asked for can also be written almost the same way as the actual statement:

print(
    s.query(User, Posts).
    outerjoin(Posts.user).
    filter(Posts.post_time ==
        s.query(
            func.max(Posts.post_time)
        ).
        filter(Posts.user_id == User.user_id).
        correlate(User).
        as_scalar()  # mark the subquery as a scalar value
    )
)

I guess the “concept” that isn’t necessarily apparent is that as_scalar() is currently needed to establish a subquery as a “scalar” (it should probably assume that from the context against ==).

Edit: Confirmed, that was buggy behavior; it is resolved in ticket #2190. In the current tip or release 0.7.2, as_scalar() is called automatically and the above query can be written as:

print(
    s.query(User, Posts).
    outerjoin(Posts.user).
    filter(Posts.post_time ==
        s.query(
            func.max(Posts.post_time)
        ).
        filter(Posts.user_id == User.user_id).
        correlate(User)
    )
)
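
For readers on a newer release, the following is a self-contained sketch of the same correlated-subquery idea. It assumes SQLAlchemy 1.4 or later, where Query.scalar_subquery() is the current spelling of as_scalar(), and it uses plain declarative models with an in-memory SQLite engine instead of Flask-SQLAlchemy. It also swaps the outer join for an inner join on an explicit condition, since users without posts are filtered out either way. The sample rows are made up.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    user_id = Column(Integer, primary_key=True)


class Posts(Base):
    __tablename__ = "posts"
    post_id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.user_id"))
    post_time = Column(DateTime)

    user = relationship("User", backref="posts")


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as s:
    # Made-up sample data: user 3 has no posts.
    s.add_all([
        User(user_id=1), User(user_id=2), User(user_id=3),
        Posts(post_id=10, user_id=1, post_time=datetime(2011, 1, 1)),
        Posts(post_id=11, user_id=1, post_time=datetime(2011, 3, 1)),
        Posts(post_id=12, user_id=2, post_time=datetime(2011, 2, 1)),
    ])
    s.commit()

    # Scalar subquery: newest post_time for the user of the enclosing row.
    newest = (
        s.query(func.max(Posts.post_time))
        .filter(Posts.user_id == User.user_id)
        .correlate(User)  # take "users" from the enclosing query
        .scalar_subquery()
    )

    rows = (
        s.query(User.user_id, Posts.post_id)
        .join(Posts, Posts.user_id == User.user_id)
        .filter(Posts.post_time == newest)
        .order_by(User.user_id)
        .all()
    )
    result = [tuple(row) for row in rows]

print(result)
```

The explicit correlate(User) keeps the users table out of the subquery's own FROM clause, so the inner posts table is compared against the user of each outer row.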

Answered By: zzzeek

Answer 3

It is usually expressed much like the actual SQL: you create a subquery that returns a single result and compare against it. What can sometimes be a real pain, however, is having to use a table in the subquery that you are already querying or joining on.

The solution is to create an aliased version of the model and reference that in the subquery.

So let’s say you already have an existing Posts model and some basic query ready, and you now want the latest (single) post from each user. You would filter the query like this:

from sqlalchemy.orm import aliased

posts2 = aliased(Posts) # create aliased version

# "query" is the basic query on Posts that you already have
query = query.filter(
    Posts.post_id
    ==
    Posts.query # create query directly from model, NOT from the aliased version!
        .with_entities(posts2.post_id) # only select column "post_id"
        .filter(
            posts2.user_id == Posts.user_id # correlate to the outer Posts row
        )
        .order_by(posts2.post_id.desc()) # assume higher id == newer post
        .limit(1) # we must limit to a single row so we only get 1 value
)

I purposely did not use func.max because I consider that the simpler version and it’s already covered in the other answers. I think this example will be useful to people who find this question while looking for a way to subquery the same table.
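
As a rough, self-contained sketch of the aliased approach, the example below assumes SQLAlchemy 1.4 or later and a plain Session rather than Flask-SQLAlchemy's Posts.query, so the subquery is built with s.query(...) and ended with scalar_subquery(). The Posts model is reduced to the columns needed here (no foreign key or User model), and the sample rows are made up.

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()


class Posts(Base):
    __tablename__ = "posts"
    post_id = Column(Integer, primary_key=True)
    user_id = Column(Integer)  # simplified: no ForeignKey in this sketch


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as s:
    # Made-up sample data: user 1 has two posts, user 2 has one.
    s.add_all([
        Posts(post_id=10, user_id=1),
        Posts(post_id=11, user_id=1),
        Posts(post_id=12, user_id=2),
    ])
    s.commit()

    posts2 = aliased(Posts)  # second reference to the same table

    # Scalar subquery: id of the newest post by the outer row's user
    # (assuming a higher id means a newer post).
    latest_id = (
        s.query(posts2.post_id)
        .filter(posts2.user_id == Posts.user_id)
        .correlate(Posts)  # "posts" comes from the enclosing query
        .order_by(posts2.post_id.desc())
        .limit(1)
        .scalar_subquery()
    )

    rows = (
        s.query(Posts.user_id, Posts.post_id)
        .filter(Posts.post_id == latest_id)
        .order_by(Posts.user_id)
        .all()
    )
    result = [tuple(row) for row in rows]

print(result)
```

Without the alias, both the outer query and the subquery would refer to the same unaliased posts table, and the comparison between the inner and outer user_id could not be expressed.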

Answered By: jave.web
