SQLAlchemy – subquery in a WHERE clause
Question:
I’ve just recently started using SQLAlchemy and am still having trouble wrapping my head around some of the concepts.
Boiled down to the essential elements, I have two tables like this (this is through Flask-SQLAlchemy):
class User(db.Model):
    __tablename__ = 'users'
    user_id = db.Column(db.Integer, primary_key=True)

class Posts(db.Model):
    __tablename__ = 'posts'
    post_id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('users.user_id'))
    post_time = db.Column(db.DateTime)
    user = db.relationship('User', backref='posts')
How would I go about querying for a list of users and their newest post (excluding users with no posts)? If I were using SQL, I would do:
SELECT [whatever]
FROM posts AS p
LEFT JOIN users AS u ON u.user_id = p.user_id
WHERE p.post_time = (SELECT MAX(post_time) FROM posts WHERE user_id = u.user_id)
So I know exactly the “desired” SQL to get the effect I want, but no idea how to express it “properly” in SQLAlchemy.
Edit: in case it’s important, I’m on SQLAlchemy 0.6.6.
Answers:
This should work (different SQL, same result):
from sqlalchemy import and_, func

# Subquery: newest post_time per user
t = Session.query(
    Posts.user_id,
    func.max(Posts.post_time).label('max_post_time'),
).group_by(Posts.user_id).subquery('t')

# Join users and posts to the subquery on user and newest time
query = Session.query(User, Posts).filter(and_(
    User.user_id == Posts.user_id,
    User.user_id == t.c.user_id,
    Posts.post_time == t.c.max_post_time,
))

for user, post in query:
    print(user.user_id, post.post_id)
Here t.c is the subquery's column collection (c stands for ‘columns’).
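To make the "different SQL, same result" claim concrete, here is a minimal sketch that runs both SQL shapes against an in-memory SQLite database using Python's standard sqlite3 module. The table and column names follow the question; the sample rows, the use of SQLite, and the hand-written grouped query (which only approximates the shape of the ORM query above, not its exact rendered SQL) are assumptions made for illustration.

```python
import sqlite3

# In-memory database with the two tables from the question (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE posts (
        post_id   INTEGER PRIMARY KEY,
        user_id   INTEGER REFERENCES users (user_id),
        post_time TEXT
    );
    INSERT INTO users (user_id) VALUES (1), (2), (3);  -- user 3 has no posts
    INSERT INTO posts (post_id, user_id, post_time) VALUES
        (10, 1, '2011-01-01 09:00:00'),
        (11, 1, '2011-03-01 09:00:00'),
        (12, 2, '2011-02-01 09:00:00');
""")

# Shape 1: the correlated scalar subquery from the question.
correlated_rows = conn.execute("""
    SELECT u.user_id, p.post_id
    FROM posts AS p
    LEFT JOIN users AS u ON u.user_id = p.user_id
    WHERE p.post_time = (SELECT MAX(post_time) FROM posts WHERE user_id = u.user_id)
    ORDER BY u.user_id
""").fetchall()

# Shape 2: join against a grouped derived table "t", as in the answer above.
grouped_rows = conn.execute("""
    SELECT u.user_id, p.post_id
    FROM users AS u, posts AS p,
         (SELECT user_id, MAX(post_time) AS max_post_time
          FROM posts GROUP BY user_id) AS t
    WHERE u.user_id = p.user_id
      AND u.user_id = t.user_id
      AND p.post_time = t.max_post_time
    ORDER BY u.user_id
""").fetchall()

print(correlated_rows)
print(grouped_rows)
```

Both queries skip user 3, who has no posts. Note that if a user has two posts sharing the same newest post_time, either shape returns both rows for that user.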
The previous answer works, but the exact SQL you asked for can also be written much like the actual statement:
print(
    s.query(User, Posts)
    .outerjoin(Posts.user)
    .filter(
        Posts.post_time
        == s.query(func.max(Posts.post_time))
        .filter(Posts.user_id == User.user_id)
        .correlate(User)  # refer to the outer query's users table
        .as_scalar()      # treat the subquery as a single scalar value
    )
)
I guess the “concept” that isn’t necessarily apparent is that as_scalar() is currently needed to establish a subquery as a “scalar” (it should probably assume that from the context against ==).
Edit: Confirmed, that’s buggy behavior, completed ticket #2190. In the current tip or release 0.7.2, the as_scalar() is called automatically and the above query can be:
print(
    s.query(User, Posts)
    .outerjoin(Posts.user)
    .filter(
        Posts.post_time
        == s.query(func.max(Posts.post_time))
        .filter(Posts.user_id == User.user_id)
        .correlate(User)  # refer to the outer query's users table
    )
)
It is usually expressed much like the actual SQL: you create a subquery that returns a single result and compare against it. What can be a real pain, however, is having to use a table in the subquery that you are already querying or joining on.
The solution is to create an aliased version of the model to reference in the subquery.
So let’s say you already have an existing Posts model (model in the code below) and some basic query (query) ready. To query for the latest (single) post from each user, you’d filter the query like this:
from sqlalchemy.orm import aliased

posts2 = aliased(Posts)  # create aliased version

query = query.filter(
    model.post_id
    ==
    Posts.query  # create query directly from model, NOT from the aliased version!
    .with_entities(posts2.post_id)  # only select column "post_id"
    .filter(
        posts2.user_id == model.user_id
    )
    .order_by(posts2.post_id.desc())  # assume higher id == newer post
    .limit(1)  # we must limit to a single row so we only get 1 value
)
I’ve purposely not used func.max because I consider that the simpler version and it’s already covered in other answers. I think this example will be useful to people who find this question while looking for a way to subquery the same table.
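In SQL terms, the aliasing idea amounts to giving the inner reference to posts its own name so that it can be correlated with the outer one. The sketch below shows that shape with Python's standard sqlite3 module; the alias names p and p2, the sample rows, and the use of SQLite are assumptions for illustration, and the statement is hand-written rather than the SQL that the ORM query above actually emits.

```python
import sqlite3

# In-memory posts table with made-up rows: two posts for each of two users.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER
    );
    INSERT INTO posts (post_id, user_id) VALUES
        (10, 1), (11, 1), (12, 2), (13, 2);
""")

# The outer query reads posts as "p"; the subquery reads the same table
# under a second alias "p2" and is correlated on user_id.
latest_ids = [
    row[0]
    for row in conn.execute("""
        SELECT p.post_id
        FROM posts AS p
        WHERE p.post_id = (
            SELECT p2.post_id
            FROM posts AS p2
            WHERE p2.user_id = p.user_id
            ORDER BY p2.post_id DESC   -- assume higher id == newer post
            LIMIT 1                    -- exactly one value per outer row
        )
        ORDER BY p.post_id
    """)
]

print(latest_ids)
```

Without the second alias, the inner and outer references to posts could not be told apart, so the comparison on user_id would not correlate the subquery with the outer row.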