SQLAlchemy: filter by membership in at least one many-to-many related table

Question:

Using SQLAlchemy 0.7.1 and a MySQL 5.1 database, I’ve got a many-to-many relationship set up as follows:

user_groups = Table('user_groups', Base.metadata,
    Column('user_id', String(128), ForeignKey('users.username')),
    Column('group_id', Integer, ForeignKey('groups.id'))
)

class ZKUser(Base, ZKTableAudit):
    __tablename__ = 'users'

    username   = Column(String(128), primary_key=True)
    first_name = Column(String(512))
    last_name  = Column(String(512))

    groups = relationship(ZKGroup, secondary=user_groups, backref='users')

class ZKGroup(Base, ZKTableAudit):
    __tablename__ = 'groups'

    id          = Column(Integer, primary_key=True)
    name        = Column(String(512))

Users can belong to multiple Groups, and Groups can contain multiple Users.

What I’m trying to do is build a SQLAlchemy query that returns only the Users who belong to at least one Group out of a list of Groups.

I played around with the in_ function, but that only seems to work for testing scalar values for membership in a list. I’m not much of a SQL writer, so I don’t even know what kind of SELECT statement this would require.

Asked By: coredumperror

||

Answers:

You can use in_:

session.query(ZKUser).filter(ZKGroup.id.in_([1,2])).all()
Answered By: zeekay

OK, after a lot of research, I realized that it was my own ignorance of SQL terminology that was holding me back. My search for a solution to find users belonging to “at least one of” the list of groups should have been to find users belonging to “any” of the list of groups. The any ORM function from SQLAlchemy does exactly what I needed, like so:

session.query(ZKUser).filter(ZKUser.groups.any(ZKGroup.id.in_([1,2,3])))

That code emits this SQL (on MySQL 5.1):

SELECT * FROM users 
WHERE EXISTS (
    SELECT 1 FROM user_groups, groups 
    WHERE users.id = user_groups.contact_id 
        AND groups.id = user_groups.group_id 
        AND groups.id IN (%s, %s, %s)
    )
Answered By: coredumperror

According to the docs for any, the query will run faster if you use an explicit join instead:

Because any() uses a correlated subquery, its performance is not nearly as good when compared against large target tables as that of using a join.

In your case, you could do something like:

users = (
    session.query(ZKUser)
    .join(user_groups)
    .filter(user_groups.columns.group_id.in_([1, 2, 3]))
)

This emits SQL like:

SELECT *
FROM users
JOIN user_groups ON users.id = user_groups.user_id 
WHERE user_groups.group_id IN (1, 2, 3)
Answered By: z0r