Flask-SQLAlchemy query returning the newest row for each distinct value of a column

Question:

I am really struggling to write the PostgreSQL query I want in Flask-SQLAlchemy.

A sample of my data is below. I am trying to get back xxx and yyy from the newest row (latest timestamp) for each unique name.

So, ideally, for the data below I want my query to return the bottom 4 entries (the rows with time 2021-03-11 20:27:33+00).

+------------+-------+-----+--------------------------+
|    name    |  xxx  | yyy |           time           | ... (other columns)
+------------+-------+-----+--------------------------+
| aaa        |   2   |  12 | 2021-03-11 20:27:13+00   |
| bbbb       |   9   |  13 | 2021-03-11 20:27:13+00   |
| cccc       |   2   |  16 | 2021-03-11 20:27:13+00   |
| dddd       |   10  |  26 | 2021-03-11 20:27:13+00   |
| aaa        |   4   |  13 | 2021-03-11 20:27:23+00   |
| bbbb       |   8   |  12 | 2021-03-11 20:27:23+00   |
| cccc       |   1   |  15 | 2021-03-11 20:27:23+00   |
| dddd       |   12  |  26 | 2021-03-11 20:27:23+00   |
| aaa        |   3   |  12 | 2021-03-11 20:27:33+00   |
| bbbb       |   6   |  11 | 2021-03-11 20:27:33+00   |
| cccc       |   1   |  17 | 2021-03-11 20:27:33+00   |
| dddd       |   13  |  23 | 2021-03-11 20:27:33+00   |
+------------+-------+-----+--------------------------+

My most basic query, which returns the latest row for a single name, looks like:

from sqlalchemy import desc

single_query = (
    MyModel.query
    .filter_by(name='aaa')
    .order_by(desc(MyModel.time))
    .first()
)

Using my model, I have tried to get the expected result with queries like the one below (based on some other SO answers):

full_query = (
    MyModel.query
    .with_entities(MyModel.name, MyModel.time, MyModel.xxx)
    .distinct(MyModel.name)
    .all()
)

This gets me most of the way there, but it returns (seemingly) random entries. I thought I could simply add an order_by(desc(MyModel.time)), but I can’t make it work with the above query.

Any suggestions on how I can get this to work, or some pointers in the right direction? I’ve been scratching my head for a while. 🙂

I’ve done a lot of searching but don’t know how to extend answers like this one (SQLalchemy distinct, order_by different column) to querying through my model.
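For reference, PostgreSQL rejects DISTINCT ON unless the leading ORDER BY expressions match the DISTINCT ON expressions (the error is "SELECT DISTINCT ON expressions must match initial ORDER BY expressions"), which is why appending only order_by(desc(MyModel.time)) fails. A minimal sketch of the usual workaround, ordering by name first and then by time descending, is below. It assumes a hypothetical plain-SQLAlchemy MyModel mirroring the sample table (with an added id primary key) and only compiles the statement for the PostgreSQL dialect rather than running it:

```python
from sqlalchemy import Column, DateTime, Integer, String, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class MyModel(Base):
    # Hypothetical model mirroring the sample table in the question.
    __tablename__ = "sample_table"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    xxx = Column(Integer)
    yyy = Column(Integer)
    time = Column(DateTime(timezone=True))


# DISTINCT ON (name) keeps the first row of each name group, so ORDER BY
# must start with name; time DESC then makes that first row the newest one.
stmt = (
    select(MyModel.name, MyModel.xxx, MyModel.yyy, MyModel.time)
    .distinct(MyModel.name)
    .order_by(MyModel.name, MyModel.time.desc())
)

# Render the SQL for PostgreSQL without needing a database connection.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

With Flask-SQLAlchemy, the equivalent chain would presumably be MyModel.query.with_entities(...).distinct(MyModel.name).order_by(MyModel.name, MyModel.time.desc()).all(). DISTINCT ON is PostgreSQL-specific, so this form does not carry over to other databases.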


UPDATE

If I want to query two tables at once, for example sample_table_1 and sample_table_2 via MyModel and MyModel2, how can I translate this PostgreSQL query to Flask-SQLAlchemy?

Building on @Oluwafemi Sule’s helpful answer, I wrote a raw query that achieves what I want. Can I extend it, or the SQLAlchemy query from that answer, to suit my needs? 🙂

The raw query:

SELECT
  m.name,
  m.xxx,
  m.yyy,
  n.xxx,
  n.yyy
FROM (
  SELECT
    name,
    xxx,
    yyy,
    ROW_NUMBER() OVER (PARTITION BY name ORDER BY time DESC) AS rn
  FROM
    sample_table_1
) m, (
  SELECT
    name,
    xxx,
    yyy,
    ROW_NUMBER() OVER (PARTITION BY name ORDER BY time DESC) AS rn
  FROM
    sample_table_2
) n
WHERE m.name = n.name AND m.rn = 1 AND n.rn = 1;
Asked By: velo_fred
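A minimal sketch of how the raw two-table query above might be translated, assuming hypothetical plain-SQLAlchemy models MyModel and MyModel2 for sample_table_1 and sample_table_2, each with name, xxx, yyy and time columns plus an added id primary key. Each table gets its own ROW_NUMBER() subquery, and the two subqueries are joined on name with rn = 1 on both sides. An in-memory SQLite database (which supports window functions from version 3.25) stands in for PostgreSQL, and the rows are made up for illustration:

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class MyModel(Base):
    # Hypothetical model for sample_table_1.
    __tablename__ = "sample_table_1"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    xxx = Column(Integer)
    yyy = Column(Integer)
    time = Column(DateTime)


class MyModel2(Base):
    # Hypothetical model for sample_table_2, with the same columns.
    __tablename__ = "sample_table_2"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    xxx = Column(Integer)
    yyy = Column(Integer)
    time = Column(DateTime)


def latest_per_name(session, model):
    """Subquery numbering each name's rows from newest (rn = 1) to oldest."""
    return session.query(
        model.name,
        model.xxx,
        model.yyy,
        func.row_number()
        .over(partition_by=model.name, order_by=model.time.desc())
        .label("rn"),
    ).subquery()


engine = create_engine("sqlite://")  # stand-in for PostgreSQL
Base.metadata.create_all(engine)

early = datetime(2021, 3, 11, 20, 27, 13)
late = datetime(2021, 3, 11, 20, 27, 23)

with Session(engine) as session:
    session.add_all([
        MyModel(name="aaa", xxx=2, yyy=12, time=early),
        MyModel(name="aaa", xxx=4, yyy=13, time=late),
        MyModel(name="bbbb", xxx=9, yyy=13, time=early),
        MyModel2(name="aaa", xxx=7, yyy=30, time=early),
        MyModel2(name="aaa", xxx=8, yyy=31, time=late),
        MyModel2(name="bbbb", xxx=5, yyy=20, time=early),
    ])
    session.commit()

    m = latest_per_name(session, MyModel)
    n = latest_per_name(session, MyModel2)

    # Join the two "newest row" subqueries on name, keeping rn = 1 on both sides.
    result = (
        session.query(
            m.c.name, m.c.xxx, m.c.yyy,
            n.c.xxx.label("xxx_2"), n.c.yyy.label("yyy_2"),
        )
        .select_from(m)
        .join(n, m.c.name == n.c.name)
        .filter(m.c.rn == 1, n.c.rn == 1)
        .order_by(m.c.name)
        .all()
    )

print([tuple(r) for r in result])
```

In Flask-SQLAlchemy, session.query(...) would typically become db.session.query(...) or MyModel.query.with_entities(...). The labels xxx_2 and yyy_2 are arbitrary names chosen here to keep the two tables' columns apart.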

||

Answers:

The SQL query to get your expected results is as follows:

SELECT 
  name
  ,xxx
  ,yyy
  ,time 
FROM
 ( 
   SELECT 
     name
     ,xxx
     ,yyy
     ,time
     -- Number rows after partitioning by name and reverse chronological ordering
     ,ROW_NUMBER () OVER (PARTITION BY name ORDER BY time DESC) AS rn
  FROM sample_table
) subquery
WHERE rn = 1; 

The corresponding SQLAlchemy query can be composed as follows:

from sqlalchemy import func

subquery = (
    MyModel
    .query
    .with_entities(
        MyModel.name, 
        MyModel.time, 
        MyModel.xxx, 
        MyModel.yyy,
        func.row_number().over(
            partition_by=MyModel.name,
            order_by=MyModel.time.desc()
        ).label("rn")
    )
    .subquery()
)
full_query = (
    MyModel.query.with_entities(
        subquery.c.name,
        subquery.c.time,
        subquery.c.xxx,
        subquery.c.yyy
    )
    .select_from(subquery)
    .filter(subquery.c.rn == 1)
    .all()
)
Answered By: Oluwafemi Sule
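The subquery approach above can be checked end to end with a small sketch. It swaps Flask-SQLAlchemy's MyModel.query for a plain SQLAlchemy Session.query (roughly what MyModel.query is equivalent to) and PostgreSQL for in-memory SQLite (window functions need SQLite 3.25 or later), loads the sample data from the question, and uses a hypothetical MyModel with an added integer id primary key:

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class MyModel(Base):
    # Hypothetical model mirroring the sample table in the question.
    __tablename__ = "sample_table"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    xxx = Column(Integer)
    yyy = Column(Integer)
    time = Column(DateTime)


engine = create_engine("sqlite://")  # stand-in for PostgreSQL
Base.metadata.create_all(engine)

t1, t2, t3 = (datetime(2021, 3, 11, 20, 27, s) for s in (13, 23, 33))
rows = [
    ("aaa", 2, 12, t1), ("bbbb", 9, 13, t1), ("cccc", 2, 16, t1), ("dddd", 10, 26, t1),
    ("aaa", 4, 13, t2), ("bbbb", 8, 12, t2), ("cccc", 1, 15, t2), ("dddd", 12, 26, t2),
    ("aaa", 3, 12, t3), ("bbbb", 6, 11, t3), ("cccc", 1, 17, t3), ("dddd", 13, 23, t3),
]

with Session(engine) as session:
    session.add_all(MyModel(name=n, xxx=x, yyy=y, time=t) for n, x, y, t in rows)
    session.commit()

    # Number each name's rows from newest (rn = 1) to oldest.
    subquery = session.query(
        MyModel.name,
        MyModel.time,
        MyModel.xxx,
        MyModel.yyy,
        func.row_number()
        .over(partition_by=MyModel.name, order_by=MyModel.time.desc())
        .label("rn"),
    ).subquery()

    # Keep only the newest row per name.
    latest = (
        session.query(subquery.c.name, subquery.c.xxx, subquery.c.yyy)
        .filter(subquery.c.rn == 1)
        .order_by(subquery.c.name)
        .all()
    )

print([tuple(r) for r in latest])
```

The order_by on name is only there to make the output deterministic; the original answer leaves the final order unspecified.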

You can get the desired result by selecting the distinct first_value of each column, partitioned by name and ordered by time descending.

Using SQLAlchemy Core:

import sqlalchemy as sa
from sqlalchemy import func

# One first_value(...) per table column, taken from the newest row of each name.
stmt = sa.select(
    *(
        func.first_value(c)
        .over(partition_by=MyModel.name, order_by=MyModel.time.desc())
        .label(c.name)
        for c in MyModel.__table__.c
    )
).distinct()

This generates SQL resembling:

select distinct 
first_value(my_model.name) OVER (PARTITION BY name ORDER BY time DESC) AS name, ...
from my_model
Answered By: Haleemur Ali
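A small runnable sketch of the first_value approach, again assuming a hypothetical MyModel (table my_model, with an added id primary key) and in-memory SQLite in place of PostgreSQL. Only the name, xxx and yyy columns are selected here to keep the output easy to compare:

```python
from datetime import datetime

import sqlalchemy as sa
from sqlalchemy import func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class MyModel(Base):
    # Hypothetical model mirroring the sample table in the question.
    __tablename__ = "my_model"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String, nullable=False)
    xxx = sa.Column(sa.Integer)
    yyy = sa.Column(sa.Integer)
    time = sa.Column(sa.DateTime)


engine = sa.create_engine("sqlite://")  # stand-in for PostgreSQL (needs SQLite >= 3.25)
Base.metadata.create_all(engine)

t1, t2 = datetime(2021, 3, 11, 20, 27, 13), datetime(2021, 3, 11, 20, 27, 33)
with Session(engine) as session:
    session.add_all([
        MyModel(name="aaa", xxx=2, yyy=12, time=t1),
        MyModel(name="bbbb", xxx=9, yyy=13, time=t1),
        MyModel(name="aaa", xxx=3, yyy=12, time=t2),
        MyModel(name="bbbb", xxx=6, yyy=11, time=t2),
    ])
    session.commit()

table = MyModel.__table__
# Every row gets its partition's newest values; DISTINCT collapses the duplicates.
stmt = sa.select(
    *(
        func.first_value(c)
        .over(partition_by=MyModel.name, order_by=MyModel.time.desc())
        .label(c.name)
        for c in (table.c["name"], table.c["xxx"], table.c["yyy"])
    )
).distinct()

with engine.connect() as conn:
    result = sorted(tuple(r) for r in conn.execute(stmt))

print(result)
```

Compared with ROW_NUMBER(), this evaluates one window function per selected column and relies on DISTINCT to remove duplicates, so on wide tables the subquery approach may be the cheaper of the two.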