flask-sqlalchemy query for returning newest and using distinct columns
Question:
I am really struggling to write the correct postgres query that I want in flask-sqlalchemy.
A sample table of what my data looks like is below. I am trying to get back xxx and yyy for the newest (latest timestamp) row of each unique name.
So ideally for the data below I want my query to return the bottom 4 entries.
+------------+-------+-----+--------------------------+
| name | xxx | yyy | time | ... (other columns)
+------------+-------+-----+--------------------------+
| aaa | 2 | 12 | 2021-03-11 20:27:13+00 |
| bbbb | 9 | 13 | 2021-03-11 20:27:13+00 |
| cccc | 2 | 16 | 2021-03-11 20:27:13+00 |
| dddd | 10 | 26 | 2021-03-11 20:27:13+00 |
| aaa | 4 | 13 | 2021-03-11 20:27:23+00 |
| bbbb | 8 | 12 | 2021-03-11 20:27:23+00 |
| cccc | 1 | 15 | 2021-03-11 20:27:23+00 |
| dddd | 12 | 26 | 2021-03-11 20:27:23+00 |
| aaa | 3 | 12 | 2021-03-11 20:27:33+00 |
| bbbb | 6 | 11 | 2021-03-11 20:27:33+00 |
| cccc | 1 | 17 | 2021-03-11 20:27:33+00 |
| dddd | 13 | 23 | 2021-03-11 20:27:33+00 |
+------------+-------+-----+--------------------------+
My most basic query, which returns the latest entry for a single name, looks like:
from sqlalchemy import desc
single_query = (
    MyModel
    .query
    .filter_by(name='aaa')
    .order_by(desc(MyModel.time))
    .first()
)
Using my model, I have tried to get my expected result with queries like below (based on some other SO answers):
full_query = (
    MyModel
    .query
    .with_entities(MyModel.name, MyModel.time, MyModel.xxx)
    .distinct(MyModel.name)
    .all()
)
This gets me most of the way there, but it returns seemingly random entries. I thought I could simply add an order_by(desc(MyModel.time)), but I can’t make it work with the above query.
Any suggestions on how I can get this to work or some pointers to get me in the correct direction? I’ve been scratching my head for a while. 🙂
I’ve done a lot of searching but don’t know how to extend answers like this (SQLalchemy distinct, order_by different column) to my Model querying.
UPDATE
If I wanted to query two tables at once, for example sample_table_1 and sample_table_2 via MyModel and MyModel2, how can I translate the PostgreSQL query into flask-sqlalchemy?
I was able to base the raw query below on @Oluwafemi Sule’s helpful answer, and it achieves what I want. Can I extend it, or the solution query, to suit my needs? 🙂
SELECT
    m.name,
    m.xxx,
    m.yyy,
    n.xxx,
    n.yyy
FROM (
    SELECT
        name,
        xxx,
        yyy,
        ROW_NUMBER() OVER(PARTITION BY name ORDER BY time DESC) AS rn
    FROM
        sample_table_1
) m, (
    SELECT
        name,
        xxx,
        yyy,
        ROW_NUMBER() OVER(PARTITION BY name ORDER BY time DESC) AS rn
    FROM
        sample_table_2
) n
WHERE m.name = n.name AND m.rn = 1 AND n.rn = 1;
Answers:
The SQL query to get your expected results is as follows:
SELECT
name
,xxx
,yyy
,time
FROM
(
SELECT
name
,xxx
,yyy
,time
-- Number rows after partitioning by name and reverse chronological ordering
,ROW_NUMBER () OVER (PARTITION BY name ORDER BY time DESC) AS rn
FROM sample_table
) subquery
WHERE rn = 1;
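To check this query against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL purely so the example is self-contained (an assumption, not part of the original answer; window functions need SQLite 3.25 or newer), and the timestamps are stored as text so that string ordering matches chronological ordering. An ORDER BY name is added only to make the output order deterministic.

```python
import sqlite3

# Sample rows from the question: (name, xxx, yyy, time).
ROWS = [
    ("aaa", 2, 12, "2021-03-11 20:27:13+00"),
    ("bbbb", 9, 13, "2021-03-11 20:27:13+00"),
    ("cccc", 2, 16, "2021-03-11 20:27:13+00"),
    ("dddd", 10, 26, "2021-03-11 20:27:13+00"),
    ("aaa", 4, 13, "2021-03-11 20:27:23+00"),
    ("bbbb", 8, 12, "2021-03-11 20:27:23+00"),
    ("cccc", 1, 15, "2021-03-11 20:27:23+00"),
    ("dddd", 12, 26, "2021-03-11 20:27:23+00"),
    ("aaa", 3, 12, "2021-03-11 20:27:33+00"),
    ("bbbb", 6, 11, "2021-03-11 20:27:33+00"),
    ("cccc", 1, 17, "2021-03-11 20:27:33+00"),
    ("dddd", 13, 23, "2021-03-11 20:27:33+00"),
]

# Same query as above, plus ORDER BY name for a stable result order.
LATEST_PER_NAME = """
SELECT name, xxx, yyy, time
FROM (
    SELECT name, xxx, yyy, time,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY time DESC) AS rn
    FROM sample_table
) subquery
WHERE rn = 1
ORDER BY name
"""


def latest_rows():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample_table (name TEXT, xxx INTEGER, yyy INTEGER, time TEXT)"
    )
    conn.executemany("INSERT INTO sample_table VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(LATEST_PER_NAME).fetchall()
    conn.close()
    return rows


latest = latest_rows()
for row in latest:
    print(row)
```

Each name appears once, paired with the values from its 20:27:33 row, which are the bottom 4 entries of the sample table.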
The equivalent SQLAlchemy query can be composed as follows:
from sqlalchemy import func
subquery = (
    MyModel
    .query
    .with_entities(
        MyModel.name,
        MyModel.time,
        MyModel.xxx,
        MyModel.yyy,
        func.row_number().over(
            partition_by=MyModel.name,
            order_by=MyModel.time.desc()
        ).label("rn")
    )
    .subquery()
)
full_query = (
    MyModel.query.with_entities(
        subquery.c.name,
        subquery.c.time,
        subquery.c.xxx,
        subquery.c.yyy
    )
    .select_from(subquery)
    .filter(subquery.c.rn == 1)
    .all()
)
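For the two-table case in the question's UPDATE, the same row-numbering subquery can be built once per table and the two results joined on name. The following is only a sketch under stated assumptions: it uses SQLAlchemy 1.4+ Core with plain Table objects and an in-memory SQLite engine instead of the Flask-SQLAlchemy models, and the rows inserted into sample_table_2 are invented for illustration. The helper names make_table and ranked are hypothetical.

```python
import datetime as dt

import sqlalchemy as sa

metadata = sa.MetaData()


def make_table(table_name):
    """Hypothetical helper: both sample tables share the same columns."""
    return sa.Table(
        table_name, metadata,
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("name", sa.String),
        sa.Column("xxx", sa.Integer),
        sa.Column("yyy", sa.Integer),
        sa.Column("time", sa.DateTime),
    )


table_1 = make_table("sample_table_1")
table_2 = make_table("sample_table_2")


def ranked(table, alias):
    """Subquery that numbers each name's rows from newest to oldest."""
    rn = sa.func.row_number().over(
        partition_by=table.c.name, order_by=table.c.time.desc()
    ).label("rn")
    return sa.select(table.c.name, table.c.xxx, table.c.yyy, rn).subquery(alias)


m = ranked(table_1, "m")
n = ranked(table_2, "n")

# Join the newest row per name from each table.
stmt = (
    sa.select(
        m.c.name,
        m.c.xxx.label("m_xxx"),
        m.c.yyy.label("m_yyy"),
        n.c.xxx.label("n_xxx"),
        n.c.yyy.label("n_yyy"),
    )
    .join_from(m, n, m.c.name == n.c.name)
    .where(m.c.rn == 1, n.c.rn == 1)
    .order_by(m.c.name)
)

engine = sa.create_engine("sqlite:///:memory:")
metadata.create_all(engine)
older = dt.datetime(2021, 3, 11, 20, 27, 13)
newer = dt.datetime(2021, 3, 11, 20, 27, 23)
with engine.begin() as conn:
    conn.execute(table_1.insert(), [
        {"name": "aaa", "xxx": 2, "yyy": 12, "time": older},
        {"name": "aaa", "xxx": 4, "yyy": 13, "time": newer},
        {"name": "bbbb", "xxx": 9, "yyy": 13, "time": older},
    ])
    # Illustrative values only; the question does not show sample_table_2.
    conn.execute(table_2.insert(), [
        {"name": "aaa", "xxx": 7, "yyy": 70, "time": older},
        {"name": "aaa", "xxx": 8, "yyy": 80, "time": newer},
        {"name": "bbbb", "xxx": 5, "yyy": 50, "time": older},
    ])
    result = [tuple(row) for row in conn.execute(stmt)]

print(result)
```

With Flask-SQLAlchemy models, the same statement could presumably be built from MyModel.__table__ and MyModel2.__table__ and run with db.session.execute(stmt).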
Another option is to select the distinct first_value of each column, partitioned by name and ordered by time descending, using SQLAlchemy Core:
import sqlalchemy as sa
from sqlalchemy import func
# SQLAlchemy 1.4+ takes the columns positionally; on 1.3, pass the list itself.
stmt = sa.select(*[
    func.first_value(c)
    .over(partition_by=MyModel.name,
          order_by=MyModel.time.desc())
    .label(c.name)
    for c in MyModel.__table__.c
]).distinct()
This generates SQL resembling:
select distinct
first_value(my_model.name) OVER (PARTITION BY name ORDER BY time DESC) AS name, ...
from my_model
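As a quick check of the first_value approach, the sketch below runs the generated SQL shape against a subset of the question's sample rows using Python's built-in sqlite3 module. SQLite is used here only as a self-contained stand-in for PostgreSQL (an assumption; it needs SQLite 3.25 or newer), the column list is written out by hand rather than taken from the model, and ORDER BY 1 is added to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_model (name TEXT, xxx INTEGER, yyy INTEGER, time TEXT)"
)
conn.executemany(
    "INSERT INTO my_model VALUES (?, ?, ?, ?)",
    [
        ("aaa", 2, 12, "2021-03-11 20:27:13+00"),
        ("bbbb", 9, 13, "2021-03-11 20:27:13+00"),
        ("aaa", 4, 13, "2021-03-11 20:27:23+00"),
        ("bbbb", 8, 12, "2021-03-11 20:27:23+00"),
        ("aaa", 3, 12, "2021-03-11 20:27:33+00"),
        ("bbbb", 6, 11, "2021-03-11 20:27:33+00"),
    ],
)

# One first_value(...) per column, all over the same window.
window = "OVER (PARTITION BY my_model.name ORDER BY my_model.time DESC)"
columns = ["name", "xxx", "yyy", "time"]
select_list = ", ".join(
    f"first_value(my_model.{c}) {window} AS {c}" for c in columns
)

# DISTINCT collapses the identical rows produced for each name.
first_values = conn.execute(
    f"SELECT DISTINCT {select_list} FROM my_model ORDER BY 1"
).fetchall()
conn.close()

print(first_values)
```

Every row in a partition yields the same first_value tuple, so DISTINCT leaves one row per name. If two rows of the same name share the newest timestamp, which of them supplies each value is not guaranteed, so the row-number approach may be the safer choice in that case.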