How to calculate the average of the N most recent entries in SQLite using the sqlite3 module?

Question:

I’m coding in Python 3.11, using the tkinter and sqlite3 packages. I’ve generated a database with four columns, one of them is called weight and its values are defined as real (aka decimals/floats). What I want to do is write a function using cursor.execute that "selects" the 7 most recent entries in the weight column, calculates and returns those 7 values’ average.

I understand SQLite3 has the in-built function AVG() and I’ve tried to use it, but that function is taking the average of all entries in the weight column, and I haven’t been able to research a way to direct it to only take the N most recent entries.

I also understand SqLite3 has the ability to cursor.fetchmany(7), but Sqlite3 makes all data into tuples. So when I fetchmany(7) and hardcode it to produce the average, it throws errors about tuples being unable to interact with int/str/floats. Here’s what my function looks like so far. What I actually get when I execute this function is the average of all entries in the column, rather than the last 7.

def average_query():
    
    #Create a database or connect to one
    conn = sqlite3.connect('weight_tracker.db')
    #Create cursor
    c = conn.cursor()
    
    my_average = c.execute("SELECT round(avg(weight)) FROM weights ORDER BY oid DESC LIMIT 7")
    my_average = c.fetchall()
    my_average = my_average[0][0]
    
    #Create labels on screen
    average_label = Label(root,text=f"Your average 7-day rolling weight is {my_average} pounds.")
    average_label.grid(row=9, column=0, columnspan=2)

    #Commit changes
    conn.commit()
    #Close connection
    conn.close()
Asked By: Jay_SK

||

Answers:

You can either extract the top 5 and then do the average in Python:

res = c.execute('SELECT weight FROM weights ORDER BY oid DESC LIMIT 7')
rows = res.fetchall()
# E.G. [(40,), (0,), (0,), (2500,), (1500,), (144,), (999,)]

avg = sum(r[0] for r in rows) / len(rows)
# 740.4285714285714

Or you can use a nested query to perform the average:

res = c.execute('SELECT ROUND(AVG(*)) FROM ( SELECT weight FROM weights ORDER BY oid DESC LIMIT 7 );')

rows = res.fetchall()
# [(740.4285714285714,)]

avg = rows[0][0]
# 740.4285714285714
Answered By: Marco Bonelli

Hey I’d recommend that you try to keep your data within SQL and compute the average there. SQL is designed and optimised for this exact kind of operation.

In order to achieve an average of the last N entries you can rely on window functions. They take a while to get used to, but once you figure them out they are super powerful.

The SQL query that should give you the result would look like this:

SELECT
  AVG(weight) OVER (
    ORDER BY oid DESC
    ROWS BETWEEN 7 PRECEDING AND CURRENT ROW
  ) as avg_weight
FROM
  weights

I put this together in a simple python script if you want to test this out. Note the data and column names are different, but it should show you how you might do this in python.

import sqlite3

connection = sqlite3.connect("demo.db")
cursor = connection.cursor()

## Setup database for testing ##
cursor.execute(
    """
    CREATE TABLE IF NOT EXISTS example (
        weight INT,
        timestamp INT
    )
"""
)

cursor.execute(
    """
    INSERT INTO example (weight, timestamp)
    VALUES 
        (10, 0), 
        (13, 1), 
        (5, 2), 
        (6, 3), 
        (10, 4), 
        (3, 5), 
        (10, 6), 
        (13, 7), 
        (5, 8), 
        (6, 9)
"""
)

## Get rolling average ##

cursor.execute(
    """
    SELECT
        AVG(weight) OVER (
            ORDER BY timestamp
            ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
        )
    FROM
        example
"""
)

print(cursor.fetchall())
Answered By: kmiller96