Run group_by_dynamic in polars but only on timestamp

Question:

I have some dummy data like this:

datetime,duration_in_traffic_s
2023-12-20T10:50:43.063641000,221.0
2023-12-20T10:59:09.884939000,219.0
2023-12-20T11:09:56.003331000,206.0
...
more rows with different dates
...

Assume this data is stored in a file mwe.csv.
Using polars, I now want to compute averages over the second column, grouped in one-hour chunks. I want to use group_by_dynamic to get a value every 10 minutes. I run

(
    pl.read_csv("mwe.csv")
    .with_columns(pl.col("datetime").cast(pl.Datetime))
    .sort("datetime")
    .group_by_dynamic(
        index_column="datetime",
        every="10m",
        period="1h",
    )
    .agg(pl.col("duration_in_traffic_s").mean())
)

and the result looks like this:
[image of the resulting DataFrame]

However, I don’t want the averaging to take the date into account, only the time, e.g. 2023-12-20 10:40 and 2023-12-21 10:40 should fall into the same bin.

I hoped that adding .with_columns(pl.col("datetime").dt.time()) to the pipeline would help, but group_by_dynamic doesn’t work with time data.

I could compute the time of day manually as a float, like this:

(
    pl.read_csv("mwe.csv")
    .with_columns(pl.col("datetime").cast(dtype=pl.Datetime))
    .with_columns(
        t=pl.col("datetime").dt.hour().cast(pl.Float64)
        + pl.col("datetime").dt.minute().cast(pl.Float64) / 60
        + pl.col("datetime").dt.second().cast(pl.Float64) / 60 / 60
    )
).sort("t")

But I am not sure how to do the grouping. Also, I do like the time format, so my hope was that I could preserve that.

Is there a way to do the dynamic grouping on the time data only, ignoring the date?

Here’s the full mwe.csv file:

datetime,duration_in_traffic_s
2023-12-20T10:50:43.063641000,221.0
2023-12-20T10:59:09.884939000,219.0
2023-12-20T11:09:56.003331000,206.0
2023-12-20T11:12:42.347660000,206.0
2023-12-20T11:17:40.084821000,200.0
2023-12-20T11:31:14.957092000,222.0
2023-12-20T11:46:08.886872000,209.0
2023-12-20T12:00:02.024328000,198.0
2023-12-20T12:15:01.910446000,251.0
2023-12-20T12:30:01.447496000,229.0
2023-12-20T12:45:02.761839000,206.0
2023-12-20T14:00:01.456811000,262.0
2023-12-20T14:15:01.718898000,226.0
2023-12-20T14:30:02.452185000,194.0
2023-12-20T14:45:01.717522000,191.0
2023-12-20T14:49:10.150735000,196.0
2023-12-20T14:50:55.800417000,194.0
2023-12-20T14:57:05.230577000,202.0
2023-12-20T14:59:23.005408000,192.0
2023-12-20T15:00:01.316240000,193.0
2023-12-20T15:00:14.842233000,193.33333333333334
2023-12-20T15:00:49.370172000,193.66666666666666
2023-12-20T15:01:06.300133000,193.66666666666666
2023-12-20T15:15:01.943587000,183.0
2023-12-20T15:20:01.567126000,184.0
2023-12-20T15:30:01.784686000,197.0
2023-12-20T15:40:02.468132000,188.0
2023-12-20T15:50:01.968746000,226.0
2023-12-20T16:00:01.864652000,233.0
2023-12-20T16:10:01.185016000,213.0
2023-12-20T16:20:01.544796000,252.0
2023-12-20T16:30:01.621331000,224.0
2023-12-20T16:40:03.567996000,228.0
2023-12-20T16:50:01.014911000,220.0
2023-12-20T17:00:01.723306000,232.0
2023-12-20T17:10:02.490695000,215.0
2023-12-20T17:20:01.844304000,214.0
2023-12-20T17:30:02.147457000,204.0
2023-12-20T17:40:02.217333000,198.0
2023-12-20T17:50:01.741479000,193.0
2023-12-20T18:00:01.665714000,193.0
2023-12-20T18:10:02.334926000,182.0
2023-12-20T18:26:43.135849000,185.0
2023-12-20T18:30:02.434296000,184.0
2023-12-20T18:32:41.033250000,175.0
2023-12-20T18:40:02.941171000,176.0
2023-12-20T19:36:47.313925000,175.0
2023-12-20T19:40:01.895983000,171.0
2023-12-20T19:50:02.049567000,167.0
2023-12-20T20:00:08.284378000,166.0
2023-12-20T20:10:02.727202000,166.0
2023-12-20T20:40:02.407489000,161.0
2023-12-20T21:10:02.100392000,158.0
2023-12-20T21:21:56.063346000,157.0
2023-12-20T21:30:02.005594000,159.0
2023-12-20T21:40:01.915306000,153.0
2023-12-20T21:50:02.318419000,152.0
2023-12-20T22:00:02.369086000,154.0
2023-12-20T22:10:02.704019000,154.0
2023-12-20T22:20:01.968418000,160.0
2023-12-20T22:30:01.965742000,159.0
2023-12-20T22:40:02.718295000,164.0
2023-12-20T22:50:02.347303000,160.0
2023-12-21T05:00:02.595535000,164.0
2023-12-21T05:10:02.642932000,163.0
2023-12-21T05:20:02.390676000,164.0
2023-12-21T05:30:01.971166000,165.0
2023-12-21T05:40:01.874958000,169.0
2023-12-21T05:50:01.806441000,167.0
2023-12-21T06:00:02.396094000,169.0
2023-12-21T06:10:02.350196000,169.0
2023-12-21T06:20:02.041357000,169.0
2023-12-21T06:33:43.895397000,177.0
2023-12-21T07:30:02.240918000,210.0
2023-12-21T07:47:16.654805000,200.0
2023-12-21T07:50:02.960362000,199.0
2023-12-21T08:10:16.746286000,194.0
2023-12-21T08:20:02.218056000,198.0
2023-12-21T08:30:01.729418000,198.0
2023-12-21T08:40:02.345477000,194.0
2023-12-21T08:50:01.464156000,190.0
2023-12-21T09:00:02.476057000,188.0
2023-12-21T09:10:02.130653000,213.0
2023-12-21T09:20:02.364758000,188.0
2023-12-21T09:30:02.499917000,188.0
2023-12-21T09:40:01.911754000,188.0
2023-12-21T09:50:01.885705000,197.0
2023-12-21T10:00:01.633757000,198.0
2023-12-21T10:10:02.531765000,200.0
2023-12-21T10:20:01.685657000,221.0
2023-12-21T10:30:01.567600000,207.0
2023-12-21T10:40:02.279429000,203.0
2023-12-21T10:50:02.548892000,191.0
2023-12-21T11:00:01.622794000,219.0
2023-12-21T11:10:01.435424000,200.0
2023-12-21T11:20:01.849114000,234.0
2023-12-21T11:30:02.391425000,222.0
2023-12-21T11:40:01.796607000,191.0
2023-12-21T11:50:01.776906000,205.0
2023-12-21T12:00:02.485984000,239.0
Asked By: Thomas


Answers:

You could first use dt.combine to build a column in which every time of day falls on the same (arbitrary) date.

Then run group_by_dynamic on that column and convert the result back with dt.time:

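# df is assumed to be the frame from the question: mwe.csv with "datetime" cast to pl.Datetime
(
    df.with_columns(
        # put every time of day on the same arbitrary date
        time=pl.date(2024, 1, 1).dt.combine(pl.col("datetime").dt.time())
    )
    .sort("time")
    .group_by_dynamic("time", every="10m", period="1h")
    .agg(pl.col("duration_in_traffic_s").mean())
    # drop the placeholder date again, keeping only the time
    .with_columns(time=pl.col("time").dt.time())
)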
Out[26]:
shape: (107, 2)
┌──────────┬───────────────────────┐
│ time     ┆ duration_in_traffic_s │
│ ---      ┆ ---                   │
│ time     ┆ f64                   │
╞══════════╪═══════════════════════╡
│ 04:50:00 ┆ 165.0                 │
│ 05:00:00 ┆ 165.333333            │
│ 05:10:00 ┆ 166.166667            │
│ 05:20:00 ┆ 167.166667            │
│ …        ┆ …                     │
│ 22:20:00 ┆ 160.75                │
│ 22:30:00 ┆ 161.0                 │
│ 22:40:00 ┆ 162.0                 │
│ 22:50:00 ┆ 160.0                 │
└──────────┴───────────────────────┘
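
To sanity-check a single bin by hand, the windowing can be reproduced in plain Python. This is only a sketch of what the pipeline above is expected to compute, not polars itself: it assumes left-closed windows [start, start + 1h), which is the documented default of group_by_dynamic, and it ignores windows that wrap past midnight. The helper names (seconds_since_midnight, window_mean) are made up for illustration, and the rows are a small subset of mwe.csv with the timestamps cut to whole seconds.

```python
from datetime import datetime, time, timedelta
from statistics import mean

# A few rows from mwe.csv on both dates (fractional seconds dropped for brevity).
rows = [
    ("2023-12-20T10:50:43", 221.0),
    ("2023-12-20T10:59:09", 219.0),
    ("2023-12-20T11:09:56", 206.0),
    ("2023-12-21T10:40:02", 203.0),
    ("2023-12-21T10:50:02", 191.0),
    ("2023-12-21T11:00:01", 219.0),
]


def seconds_since_midnight(stamp: str) -> int:
    """Drop the date and keep only the time of day, in seconds."""
    t = datetime.fromisoformat(stamp).time()
    return t.hour * 3600 + t.minute * 60 + t.second


def window_mean(rows, start: time, period: timedelta = timedelta(hours=1)):
    """Mean of the values whose time of day lies in [start, start + period).

    Returns None if no row falls into the window.
    """
    lo = start.hour * 3600 + start.minute * 60 + start.second
    hi = lo + int(period.total_seconds())
    hits = [v for stamp, v in rows if lo <= seconds_since_midnight(stamp) < hi]
    return mean(hits) if hits else None


# The 10:40 window mixes rows from 2023-12-20 and 2023-12-21.
print(window_mean(rows, time(10, 40)))
```

Calling window_mean for successive start times 10 minutes apart mimics every="10m"; each call still covers a full hour, so neighbouring windows overlap, just as in the polars result.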
Answered By: ignoring_gravity