polars – Fill null over Groups

Question:

I am trying to fill null timestamps over groups, my dataframe looks like this

start stop group
0 2021-12-08 06:40:53.734941+01:00 2022-05-16 10:16:18.717146+02:00 1
1 2021-12-08 06:40:55.191598+01:00 null 1
2 2021-12-08 10:39:12.421402+01:00 2022-05-16 10:16:19.816922+02:00 2
3 2021-12-08 10:39:12.634873+01:00 2022-05-16 10:16:19.817304+02:00 1
4 2021-12-08 10:49:47.392815+01:00 2022-05-16 10:16:20.178050+02:00 5

The stop timestamps should be imputed from the next start.
My current implantation is:

   df = df.sort("start").with_columns(
       pl.col("start").fill_null(strategy="backward").over("group")
   )
   df = df.groupby("group").apply(
       lambda df: df.with_columns(
           pl.col("stop").fill_null(
               pl.col("start").shift(-1)
           )
       ),
       schema=None,
   )

This current implementation is slow. Is there a faster way to solve this ?

Thanks in advance

Asked By: linus heinz

||

Answers:

You can use pl.coalesce().

Here is an example:

df.with_columns(
    stop = pl.coalesce(
        pl.col('stop'), pl.col('start').shift(-1).over('group')
    )
)

shape: (5, 4)
┌─────┬──────────────────────────────────┬──────────────────────────────────┬───────┐
│     ┆ start                            ┆ stop                             ┆ group │
│ --- ┆ ---                              ┆ ---                              ┆ ---   │
│ i64 ┆ str                              ┆ str                              ┆ i64   │
╞═════╪══════════════════════════════════╪══════════════════════════════════╪═══════╡
│ 0   ┆ 2021-12-08 06:40:53.734941+01:00 ┆ 2022-05-16 10:16:18.717146+02:00 ┆ 1     │
│ 1   ┆ 2021-12-08 06:40:55.191598+01:00 ┆ 2021-12-08 10:39:12.634873+01:00 ┆ 1     │
│ 2   ┆ 2021-12-08 10:39:12.421402+01:00 ┆ 2022-05-16 10:16:19.816922+02:00 ┆ 2     │
│ 3   ┆ 2021-12-08 10:39:12.634873+01:00 ┆ 2022-05-16 10:16:19.817304+02:00 ┆ 1     │
│ 4   ┆ 2021-12-08 10:49:47.392815+01:00 ┆ 2022-05-16 10:16:20.178050+02:00 ┆ 5     │
└─────┴──────────────────────────────────┴──────────────────────────────────┴───────┘

In the example I used .over() so that it finds the next start within the same group.

Answered By: Luca
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.