Polars YYYY week into a date

Question:

Does anyone know how to parse a year-plus-week value (YYYYWW) into a date column in Polars?
I have tried this code, but it throws an error. Thanks!

import polars as pl
pl.DataFrame(
{
 "week": [201901, 201902, 201903, 201942, 201943, 201944]
}).with_columns(pl.col('week').cast(pl.Utf8).str.strptime(pl.Date, fmt='%Y%U').alias("date"))
Asked By: Frank


Answers:

This seems like a bug (although one in the underlying Rust crate chrono rather than in Polars itself). I tried base Python's strptime, and it ignores %U and just returns January 1 for every value. So instead you can do string manipulation and date math like this (assuming you don't need an exact result):

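import polars as pl

(
    pl.DataFrame({"week": [201901, 201902, 201903, 201942, 201943, 201944]})
    .with_columns(pl.col('week').cast(pl.Utf8))
    # split "YYYYWW" into integer year and week columns
    .with_columns([pl.col('week').str.slice(0, 4).cast(pl.Int32).alias('year'),
                   pl.col('week').str.slice(4, 2).cast(pl.Int32).alias('week')])
    # approximation: January 1 plus (week - 1) * 7 days; parenthesise so alias names the sum
    .select((pl.date(pl.col('year'), 1, 1) + pl.duration(days=(pl.col('week') - 1) * 7)).alias('date'))
)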

If you look at the definition of %U, week 01 is supposed to start on the first Sunday of the year (any days before it fall in week 00), whereas my math just adds multiples of 7 days to January 1.
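To make the %U definition concrete, here is a small standard-library Python sketch (no Polars needed). It reproduces the strptime behavior described above and compares the multiply-by-7 approximation with the exact rule. `week_start_u` and `week_start_approx` are hypothetical helper names, and the sketch assumes weeks 01 to 53 (week 00 is not handled).

```python
from datetime import date, datetime, timedelta

# Without a weekday directive, Python's strptime ignores %U
# and falls back to January 1 for every week number.
ignored = [datetime.strptime(s, "%Y%U").date() for s in ("201901", "201942")]


def week_start_u(year: int, week: int) -> date:
    """Sunday that starts %U week `week` (assumes 1 <= week <= 53)."""
    jan1 = date(year, 1, 1)
    # date.weekday(): Monday=0 ... Sunday=6, so this is the gap to the first Sunday
    days_to_first_sunday = (6 - jan1.weekday()) % 7
    return jan1 + timedelta(days=days_to_first_sunday + (week - 1) * 7)


def week_start_approx(year: int, week: int) -> date:
    """The approximation above: January 1 plus (week - 1) * 7 days."""
    return date(year, 1, 1) + timedelta(days=(week - 1) * 7)


print(ignored)
for yyyyww in (201901, 201902, 201942):
    year, week = divmod(yyyyww, 100)
    # Appending %w = 0 (Sunday) lets strptime resolve the %U week by itself
    parsed = datetime.strptime(f"{yyyyww}0", "%Y%U%w").date()
    print(yyyyww, week_start_approx(year, week), week_start_u(year, week), parsed)
```

For 2019 the approximation lands five days early, because January 1 is a Tuesday and the first Sunday is January 6. The exact values (2019-01-06, 2019-01-13, 2019-10-20) match what the strftime lookup-table approach produces.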

Another approach is to build a DataFrame with every date of the year, format each date with strftime("%Y%U"), keep the earliest date for each week, and then join. That might look like this:

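from datetime import date

# one row per day of 2019, labelled with its %U week; keep the first day of each week
# (eager=True makes date_range return a Series; group_by was groupby in older Polars)
dfdates = (pl.DataFrame({'date': pl.date_range(date(2019, 1, 1), date(2019, 12, 31), '1d', eager=True)})
        .with_columns(pl.col('date').dt.strftime('%Y%U').alias('week'))
        .group_by('week').agg(pl.col('date').min()))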

Then join it with your data:

pl.DataFrame({
    "week": [201901, 201902, 201903, 201942, 201943, 201944]
}).with_columns(pl.col('week').cast(pl.Utf8())).join(dfdates, on='week')

shape: (6, 2)
┌────────┬────────────┐
│ week   ┆ date       │
│ ---    ┆ ---        │
│ str    ┆ date       │
╞════════╪════════════╡
│ 201903 ┆ 2019-01-20 │
│ 201944 ┆ 2019-11-03 │
│ 201902 ┆ 2019-01-13 │
│ 201943 ┆ 2019-10-27 │
│ 201942 ┆ 2019-10-20 │
│ 201901 ┆ 2019-01-06 │
└────────┴────────────┘
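The same lookup-table idea can be sketched in plain Python with only the standard library: walk every day of 2019, label it with strftime("%Y%U"), and keep the first day seen for each label. `first_day_of_week` is a hypothetical name, and the sketch assumes the platform C strftime (which Python's date.strftime uses) follows the standard %U convention.

```python
from datetime import date, timedelta

# Map each "%Y%U" label to the earliest 2019 date carrying it,
# mirroring the date_range -> strftime -> group_by(min) steps above.
first_day_of_week = {}
day = date(2019, 1, 1)
while day <= date(2019, 12, 31):
    label = day.strftime("%Y%U")
    # setdefault keeps only the first (minimum) date seen for each label
    first_day_of_week.setdefault(label, day)
    day += timedelta(days=1)

weeks = [201901, 201902, 201903, 201942, 201943, 201944]
# the "join": look each week up by its string form (missing labels give None)
print({w: first_day_of_week.get(str(w)) for w in weeks})
```

Days before the first Sunday land in week 00, so `"201900"` maps to 2019-01-01. A dict lookup also keeps the input order, whereas the joined DataFrame above comes back in a different order.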
Answered By: Dean MacGregor

That's really weird, mate. It looks like only dates in 2019 are broken; take a look at my example below:

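import polars as pl

pl.DataFrame(
    {
        "week": [
            202201, 202202, 202203, 202242, 202243, 202244,
            202101, 202102, 202103, 202142, 202143, 202144,
            201901, 201902, 201903, 201942, 201943, 201944,
            201801, 201802, 201803, 201842, 201843, 201844,
        ]
    }
).with_columns(
    # append the weekday digit 0 (Sunday) so %w has something to parse: 202201 -> "2022010"
    pl.format("{}0", "week").alias("week")
).with_columns(
    # `format=` was called `fmt=` in older Polars releases
    pl.col("week").str.strptime(pl.Date, format="%Y%W%w", strict=False).alias("teste")
)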

shape: (24, 2)
┌─────────┬────────────┐
│ week    ┆ teste      │
│ ---     ┆ ---        │
│ str     ┆ date       │
╞═════════╪════════════╡
│ 2022010 ┆ 2022-01-09 │
│ 2022020 ┆ 2022-01-16 │
│ 2022030 ┆ 2022-01-23 │
│ 2022420 ┆ 2022-10-23 │
│ 2022430 ┆ 2022-10-30 │
│ 2022440 ┆ 2022-11-06 │
│ 2021010 ┆ 2021-01-10 │
│ 2021020 ┆ 2021-01-17 │
│ 2021030 ┆ 2021-01-24 │
│ 2021420 ┆ 2021-10-24 │
│ 2021430 ┆ 2021-10-31 │
│ 2021440 ┆ 2021-11-07 │
│ 2019010 ┆ null       │
│ 2019020 ┆ null       │
│ 2019030 ┆ null       │
│ 2019420 ┆ null       │
│ 2019430 ┆ null       │
│ 2019440 ┆ null       │
│ 2018010 ┆ 2018-01-07 │
│ 2018020 ┆ 2018-01-14 │
│ 2018030 ┆ 2018-01-21 │
│ 2018420 ┆ 2018-10-21 │
│ 2018430 ┆ 2018-10-28 │
│ 2018440 ┆ 2018-11-04 │
└─────────┴────────────┘
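To see what the null 2019 rows should contain, the same "%Y%W%w" strings can be cross-checked with Python's standard-library strptime. This is a sketch, assuming Python's directives match chrono's here (%W weeks start on Monday, %w uses 0 for Sunday); it only shows the expected values, not the behavior of any particular Polars release.

```python
from datetime import datetime

# Append the weekday digit 0 (Sunday), just like pl.format("{}0", "week")
weeks = [202201, 201901, 201902, 201903, 201942, 201943, 201944]
expected = {w: datetime.strptime(f"{w}0", "%Y%W%w").date() for w in weeks}

for w, d in expected.items():
    print(f"{w}0", d)
```

The 2022 row agrees with the table above (2022-01-09), and the 2019 weeks come out as 2019-01-13, 2019-01-20, 2019-01-27, 2019-10-27, 2019-11-03 and 2019-11-10.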

Bug aside, I always use the following expression to parse week numbers into proper dates:

.with_columns(pl.format("{}0", "week")).with_columns(pl.col("week").str.strptime(pl.Date, format="%Y%W%w", strict=False))

Note that you have to append a weekday digit for this pattern to parse: a year plus a week number alone does not identify a single day, so %w supplies it. I think this is also mentioned in the comments on the other post.
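Why the weekday is needed: a year plus a %W week number covers seven days, and the appended %w digit selects one of them. A quick standard-library sketch, using Python's strptime and assuming it follows the same directives as chrono:

```python
from datetime import datetime

# "201942" = year 2019, %W week 42; try every %w digit (0 = Sunday ... 6 = Saturday)
week = "201942"
days = {w: datetime.strptime(week + str(w), "%Y%W%w").date() for w in range(7)}

# %W weeks start on Monday, so Monday (1) opens the week and Sunday (0) closes it
print("Monday:", days[1])
print("Sunday:", days[0])
```

So the `0` used above gives the Sunday that ends each Monday-based %W week; appending `1` would give the Monday that starts it. This also differs from the %U lookup in the other answer, where 201942 maps to Sunday 2019-10-20, the start of a Sunday-based week.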

Answered By: Igor Marcos Riegel