Polars YYYY week into a date
Question:
Does anyone know how to parse YYYY Week into a date column in Polars?
I have tried this code but it throws an error. Thx
import polars as pl
pl.DataFrame(
{
"week": [201901, 201902, 201903, 201942, 201943, 201944]
}).with_columns(pl.col('week').cast(pl.Utf8).str.strptime(pl.Date, fmt='%Y%U').alias("date"))
Answers:
This looks like a bug, although in the underlying Rust crate chrono rather than in Polars itself. I tried base Python’s strptime, and it ignores the %U and just returns the first day of the year in every case.
So you can either do string manipulation and date math like this (assuming you don’t need an exact result):
(
    pl.DataFrame({
        "week": [201901, 201902, 201903, 201942, 201943, 201944]
    })
    .with_columns(pl.col('week').cast(pl.Utf8))
    # split "YYYYWW" into an integer year column and an integer week column
    .with_columns([pl.col('week').str.slice(0,4).cast(pl.Int32).alias('year'),
                   pl.col('week').str.slice(4,2).cast(pl.Int32).alias('week')])
    # January 1st plus 7 days per elapsed week (only an approximation of %U)
    .select((pl.date(pl.col('year'),1,1) + pl.duration(days=(pl.col('week')-1)*7)).alias('date'))
)
If you look at the definition of %U, week numbers are counted from the first Sunday of the year (the days before it are week 0), whereas my math just adds 7 days per week to January 1st.
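To make the difference concrete, here is a minimal sketch in plain Python (standard library only, no Polars) that compares the multiply-by-7 shortcut with a Sunday-based calculation. The helper names first_sunday, u_week_start and naive_week_start are made up for this illustration, and it assumes the values are integers of the form YYYYWW with a week number of 1 or higher.

```python
from datetime import date, timedelta


def first_sunday(year: int) -> date:
    """Return the first Sunday of `year`, which is where %U week 1 starts."""
    jan1 = date(year, 1, 1)
    # date.weekday() numbers Monday as 0 and Sunday as 6
    return jan1 + timedelta(days=(6 - jan1.weekday()) % 7)


def u_week_start(yyyyww: int) -> date:
    """Sunday that opens the given %U week (assumes week >= 1)."""
    year, week = divmod(yyyyww, 100)
    return first_sunday(year) + timedelta(weeks=week - 1)


def naive_week_start(yyyyww: int) -> date:
    """The shortcut from the answer: January 1st plus 7 days per week."""
    year, week = divmod(yyyyww, 100)
    return date(year, 1, 1) + timedelta(weeks=week - 1)


for value in [201901, 201902, 201944]:
    print(value, naive_week_start(value), u_week_start(value))
```

For 201901 the shortcut gives 2019-01-01 while the Sunday-based version gives 2019-01-06, so the two differ by a fixed offset within a year; whether that matters depends on how exact the dates need to be.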
Another approach is to build a DataFrame of dates, format them with strftime, and then join the two DataFrames. That might look like this:
from datetime import datetime
dfdates=(pl.DataFrame({'date':pl.date_range(datetime(2019,1,1), datetime(2019,12,31),'1d').cast(pl.Date())})
    # label each day with its %Y%U week, then keep the earliest date per week
    .with_columns(pl.col('date').dt.strftime("%Y%U").alias('week'))
    .groupby('week').agg(pl.col('date').min()))
Then join it with what you have:
pl.DataFrame({
"week": [201901, 201902, 201903, 201942, 201943, 201944]
}).with_columns(pl.col('week').cast(pl.Utf8())).join(dfdates, on='week')
shape: (6, 2)
┌────────┬────────────┐
│ week   ┆ date       │
│ ---    ┆ ---        │
│ str    ┆ date       │
╞════════╪════════════╡
│ 201903 ┆ 2019-01-20 │
│ 201944 ┆ 2019-11-03 │
│ 201902 ┆ 2019-01-13 │
│ 201943 ┆ 2019-10-27 │
│ 201942 ┆ 2019-10-20 │
│ 201901 ┆ 2019-01-06 │
└────────┴────────────┘
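The same lookup-table idea can be sketched without Polars, which may help when checking the join result by hand. This is only an illustration using the standard library: build_week_lookup is a hypothetical helper, and it assumes all the week values fall in a single known year.

```python
from datetime import date, timedelta


def build_week_lookup(year: int) -> dict[str, date]:
    """Map each '%Y%U' label in `year` to the earliest date carrying that label."""
    lookup: dict[str, date] = {}
    day = date(year, 1, 1)
    while day.year == year:
        # setdefault keeps the first (earliest) date seen for each label
        lookup.setdefault(day.strftime("%Y%U"), day)
        day += timedelta(days=1)
    return lookup


weeks = [201901, 201902, 201903, 201942, 201943, 201944]
lookup = build_week_lookup(2019)
# the dictionary lookup plays the role of the join on the 'week' column
dates = [lookup[str(w)] for w in weeks]
for w, d in zip(weeks, dates):
    print(w, d)
```

Unlike the join output above, the list keeps the input order, since nothing is reshuffled along the way.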
That’s really weird, mate. It looks like only the 2019 dates are broken; take a look at my example below:
pl.DataFrame(
{
"week": [
202201,
202202,
202203,
202242,
202243,
202244,
202101,
202102,
202103,
202142,
202143,
202144,
201901,
201902,
201903,
201942,
201943,
201944,
201801,
201802,
201803,
201842,
201843,
201844,
]
}
).with_columns(pl.format("{}0", "week")).with_columns(
pl.col("week").str.strptime(pl.Date, fmt="%Y%W%w", strict=False).alias("teste")
)
shape: (24, 2)
┌─────────┬────────────┐
│ week    ┆ teste      │
│ ---     ┆ ---        │
│ str     ┆ date       │
╞═════════╪════════════╡
│ 2022010 ┆ 2022-01-09 │
│ 2022020 ┆ 2022-01-16 │
│ 2022030 ┆ 2022-01-23 │
│ 2022420 ┆ 2022-10-23 │
│ 2022430 ┆ 2022-10-30 │
│ 2022440 ┆ 2022-11-06 │
│ 2021010 ┆ 2021-01-10 │
│ 2021020 ┆ 2021-01-17 │
│ 2021030 ┆ 2021-01-24 │
│ 2021420 ┆ 2021-10-24 │
│ 2021430 ┆ 2021-10-31 │
│ 2021440 ┆ 2021-11-07 │
│ 2019010 ┆ null │
│ 2019020 ┆ null │
│ 2019030 ┆ null │
│ 2019420 ┆ null │
│ 2019430 ┆ null │
│ 2019440 ┆ null │
│ 2018010 ┆ 2018-01-07 │
│ 2018020 ┆ 2018-01-14 │
│ 2018030 ┆ 2018-01-21 │
│ 2018420 ┆ 2018-10-21 │
│ 2018430 ┆ 2018-10-28 │
│ 2018440 ┆ 2018-11-04 │
└─────────┴────────────┘
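As a cross-check on the null rows, the same strings can be run through Python's standard-library strptime with the same "%Y%W%w" format. This is just a sketch: parse_year_week is a hypothetical helper, and it does not explain why Polars returns null for 2019, it only shows what the standard library produces for the same input.

```python
from datetime import date, datetime


def parse_year_week(label: str, fmt: str = "%Y%W%w") -> date:
    """Parse a 'YYYYWWd' label: year, Monday-based week number, weekday digit."""
    return datetime.strptime(label, fmt).date()


# the trailing 0 is the weekday digit for %w, where 0 means Sunday
for label in ["2018010", "2019010", "2019420", "2022010"]:
    print(label, parse_year_week(label))
```

The 2018 and 2022 values agree with the Polars output above, and the 2019 labels parse to ordinary dates here (2019-01-13 for 2019010), which fits the idea that the nulls come from the parser rather than from the data.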
Bug aside, I always use the following expression to parse week counts into proper dates:
.with_columns(pl.format("{}0", "week")).with_columns(pl.col("week").str.strptime(pl.Date, fmt="%Y%W%w", strict=False))
Note that a weekday digit has to be appended for this pattern to parse properly; I think this is mentioned in the comments on the other post.
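The weekday requirement is easy to see with Python's own strptime, which only uses the week number when a weekday and a year are also present. The snippet below is a small standard-library demonstration using the %U format from the question; the variable names are arbitrary.

```python
from datetime import datetime

labels = ("201901", "201942")

# without a weekday the week number is parsed but not used in the result
without_weekday = [datetime.strptime(s, "%Y%U").date() for s in labels]

# appending "0" (Sunday for %w) lets the week number take effect
with_weekday = [datetime.strptime(s + "0", "%Y%U%w").date() for s in labels]

print(without_weekday)
print(with_weekday)
```

Both entries in without_weekday come back as January 1st, matching what the first answer saw, while with_weekday lands on the Sunday that opens each week.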