How to extract date from datetime column in polars

Question:

I am trying to move from pandas to polars but I am running into the following issue.

import polars as pl

df = pl.DataFrame(
    {
        "integer": [1, 2, 3], 
        "date": [
            "2010-01-31T23:00:00+00:00",
            "2010-02-01T00:00:00+00:00",
            "2010-02-01T01:00:00+00:00"
        ]
    }
)
df = df.with_columns(
    [
        pl.col("date").str.strptime(pl.Datetime, fmt="%Y-%m-%dT%H:%M:%S%z").dt.with_time_zone("Europe/Amsterdam"),
    ]
)

Yields the following dataframe:

>>> df
shape: (3, 2)
┌─────────┬────────────────────────────────┐
│ integer ┆ date                           │
│ ---     ┆ ---                            │
│ i64     ┆ datetime[μs, Europe/Amsterdam] │
╞═════════╪════════════════════════════════╡
│ 1       ┆ 2010-02-01 00:00:00 CET        │
│ 2       ┆ 2010-02-01 01:00:00 CET        │
│ 3       ┆ 2010-02-01 02:00:00 CET        │
└─────────┴────────────────────────────────┘

As you can see, I transformed the datetime string from UTC to CET succesfully. However, when I try to extract the date (using the accepted answer by the polars author in this thread: https://stackoverflow.com/a/73212748/16332690), it seems to extract the date from the UTC string even though it has been transformed, e.g.:

df = df.with_columns(
    [
        pl.col("date").cast(pl.Date).alias("valueDay")
    ]
)
>>> df
shape: (3, 3)
┌─────────┬────────────────────────────────┬────────────┐
│ integer ┆ date                           ┆ valueDay   │
│ ---     ┆ ---                            ┆ ---        │
│ i64     ┆ datetime[μs, Europe/Amsterdam] ┆ date       │
╞═════════╪════════════════════════════════╪════════════╡
│ 1       ┆ 2010-02-01 00:00:00 CET        ┆ 2010-01-31 │
│ 2       ┆ 2010-02-01 01:00:00 CET        ┆ 2010-02-01 │
│ 3       ┆ 2010-02-01 02:00:00 CET        ┆ 2010-02-01 │
└─────────┴────────────────────────────────┴────────────┘

The valueDay should be 2010-02-01 for all 3 values.

Can anyone help me fix this? By the way, what is the best way to optimize this code? Do I constantly have to assign everything to df or is there a way to chain all of this?

Edit:

I managed to find a quick way around this but it would be nice if the issue above could be addressed. A pandas dt.date like way to approach this would be nice, I noticed that it is missing over here: https://pola-rs.github.io/polars/py-polars/html/reference/series/timeseries.html

df = pl.DataFrame(
    {
        "integer": [1, 2, 3], 
        "date": [
            "2010-01-31T23:00:00+00:00",
            "2010-02-01T00:00:00+00:00",
            "2010-02-01T01:00:00+00:00"
        ]
    }
)
df = df.with_columns(
    [
        pl.col("date").str.strptime(pl.Datetime, fmt="%Y-%m-%dT%H:%M:%S%z").dt.with_time_zone("Europe/Amsterdam"),
    ]
)
df = df.with_columns(
    [
        pl.col("date").dt.day().alias("day"),
        pl.col("date").dt.month().alias("month"),
        pl.col("date").dt.year().alias("year"),
    ]
)
df = df.with_columns(
    pl.datetime(year=pl.col("year"), month=pl.col("month"), day=pl.col("day"))
)
df = df.with_columns(
    [
        pl.col("datetime").cast(pl.Date).alias("valueDay")
    ]
)

Yields the following:

>>> df
shape: (3, 7)
┌─────────┬────────────────────────────────┬─────┬───────┬──────┬─────────────────────┬────────────┐
│ integer ┆ date                           ┆ day ┆ month ┆ year ┆ datetime            ┆ valueDay   │
│ ---     ┆ ---                            ┆ --- ┆ ---   ┆ ---  ┆ ---                 ┆ ---        │
│ i64     ┆ datetime[μs, Europe/Amsterdam] ┆ u32 ┆ u32   ┆ i32  ┆ datetime[μs]        ┆ date       │
╞═════════╪════════════════════════════════╪═════╪═══════╪══════╪═════════════════════╪════════════╡
│ 1       ┆ 2010-02-01 00:00:00 CET        ┆ 1   ┆ 2     ┆ 2010 ┆ 2010-02-01 00:00:00 ┆ 2010-02-01 │
│ 2       ┆ 2010-02-01 01:00:00 CET        ┆ 1   ┆ 2     ┆ 2010 ┆ 2010-02-01 00:00:00 ┆ 2010-02-01 │
│ 3       ┆ 2010-02-01 02:00:00 CET        ┆ 1   ┆ 2     ┆ 2010 ┆ 2010-02-01 00:00:00 ┆ 2010-02-01 │
└─────────┴────────────────────────────────┴─────┴───────┴──────┴─────────────────────┴────────────┘
Asked By: brokkoo

||

Answers:

There’s nothing wrong with the cast function per se. It’s just not intended to be used for truncating the time out of a datetime.

As it happens the function you’re looking for is truncate so this is what you want to do (in the last with_columns chunk)

pl.DataFrame(
    {
        "integer": [1, 2, 3], 
        "date": [
            "2010-01-31T23:00:00+00:00",
            "2010-02-01T00:00:00+00:00",
            "2010-02-01T01:00:00+00:00"
        ]
    }
).with_columns(
    [
        pl.col("date").str.strptime(pl.Datetime, fmt="%Y-%m-%dT%H:%M:%S%z").dt.with_time_zone("Europe/Amsterdam"),
    ]
).with_columns(
    [
        pl.col("date").dt.truncate('1d').alias("valueDay")
    ]
)
Answered By: Dean MacGregor

For completeness, in polars 0.16.15+, dt namespace methods .date(), .time() etc. return local datetime attributes. EX:

import polars as pl

df = pl.DataFrame(
    {
        "integer": [1, 2, 3],
        "date": [
            "2010-01-31T23:00:00+00:00",
            "2010-02-01T00:00:00+00:00",
            "2010-02-01T01:00:00+00:00",
        ],
    }
)
df = df.with_columns(pl.col("date").str.strptime(pl.Datetime, fmt="%+")
                     .dt.convert_time_zone("Europe/Amsterdam"))

df = df.with_columns(
    [
        pl.col("date").dt.day().alias("day"),
        pl.col("date").dt.month().alias("month"),
        pl.col("date").dt.year().alias("year"),
    ]
)

print(df)

┌─────────┬────────────────────────────────┬─────┬───────┬──────┐
│ integer ┆ date                           ┆ day ┆ month ┆ year │
│ ---     ┆ ---                            ┆ --- ┆ ---   ┆ ---  │
│ i64     ┆ datetime[μs, Europe/Amsterdam] ┆ u32 ┆ u32   ┆ i32  │
╞═════════╪════════════════════════════════╪═════╪═══════╪══════╡
│ 1       ┆ 2010-02-01 00:00:00 CET        ┆ 1   ┆ 2     ┆ 2010 │
│ 2       ┆ 2010-02-01 01:00:00 CET        ┆ 1   ┆ 2     ┆ 2010 │
│ 3       ┆ 2010-02-01 02:00:00 CET        ┆ 1   ┆ 2     ┆ 2010 │
└─────────┴────────────────────────────────┴─────┴───────┴──────┘
Answered By: FObersteiner