Polars add/substract utc offset from datetime object

Question:

I wanted to add/subtract the UTC offset (usually in hours) to/from the datetime object in polars but I don’t seem to see a way to do this. the UTC offset can be dynamic given there’s Day Light Saving period comes into play in a calendar year. (e.g., EST/EDT maps to 5/4 hours of UTC offset respectively).

from datetime import datetime
import pytz
import polars as pl
from datetime import date

# Make a datetime-only dataframe that covers DST period of year, in UTC time first.
 df = pl.DataFrame(
         pl.date_range(low=date(2022,1,3), 
                       high=date(2022,9,30), 
                       interval="5m", 
                       time_unit="ns", 
                       time_zone="UTC")
           .alias("timestamp")
      )

 # Convert timezone to "America/New_York", which covers both EST and EDT.
 us_df = df.with_column(
                        pl.col("timestamp")
                          .dt
                          .cast_time_zone(tz="America/New_York")
                          .alias("datetime")
         )

 # Check us_df output
 us_df
 # output, here `polars` is showing US time without the UTC offset 
 # Before 0.14.22 `polars` is showing time with UTC offset
 # i.e., `23:45:00 UTC` should be `19:45:00 EDT`
 # Now `polars` is showing `15:45:00 EDT`, without 4 hours of offset
┌─────────────────────────┬────────────────────────────────┐
│ timestamp               ┆ datetime                       │
│ ---                     ┆ ---                            │
│ datetime[ns, UTC]       ┆ datetime[ns, America/New_York] │
╞═════════════════════════╪════════════════════════════════╡
│ 2022-01-03 00:00:00 UTC ┆ 2022-01-02 14:00:00 EST        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-01-03 00:05:00 UTC ┆ 2022-01-02 14:05:00 EST        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-01-03 00:10:00 UTC ┆ 2022-01-02 14:10:00 EST        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-01-03 00:15:00 UTC ┆ 2022-01-02 14:15:00 EST        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ...                     ┆ ...                            │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-09-29 23:45:00 UTC ┆ 2022-09-29 15:45:00 EDT        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-09-29 23:50:00 UTC ┆ 2022-09-29 15:50:00 EDT        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-09-29 23:55:00 UTC ┆ 2022-09-29 15:55:00 EDT        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-09-30 00:00:00 UTC ┆ 2022-09-29 16:00:00 EDT        │
└─────────────────────────┴────────────────────────────────┘

Converting to_pandas, we should observe that the underlying datetime object does not include that 4 hours of offset in the actual time as well (remember EST is also in this dataframe, and it has a 5-hour offset).

 # Convert to pandas
 us_pd = us_df.to_pandas()
 us_pd
 # output

                      timestamp                  datetime
0     2022-01-03 00:00:00+00:00 2022-01-02 14:00:00-05:00
1     2022-01-03 00:05:00+00:00 2022-01-02 14:05:00-05:00
2     2022-01-03 00:10:00+00:00 2022-01-02 14:10:00-05:00
3     2022-01-03 00:15:00+00:00 2022-01-02 14:15:00-05:00
4     2022-01-03 00:20:00+00:00 2022-01-02 14:20:00-05:00
...                         ...                       ...
77756 2022-09-29 23:40:00+00:00 2022-09-29 15:40:00-04:00
77757 2022-09-29 23:45:00+00:00 2022-09-29 15:45:00-04:00
77758 2022-09-29 23:50:00+00:00 2022-09-29 15:50:00-04:00
77759 2022-09-29 23:55:00+00:00 2022-09-29 15:55:00-04:00
77760 2022-09-30 00:00:00+00:00 2022-09-29 16:00:00-04:00

What I wanted was to include the UTC offset into the actual time, such that I can do filtering on the time (in a natural way). For instance, if I am seeing 2300UTC is 1900EDT, I can filter using 1900 directly (please note I can’t just add/substract the UTC offset on the fly during filtering, as the number of hours is dynamic given DST).

The underlying python datetime does have utcoffset function, which can be applied on each datetime object, but I’d need to convert polars to pandas first (I don’t see how to do this within polars).

I’ve also observed this peculiar difference:

  us_pd.datetime[us_pd.shape[0]-1].to_pydatetime()

  # We can see it is identical to what's already in `polars` and `pandas` dataframe.

  datetime.datetime(2022, 9, 29, 16, 0, tzinfo=<DstTzInfo 'America/New_York' EDT-1 day, 20:00:00 DST>)

  # Now we create a single datetime object with arbitrary UTC time and convert it to New York time

  datetime(2022, 9, 30, 22, 45, 0,0, pytz.utc).astimezone(pytz.timezone("America/New_York"))

  # The representation here is actually the correct New York time (as in, the offset has been included)

  datetime.datetime(2022, 9, 30, 18, 45, tzinfo=<DstTzInfo 'America/New_York' EDT-1 day, 20:00:00 DST>)
  
  
Asked By: stucash

||

Answers:

py-polars 0.16.3 update: It seems you’re looking for convert_time_zone. Ex:

from datetime import date
import polars as pl

df = pl.DataFrame(
    pl.date_range(
        low=date(2022, 1, 3),
        high=date(2022, 9, 30),
        interval="5m",
        time_unit="ns",
        time_zone="UTC",
    ).alias("timestamp")
)

us_df = df.with_columns(
    pl.col("timestamp").dt.convert_time_zone(time_zone="America/New_York").alias("datetime")
)


┌─────────────────────────┬────────────────────────────────┐
│ timestamp               ┆ datetime                       │
│ ---                     ┆ ---                            │
│ datetime[ns, UTC]       ┆ datetime[ns, America/New_York] │
╞═════════════════════════╪════════════════════════════════╡
│ 2022-01-03 00:00:00 UTC ┆ 2022-01-02 19:00:00 EST        │
│ 2022-01-03 00:05:00 UTC ┆ 2022-01-02 19:05:00 EST        │
│ 2022-01-03 00:10:00 UTC ┆ 2022-01-02 19:10:00 EST        │
│ 2022-01-03 00:15:00 UTC ┆ 2022-01-02 19:15:00 EST        │
│ ...                     ┆ ...                            │
│ 2022-09-29 23:45:00 UTC ┆ 2022-09-29 19:45:00 EDT        │
│ 2022-09-29 23:50:00 UTC ┆ 2022-09-29 19:50:00 EDT        │
│ 2022-09-29 23:55:00 UTC ┆ 2022-09-29 19:55:00 EDT        │
│ 2022-09-30 00:00:00 UTC ┆ 2022-09-29 20:00:00 EDT        │
└─────────────────────────┴────────────────────────────────┘
Answered By: FObersteiner

Update – this response is out of date, see the answer using .dt.replace_time_zone above

Answered By: braaannigan