Easily convert string column to pl.datetime in Polars

Question:

Consider a Polars data frame with a column of str type that indicates the date in the format '27 July 2020'. I would like to convert this column to the polars.datetime type, which is distinct from the Python standard datetime. The following code, using the standard datetime format, works but Polars does not recognise the values in the column as dates.

import polars as pl
from datetime import datetime

df = pl.read_csv('<some CSV file containing a column called "event_date">')
df = df.with_columns([
        pl.col('event_date').apply(lambda x: x.replace(" ","-"))
                            .apply(lambda x: datetime.strptime(x, '%d-%B-%Y'))
])

Suppose we try to process df further to create a new column indicating the quarter of the year an event took place.

df = df.with_columns([
        pl.col('event_date').apply(lambda x: x.month)
                            .apply(lambda x: 1 if x in range(1,4) else 2 if x in range(4,7) else 3 if x in range(7,10) else 4)
                            .alias('quarter')
])

The code returns the following error because it treats event_date as dtype Object("object") rather than as datetime or polars.datetime:

thread '<unnamed>' panicked at 'dtype Object("object") not supported', src/series.rs:992:24
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
PanicException: Unwrapped panic from Python code
Asked By: fabioklr

Answers:

The easiest way to convert strings to Date/Datetime is to use Polars’ own strptime function (rather than the same-named function from Python’s datetime module).

For example, let’s start with this data.

import polars as pl

df = pl.DataFrame({
    'date_str': ["27 July 2020", "31 December 2020"]
})
print(df)
shape: (2, 1)
┌──────────────────┐
│ date_str         │
│ ---              │
│ str              │
╞══════════════════╡
│ 27 July 2020     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 31 December 2020 │
└──────────────────┘

To convert, use Polars’ strptime function.

df.with_column(pl.col('date_str').str.strptime(pl.Date, fmt='%d %B %Y').cast(pl.Datetime))
shape: (2, 1)
┌─────────────────────┐
│ date_str            │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-07-27 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2020-12-31 00:00:00 │
└─────────────────────┘

Notice that we did not need to replace spaces with dashes. I’ve cast the result as a Datetime (per your question), but you may be able to use a Date instead.

Currently, the apply method does not work when the return type is a Python date/datetime object, but there is a request for this. That said, it's better to use Polars' strptime, which will be much faster than calling Python datetime code for each value.

Edit: as of Polars 0.13.19, the apply method will automatically convert Python date/datetime to Polars Date/Datetime.

Answered By: cbilot

To change a string column to datetime in Polars, use str.strptime().

import polars as pl
df = pl.DataFrame(df_pandas)

df

shape: (100, 2)
┌────────────┬────────┐
│ dates_col  ┆ ticker │
│ ---        ┆ ---    │
│ str        ┆ str    │
╞════════════╪════════╡
│ 2022-02-25 ┆ RDW    │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2008-05-28 ┆ ARTX   │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2015-05-21 ┆ CBAT   │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2009-02-09 ┆ ANNB   │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤


df.with_column(pl.col("dates_col").str.strptime(pl.Datetime, fmt="%Y-%m-%d"))

shape: (100, 2)
┌─────────────────────┬────────┐
│ dates_col           ┆ ticker │
│ ---                 ┆ ---    │
│ datetime[μs]        ┆ str    │
╞═════════════════════╪════════╡
│ 2022-02-25 00:00:00 ┆ RDW    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2008-05-28 00:00:00 ┆ ARTX   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2015-05-21 00:00:00 ┆ CBAT   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2009-02-09 00:00:00 ┆ ANNB   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
Answered By: Artur Dutra