Easily convert string column to pl.Datetime in Polars
Question:
Consider a Polars data frame with a column of str type that holds dates in the format '27 July 2020'. I would like to convert this column to the polars.Datetime type, which is distinct from the Python standard datetime. The following code, using the standard datetime module, runs, but Polars does not recognise the values in the column as dates.
import polars as pl
from datetime import datetime
df = pl.read_csv("<some CSV file containing a column called 'event_date'>")
df = df.with_columns([
    pl.col('event_date').apply(lambda x: x.replace(" ","-"))
                        .apply(lambda x: datetime.strptime(x, '%d-%B-%Y'))
])
Suppose we try to process df further to create a new column indicating the quarter of the year an event took place.
df = df.with_columns([
    pl.col('event_date').apply(lambda x: x.month)
                        .apply(lambda x: 1 if x in range(1,4) else 2 if x in range(4,7) else 3 if x in range(7,10) else 4)
                        .alias('quarter')
])
The code returns the following error because it treats event_date as dtype Object("object") and not as datetime or polars.Datetime:
thread '<unnamed>' panicked at 'dtype Object("object") not supported', src/series.rs:992:24
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
PanicException: Unwrapped panic from Python code
Answers:
The easiest way to convert strings to Date/Datetime is to use Polars’ own strptime function (rather than the same-named function from Python’s datetime module).
For example, let’s start with this data.
import polars as pl
df = pl.DataFrame({
    'date_str': ["27 July 2020", "31 December 2020"]
})
print(df)
shape: (2, 1)
┌──────────────────┐
│ date_str         │
│ ---              │
│ str              │
╞══════════════════╡
│ 27 July 2020     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 31 December 2020 │
└──────────────────┘
To convert, use Polars’ strptime function.
df.with_columns(pl.col('date_str').str.strptime(pl.Date, '%d %B %Y').cast(pl.Datetime))
shape: (2, 1)
┌─────────────────────┐
│ date_str            │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-07-27 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2020-12-31 00:00:00 │
└─────────────────────┘
Notice that we did not need to replace spaces with dashes. I’ve cast the result as a Datetime (per your question), but you may be able to use a Date instead.
Currently, the apply method does not work when the return type is a Python date/datetime object, but there is a request for this. That said, it’s better to use Polars’ strptime. It will be much faster than calling Python datetime code.
Edit: as of Polars 0.13.19, the apply method will automatically convert Python date/datetime to Polars Date/Datetime.
To change a string column to datetime in Polars, use str.strptime().
import polars as pl
df = pl.DataFrame(df_pandas)
df
shape: (100, 2)
┌────────────┬────────┐
│ dates_col  ┆ ticker │
│ ---        ┆ ---    │
│ str        ┆ str    │
╞════════════╪════════╡
│ 2022-02-25 ┆ RDW    │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2008-05-28 ┆ ARTX   │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2015-05-21 ┆ CBAT   │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2009-02-09 ┆ ANNB   │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
df.with_columns(pl.col("dates_col").str.strptime(pl.Datetime, "%Y-%m-%d"))
shape: (100, 2)
┌─────────────────────┬────────┐
│ dates_col           ┆ ticker │
│ ---                 ┆ ---    │
│ datetime[μs]        ┆ str    │
╞═════════════════════╪════════╡
│ 2022-02-25 00:00:00 ┆ RDW    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2008-05-28 00:00:00 ┆ ARTX   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2015-05-21 00:00:00 ┆ CBAT   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2009-02-09 00:00:00 ┆ ANNB   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤