How to filter a polars dataframe by date?

Question:

df.filter(pl.col("MyDate") >= "2020-01-01")

does not work like it does in pandas.

I found a workaround

df.filter(pl.col("MyDate") >= pl.datetime(2020,1,1))

but this does not help when the date is held in a string variable.

Asked By: keiv.fly


Answers:

You can use Python datetime objects. Polars converts them to literal expressions.

import polars as pl
from datetime import datetime

pl.DataFrame({
    "dates": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
    "vals": range(3)
}).filter(pl.col("dates") > datetime(2021, 1, 2))

Or in explicit syntax: pl.col("dates") > pl.lit(datetime(2021, 1, 2))

Answered By: ritchie46

Use pl.lit(my_date_str).str.strptime(pl.Date, fmt=my_date_fmt)

Building on the example above:

import polars as pl
from datetime import datetime

df=pl.DataFrame({
    "dates": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
    "vals": range(3)
})

my_date_str="2021-01-02"
my_date_fmt="%F"
# Note: newer Polars releases name this keyword `format` instead of `fmt`.
df.filter(pl.col('dates') >= pl.lit(my_date_str).str.strptime(pl.Date, fmt=my_date_fmt))

shape: (2, 2)
┌─────────────────────┬──────┐
│ dates               ┆ vals │
│ ---                 ┆ ---  │
│ datetime[μs]        ┆ i64  │
╞═════════════════════╪══════╡
│ 2021-01-02 00:00:00 ┆ 1    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2021-01-03 00:00:00 ┆ 2    │
└─────────────────────┴──────┘

Just be sure to match the format to your date string. For example,

my_date_str="01/02/21"
my_date_fmt="%D"

I can’t speak to the performance of this approach, but it provides an easy way to incorporate string variables into your code.
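
To illustrate the point about matching the format to the string: as far as I understand, "%F" is shorthand for "%Y-%m-%d" and "%D" for "%m/%d/%y". The sketch below uses only the Python standard library, with the long forms spelled out (Python's own strptime may not accept the shorthands), to show that both example strings describe the same day and that a mismatched format fails. The variable names are illustrative.

```python
from datetime import datetime

# Long-form equivalents of the "%F" and "%D" formats used above.
iso_date = datetime.strptime("2021-01-02", "%Y-%m-%d").date()
us_date = datetime.strptime("01/02/21", "%m/%d/%y").date()

print(iso_date, us_date)

# A format that does not match the string raises ValueError.
try:
    datetime.strptime("01/02/21", "%Y-%m-%d")
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(mismatch_detected)
```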

Answered By: user18263465

Hacky workaround for slightly neater code: Just use pandas!

pd.to_datetime accepts a single string and returns a pandas Timestamp. In my testing, both with my own data and with your example, Polars is very happy to work with that object.

If importing from pandas just isn’t possible for you then this is useless, but if you want unfussy string to date conversion … why not use pandas for what it’s good at? 😛

import polars as pl
from datetime import datetime
from pandas import to_datetime # or just import pandas as pd

df = pl.DataFrame({
    "dates": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
    "vals": range(3)
})

my_date_str = "2021-01-02"
my_date = to_datetime(my_date_str) # or use pd.to_datetime
print(df.filter(pl.col('dates') >= my_date))

which produces:

shape: (2, 2)
┌─────────────────────┬──────┐
│ dates               ┆ vals │
│ ---                 ┆ ---  │
│ datetime[μs]        ┆ i64  │
╞═════════════════════╪══════╡
│ 2021-01-02 00:00:00 ┆ 1    │
│ 2021-01-03 00:00:00 ┆ 2    │
└─────────────────────┴──────┘

Answered By: stephan