How to filter a polars dataframe by date?
Question:
df.filter(pl.col("MyDate") >= "2020-01-01")
does not work the way it does in pandas.
I found a workaround:
df.filter(pl.col("MyDate") >= pl.datetime(2020,1,1))
but this does not solve the problem when the date arrives as a string variable.
Answers:
You can use Python datetime objects. They will be converted to Polars literal expressions.
import polars as pl
from datetime import datetime
pl.DataFrame({
    "dates": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
    "vals": range(3)
}).filter(pl.col("dates") > datetime(2021, 1, 2))
Or in explicit syntax: pl.col("dates") > pl.lit(datetime(2021, 1, 2))
Use pl.lit(my_date_str).str.strptime(pl.Date, fmt=my_date_fmt) to parse the string variable into a date literal inside the expression.
Building on the example above:
import polars as pl
from datetime import datetime
df = pl.DataFrame({
    "dates": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
    "vals": range(3)
})
my_date_str = "2021-01-02"
my_date_fmt = "%F"
df.filter(pl.col('dates') >= pl.lit(my_date_str).str.strptime(pl.Date, fmt=my_date_fmt))
shape: (2, 2)
┌─────────────────────┬──────┐
│ dates ┆ vals │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════╡
│ 2021-01-02 00:00:00 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2021-01-03 00:00:00 ┆ 2 │
└─────────────────────┴──────┘
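Just be sure to match the format to your date string; %F is shorthand for %Y-%m-%d. For example, a month/day/year string needs %D, which is shorthand for %m/%d/%y:
my_date_str = "01/02/21"
my_date_fmt = "%D"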
I can’t speak to the performance of this approach, but it provides an easy way to incorporate string variables into your code.
Hacky workaround for slightly neater code: Just use pandas!
pd.to_datetime takes a single string, and from testing with my own data as well as your example, polars is very happy to work with the pandas datetime object it returns.
If importing from pandas just isn’t possible for you then this is useless, but if you want unfussy string to date conversion … why not use pandas for what it’s good at? 😛
import polars as pl
from datetime import datetime
from pandas import to_datetime # or just import pandas as pd
df = pl.DataFrame({
    "dates": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
    "vals": range(3)
})
my_date_str = "2021-01-02"
my_date = to_datetime(my_date_str) # or use pd.to_datetime
print(df.filter(pl.col('dates') >= my_date))
which produces:
shape: (2, 2)
┌─────────────────────┬──────┐
│ dates ┆ vals │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════╡
│ 2021-01-02 00:00:00 ┆ 1 │
│ 2021-01-03 00:00:00 ┆ 2 │
└─────────────────────┴──────┘