How to add a duration to datetime in Python polars
Question:
I want to add a duration in seconds to a datetime. My data looks like this:
import polars as pl
df = pl.DataFrame(
    {
        "dt": [
            "2022-12-14T00:00:00", "2022-12-14T00:00:00", "2022-12-14T00:00:00",
        ],
        "seconds": [
            1.0, 2.2, 2.4,
        ],
    }
)
df = df.with_column(pl.col("dt").str.strptime(pl.Datetime).cast(pl.Datetime))
Now, my naive attempt was to convert the float column to a duration type so that I could add it to the datetime column (as I would in pandas):
df = df.with_column(pl.col("seconds").cast(pl.Duration).alias("duration0"))
print(df.head())
┌─────────────────────┬─────────┬──────────────┐
│ dt                  ┆ seconds ┆ duration0    │
│ ---                 ┆ ---     ┆ ---          │
│ datetime[μs]        ┆ f64     ┆ duration[μs] │
╞═════════════════════╪═════════╪══════════════╡
│ 2022-12-14 00:00:00 ┆ 1.0     ┆ 0µs          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.2     ┆ 0µs          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.4     ┆ 0µs          │
└─────────────────────┴─────────┴──────────────┘
This gives the correct data type, but the values are all zero.
I also tried
df = df.with_column(
    pl.col("seconds")
    .apply(lambda x: pl.duration(nanoseconds=x * 1e9))
    .alias("duration1")
)
print(df.head())
shape: (3, 4)
┌─────────────────────┬─────────┬──────────────┬─────────────────────────────────────┐
│ dt                  ┆ seconds ┆ duration0    ┆ duration1                           │
│ ---                 ┆ ---     ┆ ---          ┆ ---                                 │
│ datetime[μs]        ┆ f64     ┆ duration[μs] ┆ object                              │
╞═════════════════════╪═════════╪══════════════╪═════════════════════════════════════╡
│ 2022-12-14 00:00:00 ┆ 1.0     ┆ 0µs          ┆ 0i64.duration([0i64, 1000000000f... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.2     ┆ 0µs          ┆ 0i64.duration([0i64, 2200000000f... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.4     ┆ 0µs          ┆ 0i64.duration([0i64, 2400000000f... │
└─────────────────────┴─────────┴──────────────┴─────────────────────────────────────┘
This gives a column of type object, which isn't helpful either. The documentation is rather sparse on this topic; are there any better options?
Answers:
Update: the all-zero values were a repr (display) formatting issue, which has since been fixed with this commit.
pl.duration() can be used in this way (starting from the original DataFrame, where dt is still a string column):
>>> df.with_column(
...     pl.col("dt").str.strptime(pl.Datetime)
...     + pl.duration(nanoseconds=pl.col("seconds") * 1e9)
... )
shape: (3, 2)
┌─────────────────────────┬─────────┐
│ dt                      ┆ seconds │
│ ---                     ┆ ---     │
│ datetime[μs]            ┆ f64     │
╞═════════════════════════╪═════════╡
│ 2022-12-14 00:00:01     ┆ 1.0     │
├─────────────────────────┼─────────┤
│ 2022-12-14 00:00:02.200 ┆ 2.2     │
├─────────────────────────┼─────────┤
│ 2022-12-14 00:00:02.400 ┆ 2.4     │
└─────────────────────────┴─────────┘
There's another option as well: since the datetime column is stored internally as an integer number of microseconds here (datetime[μs]), you can add the seconds, converted to microseconds, directly and cast the result back to Datetime:
MICROSECONDS_PER_SECOND = 1e6
df = df.with_column(
    (df["dt"] + df["seconds"] * MICROSECONDS_PER_SECOND)
    .cast(pl.Datetime)
    .alias("dt_new")
)
print(df.head())
shape: (3, 3)
┌─────────────────────┬─────────┬─────────────────────────┐
│ dt                  ┆ seconds ┆ dt_new                  │
│ ---                 ┆ ---     ┆ ---                     │
│ datetime[μs]        ┆ f64     ┆ datetime[μs]            │
╞═════════════════════╪═════════╪═════════════════════════╡
│ 2022-12-14 00:00:00 ┆ 1.0     ┆ 2022-12-14 00:00:01     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.2     ┆ 2022-12-14 00:00:02.200 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.4     ┆ 2022-12-14 00:00:02.400 │
└─────────────────────┴─────────┴─────────────────────────┘