Trouble with strptime() conversion of duration time string
Question:
I have some duration-type data (lap times) as pl.Utf8 that fails to convert using strptime, whereas regular datetimes work as expected.
Minutes (before :) and seconds (before .) are always padded to two digits; milliseconds are always 3 digits.
Lap times are always < 2 min.
import polars as pl

df = pl.DataFrame({
    "lap_time": ["01:14.007", "00:53.040", "01:00.123"]
})
df = df.with_columns(
    [
        # pl.col('release_date').str.strptime(pl.Date, fmt="%B %d, %Y"), # works
        pl.col('lap_time').str.strptime(pl.Time, format="%M:%S.%3f").cast(pl.Duration), # fails
    ]
)
So I used the chrono format specifier definitions from https://docs.rs/chrono/latest/chrono/format/strftime/index.html, which polars uses according to the strptime docs.
The second conversion (for lap_time) always fails, no matter whether I use .%f, .%3f or %.3f. Apparently, strptime doesn't allow creating a pl.Duration directly, so I tried with pl.Time, but it fails with the error:
ComputeError: strict conversion to dates failed, maybe set strict=False
Setting strict=False, however, yields null values for the whole Series.
Am I missing something, or is this some weird behavior on chrono's or python-polars' part?
Answers:
General case
If your durations may exceed 24 hours, you can extract the components (minutes, seconds and so on) from the string with a regex pattern. For example:
import polars as pl

df = pl.DataFrame({
    "time": ["+01:14.007", "100:20.000", "-05:00.000"]
})
df.with_columns(
    # extract_all returns a list of 3 strings per row: [min, sec, ms]
    pl.col("time").str.extract_all(r"([+-]?\d+)")
).with_columns(
    pl.duration(
        minutes=pl.col("time").list.get(0).cast(pl.Int64),
        seconds=pl.col("time").list.get(1).cast(pl.Int64),
        milliseconds=pl.col("time").list.get(2).cast(pl.Int64),
    ).alias("time")
)
┌──────────────┐
│ time         │
│ ---          │
│ duration[ns] │
╞══════════════╡
│ 1m 14s 7ms   │
│ 1h 40m 20s   │
│ -5m          │
└──────────────┘
About pl.Time
To convert data to pl.Time, you need to specify hours as well. When you prepend 00 hours to your time, the code works:
import polars as pl

df = pl.DataFrame({"str_time": ["01:14.007", "01:18.880"]})
df.with_columns(
    # prepend a zero hour field so the string matches %T (%H:%M:%S)
    duration=(pl.lit("00:") + pl.col("str_time"))
    .str.strptime(pl.Time, format="%T%.3f")
    .cast(pl.Duration)
)
┌───────────┬──────────────┐
│ str_time  ┆ duration     │
│ ---       ┆ ---          │
│ str       ┆ duration[μs] │
╞═══════════╪══════════════╡
│ 01:14.007 ┆ 1m 14s 7ms   │
│ 01:18.880 ┆ 1m 18s 880ms │
└───────────┴──────────────┘
Create your own parser
strptime works for datetime stamps only, not for time deltas. The accepted answer is bad practice, as it fails for reasonable inputs such as durations of 80 minutes or negative durations.
You can use pl.Series.str.extract() to make your own regex parser and extract the values you want before passing them into the Duration constructor.
As far as I'm aware, there is no "duration stamp" parser in Rust. That may be a good idea for a crate, if anyone is reading this. The syntax could be similar to strptime but handle cases like negative durations and no wrapping of the most significant "digit"/subunit: for a "minute duration stamp", you would wrap seconds at 60 but not minutes. In particular, 61 minutes must remain 61 minutes.
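To make those rules concrete, here is a pure-Python sketch of a "minute duration stamp" parser. parse_minute_stamp is a hypothetical function and the [+-]MM:SS.mmm format is an assumption based on the lap times above; it is not an existing library API:

```python
import re
from datetime import timedelta

# Hypothetical stamp format: optional sign, unbounded minutes, SS.mmm
_STAMP = re.compile(r"([+-]?)(\d+):(\d{2})\.(\d{3})")


def parse_minute_stamp(text: str) -> timedelta:
    """Parse a signed minute duration stamp into a timedelta.

    Minutes (the most significant field) never wrap; seconds must be < 60.
    """
    match = _STAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"not a minute duration stamp: {text!r}")
    sign, minutes, seconds, millis = match.groups()
    if int(seconds) >= 60:
        raise ValueError(f"seconds field must be < 60: {text!r}")
    magnitude = timedelta(
        minutes=int(minutes), seconds=int(seconds), milliseconds=int(millis)
    )
    # The sign negates the whole duration
    return -magnitude if sign == "-" else magnitude


print(parse_minute_stamp("61:05.250"))
```

Here "61:05.250" stays 61 minutes (timedelta merely prints it as 1:01:05.250000), while "01:75.000" is rejected.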
Code adapted from glebcom’s answer:
df = df.with_columns(
    [
        # pl.col('release_date').str.strptime(pl.Date, fmt="%B %d, %Y"), # works
        # slice the fixed-width MM:SS.mmm string and cast each field to an integer
        pl.duration(
            minutes=pl.col("lap_time").str.slice(0, 2).cast(pl.Int64),
            seconds=pl.col("lap_time").str.slice(3, 2).cast(pl.Int64),
            milliseconds=pl.col("lap_time").str.slice(6, 3).cast(pl.Int64)
        ).alias('lap_time'),
    ]
)
This answer was posted as an edit to the question Trouble with strptime() conversion of duration time string by the OP Dorian under CC BY-SA 4.0.