Out-of-bounds timestamps in pandas

Question:

I need to rewrite some SQL code in Python, and my problem is that I need to calculate date differences in days.
In SQL, even for rows where final_pmt_date is '9999-12-31', the dates are subtracted without any problem.

But in pandas the datetime64 type has a limited range, so the same subtraction raises an out-of-bounds exception.

All the answers I saw were about converting these dates into NaT (with the 'coerce' keyword). But I need to calculate the number of days for such datetimes as well.

Thank you in advance

Answers:

A date like 9999-12-31 is out of range for pandas datetime: with the default nanosecond resolution, a Timestamp only covers roughly the years 1677 to 2262.
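
To see the limits on your own installation, you can print the bounds that pandas exposes on Timestamp. This is only a quick check; the exact values printed may depend on the pandas version.

```python
import pandas as pd

# Smallest and largest values a nanosecond-resolution Timestamp can hold
print(pd.Timestamp.min)
print(pd.Timestamp.max)
```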

Using vanilla Python datetime might be an alternative here, for example:

from datetime import datetime
import pandas as pd

df = pd.DataFrame(
    {
        "open": ["2021-12-27 00:00:00.000", "2019-03-06 00:00:00.000"],
        "close": ["9999-12-31 00:00:00.000", "2022-04-06 00:00:00.000"],
    }
)

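# Parse both columns row by row with the standard library and subtract;
# the result is a plain Python timedelta, which has no 2262 limit.
df["delta"] = df.apply(
    lambda row: (
        datetime.fromisoformat(row["close"])
        - datetime.fromisoformat(row["open"])
    ),
    axis=1,
)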

df
                      open                    close                  delta
0  2021-12-27 00:00:00.000  9999-12-31 00:00:00.000  2913908 days, 0:00:00
1  2019-03-06 00:00:00.000  2022-04-06 00:00:00.000     1127 days 00:00:00

Note, however, that this requires apply, which runs row by row and is much slower than the vectorized pandas datetime methods. Maybe using NaT as an "invalid value" identifier is an option after all?
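
If only whole days are needed, one possible way to stay vectorized is to do the subtraction in NumPy with day resolution, since datetime64[D] covers the year 9999 without trouble. The following is a minimal sketch, not a tested drop-in: it assumes every string starts with an ISO date (YYYY-MM-DD) and that the time part can be ignored. The helper to_days and the column name delta_days are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "open": ["2021-12-27 00:00:00.000", "2019-03-06 00:00:00.000"],
        "close": ["9999-12-31 00:00:00.000", "2022-04-06 00:00:00.000"],
    }
)


def to_days(col):
    # Hypothetical helper: keep only the YYYY-MM-DD part of each string
    # and let NumPy parse it at day resolution.
    return np.array(col.str[:10].tolist(), dtype="datetime64[D]")


# Subtracting two datetime64[D] arrays gives timedelta64[D];
# casting to int64 yields the number of days as plain integers.
delta = to_days(df["close"]) - to_days(df["open"])
df["delta_days"] = delta.astype("int64")

print(df["delta_days"].tolist())  # [2913908, 1127]
```

The result is an ordinary integer column, so it avoids both the Timestamp range limit and the range limit of pandas Timedelta.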

Answered By: FObersteiner