How to calculate differences between two pandas.Timestamp Series in nanoseconds

Question:

I have two Series of pd.Timestamp values that are extremely close to each other. I’d like to get the elementwise difference between the two Series with nanosecond precision.

First Series:

0    2021-05-21 00:02:11.349001429
1    2021-05-21 00:02:38.195857153
2    2021-05-21 00:03:25.527530228
3    2021-05-21 00:03:26.653410069
4    2021-05-21 00:03:26.798157366

Second Series:

0    2021-05-21 00:02:11.348997322
1    2021-05-21 00:02:38.195852267
2    2021-05-21 00:03:25.527526087
3    2021-05-21 00:03:26.653406759
4    2021-05-21 00:03:26.798154350

If I simply use the - operator, the nanosecond part of the difference appears to be truncated. The output looks something like this:

Series1 - Series2
0    00:00:00.000004
1    00:00:00.000004
2    00:00:00.000004
3    00:00:00.000003
4    00:00:00.000003

I don’t want to lose the nanosecond precision when calculating the differences between Timestamps. I have hacked up a solution that loops over each row, calculates the scalar difference as a pd.Timedelta, and then combines its microseconds and nanoseconds. Like this (for the first element):

single_diff = Series1[0] - Series2[0]
single_diff.microseconds * 1000 + single_diff.nanoseconds
4107

Is there a neater vectorized way to do this, instead of a for loop?

Asked By: Anton


Answers:

You won’t lose precision if you subtract the Series as shown. For timedelta64[ns] data, such as the result here, the internal representation is an integer count of nanoseconds. After calculating the timedelta, you can convert it to an integer to obtain the difference in nanoseconds. Example:

import pandas as pd
import numpy as np

s1 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.349001429",
                     "2021-05-21 00:02:38.195857153",
                     "2021-05-21 00:03:25.527530228",
                     "2021-05-21 00:03:26.653410069",
                     "2021-05-21 00:03:26.798157366"]))

s2 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.348997322",
                     "2021-05-21 00:02:38.195852267",
                     "2021-05-21 00:03:25.527526087",
                     "2021-05-21 00:03:26.653406759",
                     "2021-05-21 00:03:26.798154350"]))

delta = (s1-s2).astype(np.int64)

delta
0    4107
1    4886
2    4141
3    3310
4    3016
dtype: int64

Note: I’m using numpy’s int64 type here because on some systems the built-in int maps to a 32-bit integer, in which case the conversion fails.
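
As a possible alternative to the integer cast, floor division by a one-nanosecond Timedelta should give the same counts without choosing an integer dtype explicitly. This is a minimal sketch rather than part of the original answer; the name one_ns is only illustrative, and it assumes the Series contain no missing values (with NaT present, the result would not stay integer).

```python
import pandas as pd

s1 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.349001429",
                               "2021-05-21 00:02:38.195857153",
                               "2021-05-21 00:03:25.527530228",
                               "2021-05-21 00:03:26.653410069",
                               "2021-05-21 00:03:26.798157366"]))

s2 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.348997322",
                               "2021-05-21 00:02:38.195852267",
                               "2021-05-21 00:03:25.527526087",
                               "2021-05-21 00:03:26.653406759",
                               "2021-05-21 00:03:26.798154350"]))

# Illustrative name: a Timedelta of exactly one nanosecond.
one_ns = pd.Timedelta(1, unit="ns")

# Floor-dividing a timedelta Series by a Timedelta counts how many
# whole nanoseconds fit into each difference.
delta_ns = (s1 - s2) // one_ns

print(delta_ns.tolist())
```

Dividing by a different Timedelta, for example one microsecond, would give whole counts of that unit in the same way.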

Answered By: FObersteiner

You can also get the nanoseconds without importing numpy, like this:

import pandas as pd

s1 = pd.Series(
    pd.to_datetime(
        [
            "2021-05-21 00:02:11.349001429",
            "2021-05-21 00:02:38.195857153",
            "2021-05-21 00:03:25.527530228",
            "2021-05-21 00:03:26.653410069",
            "2021-05-21 00:03:26.798157366",
        ]
    )
)

s2 = pd.Series(
    pd.to_datetime(
        [
            "2021-05-21 00:02:11.348997322",
            "2021-05-21 00:02:38.195852267",
            "2021-05-21 00:03:25.527526087",
            "2021-05-21 00:03:26.653406759",
            "2021-05-21 00:03:26.798154350",
        ]
    )
)

# before pandas 1.5.0 (Timedelta.delta was deprecated in 1.5.0 and removed in 2.0)
(s1 - s2).apply(lambda x: x.delta)
# 0    4107
# 1    4886
# 2    4141
# 3    3310
# 4    3016
# dtype: int64

# since pandas 1.5.0
(s1 - s2).apply(lambda x: x.value)
# 0    4107
# 1    4886
# 2    4141
# 3    3310
# 4    3016
# dtype: int64
Answered By: Alpha
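
Because apply still calls a Python function once per row, a column-wise version of the calculation from the question may be of interest. The sketch below uses the .dt.components accessor, which splits each timedelta into days, hours, minutes, seconds, milliseconds, microseconds and nanoseconds columns. The names parts and nanos are illustrative, and the arithmetic is only valid under the assumption that every difference is below one millisecond, as in this data.

```python
import pandas as pd

s1 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.349001429",
                               "2021-05-21 00:02:38.195857153",
                               "2021-05-21 00:03:25.527530228",
                               "2021-05-21 00:03:26.653410069",
                               "2021-05-21 00:03:26.798157366"]))

s2 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.348997322",
                               "2021-05-21 00:02:38.195852267",
                               "2021-05-21 00:03:25.527526087",
                               "2021-05-21 00:03:26.653406759",
                               "2021-05-21 00:03:26.798154350"]))

# DataFrame with one integer column per time unit.
parts = (s1 - s2).dt.components

# Same arithmetic as the scalar version in the question, applied to whole
# columns. Assumption: all differences are smaller than one millisecond,
# so the larger components are zero and can be ignored.
nanos = parts["microseconds"] * 1000 + parts["nanoseconds"]

print(nanos.tolist())
```

For larger differences, the remaining component columns would have to be scaled and added as well, at which point the integer conversion shown in the answers is the simpler route.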