How to fix OverflowError: Overflow in int64 addition

Question:

I’m trying to subtract column df['date_of_admission'] from the column df['DOB'] to find the difference between then and store the age value in df['age'] column, however, I’m getting this error:

OverflowError: Overflow in int64 addition

 DOB          date_of_admission      age
 2000-05-07   2019-01-19 12:26:00        
 1965-01-30   2019-03-21 02:23:12        
 NaT          2018-11-02 18:30:10        
 1981-05-01   2019-05-08 12:26:00       
 1957-01-10   2018-12-31 04:01:15         
 1968-07-14   2019-01-28 15:05:09            
 NaT          2018-04-13 06:20:01 
 NaT          2019-02-15 01:01:57 
 2001-02-10   2019-03-21 08:22:00       
 1990-03-29   2018-11-29 03:05:03
.....         ......
.....         .....
.....         .....

I’ve tried it with the following:

import numpy as np
import pandas as pd
from datetime import dt

df['age'] = (df['date_of_admission'] - df['DOB']).dt.days // 365

Expected to get the following age column after finding the difference between:

age
26
69
NaN
58
.
.
.
Asked By: bilal

||

Answers:

1). You are doing it correctly but the DOB contains the only date AND date_of_admission contains both date and time. Manipulate the date_of_admission so that it will only contain the date, then you will get your result.

2). Here I am adding a change function into your code so that you will get your result.

import numpy as np
import pandas as pd
from datetime import dt

def change(x):
    return x.date()

df['date_of_admission'] = df['date_of_admission'].apply(change)

df['age'] = df['date_of_admission'].subtract(df['DOB']).dt.days // 365

I hope it will help you.

Answered By: Rahul charan

Convert both columns into date then subtract it

import pandas as pd


df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date

df['DOB'] = pd.to_datetime(df['DOB']).dt.date

df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365

SECOND TEST

#Now I have use DOB AND date_of_admission data from the question and it is working fine

df = pd.DataFrame(data={"DOB":['2000-05-07','1965-01-30','NaT'],
                   "date_of_admission":["2019-01-19 12:26:00","2019-03-21 02:23:12", "2018-11-02 18:30:10"]})

df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date
df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365

RESULT:

DOB       date_of_admission   age
2000-05-07  2019-01-19       18.0
1965-01-30  2019-03-21       54.0
NaT         2018-11-02       NaN
Answered By: tawab_shakeel

OP is most likely using the medical MIMIC dataset, where dates have been scrambled to protect the identities of patients. Specifically, for patients over age 89, they shifted the date of birth by 300 years.

Long timespans such as this give an overflow when using pandas timedelta:

pd.to_timedelta(300, unit="Y", box=False)
> numpy.timedelta64(-8979658473709551616,'ns')

When this occurs in a dataframe operation, you’ll experience an error instead. As adapted from the answer of @tawab_shakeel :

df = pd.DataFrame(data={"DOB":['2000-05-07','1965-01-30','1700-01-01'],
                   "date_of_admission":["2019-01-19 12:26:00","2019-03-21 02:23:12", "2000-01-01 02:23:23"]})

df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date

# Gives AttributeError: Can only use .dt accessor with datetimelike values
df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365

# Gives OverflowError: long too big to convert
pd.to_timedelta(df['date_of_admission']-df['DOB'])

This problem occurs with any calculation that converts to a timedelta64[ns] datatype.

As a workaround, you can use an apply operation instead, directly calculating the age element per element:

df['age'] = df.apply(lambda e: (e['date_of_admission'] - e['DOB']).days/365, axis=1)
Answered By: DieterDP

I’m using the MIMIC III dataset and encountered this problem as well. I found that the comment by @Rahul_chacharan to use df['date_of_admission'].subtract(df['DOB']).dt.days // 365
worked for me!

Answered By: laxolotl

I think we all meet this problem dealing MIMIC3 database, here is my solution, very intuitive, but much faster, you can have a try.

df_labevents_temp['age']  = ((df['date_of_admission'].values  - df['DOB'].values).astype(np.int)/8.64e13//365).astype(np.int)

The idea is to turn it to numpy int, then convert from ns to years. The pandas date diff function doesn’t work for me, using apply is too slow (I think just loop).

Answered By: Hao Wu

For those getting errors AttributeError: Can only use .dt accessor with datetimelike values from @tawab_shakeel’s answer, try following code

import pandas as pd

df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date

df['DOB'] = pd.to_datetime(df['DOB']).dt.date

df['age'] = ((df['date_of_admission']-df['DOB']) //365).dt.days

It will return int value for age

Answered By: little_thumb