Reading NetCDF time with a unit of years

Question:

All,
I am trying to read the time coordinate from the Berkeley Earth temperature file linked below. The time spans from 1850 to 2022, and the time unit is the decimal year A.D. (1850.041667, 1850.125, 1850.208333, …, 2022.708333, 2022.791667, 2022.875).

pandas.to_datetime cannot interpret the time array correctly, I think because I need to state the origin of the time coordinate and its unit. I tried pd.to_datetime(dti, unit='D', origin='julian'), but it did not work (out of bounds). Also, I think I have to use a unit of years instead of days.

The file is located here: http://berkeleyearth.lbl.gov/auto/Global/Gridded/Land_and_Ocean_LatLong1.nc

import xarray as xr
import numpy as np
import pandas as pd  
# read data into memory
flname = "Land_and_Ocean_LatLong1.nc"
ds = xr.open_dataset("./" + flname)
dti = ds['time']  # decimal years, e.g. 1850.041667
pd.to_datetime(dti, unit='D', origin='julian')  # fails: out of bounds
np.diff(dti)
Asked By: Kernel

||

Answers:

Convert to datetime using %Y as the parsing directive to get the year only, then add the fractional year as a timedelta in days. Note that you might have to account for leap years when calculating the timedelta. Example:

import pandas as pd

dti = pd.to_datetime(ds['time'], format="%Y")

# it might be sufficient to use e.g. 365 or 365.25 here, depending on the input
daysinyear = pd.Series([366]*dti.size).where(dti.is_leap_year, 365)

dti = dti + pd.to_timedelta(daysinyear * (ds['time']-ds['time'].astype(int)), unit="d")

dti
0      1850-01-16 04:59:59.999971200
1      1850-02-15 15:00:00.000000000
2      1850-03-18 01:00:00.000028800
3      1850-04-17 10:59:59.999971200
4      1850-05-17 21:00:00.000000000
                    ...
2070   2022-07-17 16:59:59.999971200
2071   2022-08-17 03:00:00.000000000
2072   2022-09-16 13:00:00.000028800
2073   2022-10-16 22:59:59.999971200
2074   2022-11-16 09:00:00.000000000
Length: 2075, dtype: datetime64[ns]
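
If the conversion is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name decimal_year_to_datetime and the sample values are made up for illustration and are not part of the file or of pandas. It takes the whole-number part of each value as the calendar year and scales the fractional part by that year's length (365 or 366 days), so it does not need the NetCDF file to run.

```python
import numpy as np
import pandas as pd


def decimal_year_to_datetime(years):
    """Convert decimal years (e.g. 1850.125) to a pandas DatetimeIndex.

    Hypothetical helper for illustration; assumes all years fall inside
    the range that datetime64[ns] can represent (roughly 1678 to 2261).
    """
    years = np.asarray(years, dtype="float64")
    whole = np.floor(years).astype(int)   # calendar year
    frac = years - whole                  # fraction of that year elapsed
    # 1 January of each year, parsed from the year as a string
    start = pd.to_datetime(whole.astype(str), format="%Y")
    # length of each year in days, accounting for leap years
    days = np.where(start.is_leap_year, 366, 365)
    return start + pd.to_timedelta(frac * days, unit="D")


# Made-up sample values in the same style as the file's time axis
sample = [1850.0, 1850.5, 2020.5]
result = decimal_year_to_datetime(sample)
print(result)
```

With the real data, something like decimal_year_to_datetime(ds['time'].values) should give the same kind of result as the snippet above, up to floating-point rounding in the sub-second digits.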
Answered By: FObersteiner