How to convert float64 time to datetime object

Question:

I am working with Berkeley Earth Surface Temperature data. I have a monthly NetCDF file covering 1753 to the present. The time axis is float64, and when I convert it to datetime, every value comes back as January 1 of its year. Below is the time documentation from the Berkeley Earth Surface Temperature data:

time: A list of times at which data is reported. The data format is decimal
with year and fraction of year reported, with each value corresponding
to the midpoint of the respective month. For example, 1981.125
indicates February 1981.

I tried to convert the time values from float to int and then apply pd.to_datetime().
It raises a ValueError when I include the month in the format.

pd.to_datetime(dset.time.astype(int), format="%m%Y")
ValueError: time data '1850' does not match format '%m%Y' (match)


pd.to_datetime(dset.time.astype(int), format="%Y")
DatetimeIndex(['1850-01-01', '1850-01-01', '1850-01-01', '1850-01-01',
           '1850-01-01', '1850-01-01', '1850-01-01', '1850-01-01',
           '1850-01-01', '1850-01-01',
           ...
           '2021-01-01', '2021-01-01', '2021-01-01', '2022-01-01',
           '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
           '2022-01-01', '2022-01-01'],
          dtype='datetime64[ns]', length=2071, freq=None)

I am new to xarray and NetCDF files, so any help would be appreciated. Here is the link to the website: http://berkeleyearth.org/data/

Here is a description of my data:

 <xarray.Dataset>
 Dimensions:  (longitude: 360, latitude: 180, time: 2071, month_number: 12)
 Coordinates:
     * longitude    (longitude) float32 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
     * latitude     (latitude) float32 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
     * time         (time) float64 1.85e+03 1.85e+03 ... 2.022e+03 2.023e+03
 Dimensions without coordinates: month_number
 Data variables:
     land_mask    (latitude, longitude) float64 ...
     temperature  (time, latitude, longitude) float32 ...
     climatology  (month_number, latitude, longitude) float32 ...
 Attributes:
     Conventions:           Berkeley Earth Internal Convention (based on CF-1.5)
     title:                 Native Format Berkeley Earth Surface Temperature A...
     history:               27-Aug-2022 08:16:14
     institution:           Berkeley Earth Surface Temperature Project
     land_source_history:   05-Aug-2022 11:14:59
     ocean_source_history:  27-Aug-2022 05:20:43
     comment:               This file contains Berkeley Earth surface temperature...

I am guessing this is what you meant by top rows:

<xarray.DataArray 'time' (time: 2071)>
array([1850.041667, 1850.125   , 1850.208333, ..., 2022.375   , 2022.458333,
   2022.541667])
Coordinates:
    * time     (time) float64 1.85e+03 1.85e+03 1.85e+03 ... 2.022e+03 2.023e+03
Attributes:
    units:          year A.D.
    standard_name:  time
    long_name:      Time
Asked By: Aditya Aryan

||

Answers:

You need to convert the floating-point date into a form that pd.to_datetime can parse. Based on the description you provided, you can extract the year and month as follows (assuming fdate is the floating-point date value):

year = int(fdate)
month = int((fdate - year) * 12) + 1

and then convert that to a string in the form mmyyyy using an f-string:

f'{month:02d}{year:04d}'

That can then be converted to a datetime using format %m%Y. Wrapping it in a function:

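def convert_date(fdate):
    # Whole-number part is the year, e.g. 1981.125 -> 1981
    year = int(fdate)
    # Fraction of the year times 12 is the zero-based month (0.125 * 12 = 1.5 -> 1), so add 1
    month = int((fdate - year) * 12) + 1
    # Zero-padded 'mmyyyy' string, which matches format '%m%Y'
    return f'{month:02d}{year:04d}'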

You would then use it as:

pd.to_datetime(xr.apply_ufunc(convert_date, dset.time, vectorize=True), format='%m%Y')
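
If you would rather skip the string round-trip, the same year/month arithmetic can be done on the whole array at once. This is only a sketch: the function name decimal_year_to_datetime is made up for illustration, and it assumes every value sits at a month midpoint, as the Berkeley Earth documentation describes, so the day is simply set to 1.

```python
import numpy as np
import pandas as pd


def decimal_year_to_datetime(decimal_years):
    """Convert decimal years at month midpoints to month-start timestamps.

    Hypothetical helper, not part of pandas or xarray.
    """
    values = np.asarray(decimal_years, dtype="float64")
    # Whole-number part is the year, e.g. 1981.125 -> 1981
    years = np.floor(values).astype(int)
    # Fraction of the year times 12 is the zero-based month; add 1 for 1..12
    months = np.floor((values - years) * 12).astype(int) + 1
    # pd.to_datetime assembles dates from year/month/day columns
    parts = pd.DataFrame({"year": years, "month": months, "day": 1})
    return pd.DatetimeIndex(pd.to_datetime(parts))


sample = [1850.041667, 1850.125, 1981.125, 2022.541667]
print(decimal_year_to_datetime(sample))
```

With your dataset, the result could then presumably be attached with something like dset.assign_coords(time=decimal_year_to_datetime(dset.time.values)), though I have not run that against the actual file.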
Answered By: Nick