How to read date and time in an ECMWF file

Question:

I have global datasets in a netCDF file. The time information in the data file is:

<type 'netCDF4._netCDF4.Variable'>
int32 time(time)
    units: hours since 1900-01-01 00:00:0.0
    long_name: time
    calendar: gregorian
unlimited dimensions: time
current shape = (5875,)
filling off

When I extracted the time from the file, I got this array:

array([ 876600,  876624,  876648, ..., 1017528, 1017552, 1017576], dtype=int32) 

My question is: how do I convert this array into a proper date format?
[Note: This is a daily dataset, and each number in the array is the number of hours since 1900-01-01.]

Asked By: bikuser

||

Answers:

You could:

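from datetime import date, timedelta

hours = [876600, 876624, 876648, 1017528, 1017552, 1017576]
base = date(1900, 1, 1)  # reference date from the units attribute
for hour in hours:
    # date arithmetic keeps only the whole days of the timedelta
    print(base + timedelta(hours=hour))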

2000-01-02
2000-01-03
2000-01-04
2016-01-30
2016-01-31
2016-02-01

Use datetime instead of date if you also want the hour (and finer) information.
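
As a minimal sketch of that variant, the same conversion with a datetime base keeps the time of day. The value 876606 is a hypothetical sub-daily offset added only for illustration; it does not appear in the question's data.

```python
from datetime import datetime, timedelta

# 876606 is a hypothetical extra value (6 hours after the first record)
hours = [876600, 876606, 876624]
base = datetime(1900, 1, 1)  # reference time from the units attribute

# datetime + timedelta preserves hours, unlike date + timedelta
stamps = [base + timedelta(hours=h) for h in hours]
for stamp in stamps:
    print(stamp.isoformat())
```

With a date base, the second value would collapse onto the same day as the first; with datetime it stays distinct.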

Or using a pd.DataFrame:

import pandas as pd

df = pd.DataFrame(hours, columns=['hours'])
df['date'] = df.hours.apply(lambda x: base + timedelta(hours=x))

     hours        date
0   876600  2000-01-02
1   876624  2000-01-03
2   876648  2000-01-04
3  1017528  2016-01-30
4  1017552  2016-01-31
5  1017576  2016-02-01
Answered By: Stefan

The solution using .apply is horribly inefficient, not to mention non-idiomatic and ugly. pandas already has built-in vectorized methods for timedelta conversions.

In [17]: hours = [ 876600,  876624,  876648, 1017528, 1017552, 1017576]*10000

In [18]: df = pd.DataFrame(hours, columns=['hours'])

In [19]: %timeit df.hours.apply(lambda x: base + timedelta(hours=x))
10 loops, best of 3: 74.2 ms per loop

In [21]: %timeit pd.to_timedelta(df.hours, unit='h') + pd.Timestamp(base)
100 loops, best of 3: 11.3 ms per loop

In [23]: (pd.to_timedelta(df.hours, unit='h') + pd.Timestamp(base)).head()
Out[23]: 
0   2000-01-02
1   2000-01-03
2   2000-01-04
3   2016-01-30
4   2016-01-31
Name: hours, dtype: datetime64[ns]
Answered By: Jeff

The ideal way to do this is with netCDF4's num2date, which reads the units and calendar directly from the time variable:

import netCDF4

ncfile = netCDF4.Dataset('./foo.nc', 'r')
time = ncfile.variables['time']
dates = netCDF4.num2date(time[:], time.units, time.calendar)
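
To make it clearer what a decoder like num2date has to do with the units string, here is a rough standard-library sketch. It is not how netCDF4 implements it: decode_hours_since is a hypothetical helper that handles only "hours since ..." units, assumes the reference time is midnight, and ignores the calendar attribute (it uses Python's own calendar).

```python
from datetime import datetime, timedelta

def decode_hours_since(values, units):
    """Hypothetical helper: decode offsets for 'hours since YYYY-MM-DD ...' units."""
    unit, _, reference = units.partition(" since ")
    if unit.strip() != "hours":
        raise ValueError("this sketch only handles 'hours since ...' units")
    # Only the date part is parsed; the reference time is assumed to be midnight
    origin = datetime.strptime(reference.split()[0], "%Y-%m-%d")
    # int() also accepts NumPy integer scalars such as int32
    return [origin + timedelta(hours=int(v)) for v in values]

dates = decode_hours_since([876600, 1017576], "hours since 1900-01-01 00:00:0.0")
print(dates[0], dates[-1])
```

For real files, prefer num2date itself, since it covers other time units and non-standard calendars that this sketch does not.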
Answered By: N1B4

import xarray as xr
import pandas as pd

ar = xr.open_dataset('xyz.nc')  # read the data downloaded from ECMWF with xarray
conv = ar.to_dataframe()  # convert it to a DataFrame
driex = conv.reset_index()
df = driex.set_index('time').resample('D').mean()  # hourly to daily average

No = df.no  # read a variable (e.g. nitrogen monoxide)
so2 = df.so2
Answered By: Sankar jyoti nath