Subsetting netcdf files with multiple variables by time range
Question:
I have hundreds of netcdf files, each with 18 variables. Each file covers 1850 to 2100 and is roughly 20 GB. CDO and NCO could not read the data because of the rotation: the extent (-1.40625, 358.5938, -89.25846, 89.25846) is unusual (see below).
Rotating the files is not an option because of their size. I'm looking for a way to subset the data by period (e.g., 2020 – 2060) so that the files are manageable for further processing.
I can subset a netcdf file with a single variable, but I am not sure how to do it for multiple variables, given the unusual extent. Any tips would be appreciated.
```
> mynetcdf
class       : SpatRaster
dimensions  : 64, 128, 602262 (nrow, ncol, nlyr)
resolution  : 2.8125, 2.789327 (x, y)
extent      : -1.40625, 358.5938, -89.25846, 89.25846 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
sources     : fwi_day_CanESM5_historical_r1i1p1f1_gn_full-outputs.nc:fwi (60225 layers)
              fwi_day_CanESM5_historical_r1i1p1f1_gn_full-outputs.nc:ffmc (60225 layers)
              fwi_day_CanESM5_historical_r1i1p1f1_gn_full-outputs.nc:dmc (60225 layers)
              ... and 15 more source(s)
varnames    : fwi
              ffmc
              dmc
              ...
names       : fwi_1, fwi_2, fwi_3, fwi_4, fwi_5, fwi_6, ...
unit        : 1, 1, 1, 1, 1, 1, ...
```
Answers:
Found a way to do this with xarray.
```python
import glob

import xarray as xr

all_nc = glob.glob("..filepath/myfile*.nc")  # list all matching files
nc_list = []  # will hold one subset Dataset per file
unwanted = ['TEMP_wDC_2014', 'ffmcPREV_2014', 'dcPREV_2014', 'dmcPREV_2014',
            'SeasonActive_2014', 'DCf_2014', 'rw_2014', 'CounterSeasonActive_2014',
            'ffmc', 'dc', 'dmc', 'isi', 'bui', 'TEMP', 'RH', 'RAIN', 'WIND']
for i in all_nc:
    infile = xr.open_dataset(i, drop_variables=unwanted)  # drop unwanted variables
    # bounding box of the area of interest, in 0 to 360 longitudes
    yt = infile.sel(lon=slice(218.9931, 236.2107), lat=slice(60, 69.64794))
    yt1 = yt.sel(time=slice('2020-01-01', '2060-12-31'))  # slice by year
    yt2 = yt1.sel(time=yt1.time.dt.month.isin([3, 4, 5, 6, 7, 8, 9]))  # keep March to September
    nc_list.append(yt2)  # append (not extend) so the whole Dataset is stored
```
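The same selection also works when several variables are kept, because `sel` applies to every variable in the Dataset at once. Below is a minimal sketch on a small synthetic Dataset rather than the real files. The helpers `to_0_360` and `subset` are hypothetical names introduced here for illustration, the coordinate names `lon`, `lat` and `time` follow the code above, and the sketch assumes latitude is stored in ascending order. The bounding box is given in the usual -180 to 180 longitudes and converted to the 0 to 360 convention of the files, so nothing has to be rotated.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_0_360(lon_deg):
    """Map a longitude in degrees (e.g. -141.0) onto the 0-360 convention."""
    return lon_deg % 360.0


def subset(ds, lon_w, lon_e, lat_s, lat_n, start, end, months):
    """Subset every variable in ds by bounding box, period and calendar months."""
    box = ds.sel(
        lon=slice(to_0_360(lon_w), to_0_360(lon_e)),
        lat=slice(lat_s, lat_n),  # assumes lat is stored in ascending order
    )
    period = box.sel(time=slice(start, end))
    return period.sel(time=period.time.dt.month.isin(months))


# Tiny synthetic stand-in for one file: two variables on a coarse 0-360 grid.
time = pd.date_range("2019-01-01", "2021-12-31", freq="D")
lat = np.arange(-80.0, 81.0, 10.0)
lon = np.arange(0.0, 360.0, 10.0)
shape = (time.size, lat.size, lon.size)
demo = xr.Dataset(
    {
        "fwi": (("time", "lat", "lon"), np.zeros(shape)),
        "isi": (("time", "lat", "lon"), np.ones(shape)),
    },
    coords={"time": time, "lat": lat, "lon": lon},
)

# Box given in -180..180 longitudes; both variables are subset together.
out = subset(demo, -141.0, -124.0, 60.0, 70.0,
             "2020-01-01", "2020-12-31", [3, 4, 5, 6, 7, 8, 9])
print(out)
```

With the real files, each per-file result could be collected in a list, or written out with `Dataset.to_netcdf` before the next file is opened to keep memory use low.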