Subsetting netcdf files with multiple variables by time range

Question:

I have hundreds of netcdf files, each with 18 variables. Each file covers 1850 to 2100 and is roughly 20 GB. CDO and NCO could not read the data because the extent (-1.40625, 358.5938, -89.25846, 89.25846) is unusual: longitude runs from about 0 to 360 and the grid has not been rotated (see below).

Rotating the files is not an option because of their size. I’m looking for a way to subset the data by period (e.g., 2020 to 2060) so that they are manageable for further processing.
I can subset a netcdf file with a single variable, but I am not sure how to do it for multiple variables while accounting for the unusual extent. Any tips would be appreciated.

```
> mynetcdf

class       : SpatRaster
dimensions  : 64, 128, 602262  (nrow, ncol, nlyr)
resolution  : 2.8125, 2.789327  (x, y)
extent      : -1.40625, 358.5938, -89.25846, 89.25846  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
sources     : fwi_day_CanESM5_historical_r1i1p1f1_gn_full-outputs.nc:fwi  (60225 layers)
              fwi_day_CanESM5_historical_r1i1p1f1_gn_full-outputs.nc:ffmc  (60225 layers)
              fwi_day_CanESM5_historical_r1i1p1f1_gn_full-outputs.nc:dmc  (60225 layers)
              ... and 15 more source(s)
varnames    : fwi
              ffmc
              dmc
              ...
names       : fwi_1, fwi_2, fwi_3, fwi_4, fwi_5, fwi_6, ...
unit        :     1,     1,     1,     1,     1,     1, ...
```
Asked By: user11384727


Answers:

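I found a way to do this with xarray: drop the unwanted variables when opening each file, then subset by bounding box, date range, and month.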

```python
import glob

import xarray as xr

# List all matching files (nothing is opened yet)
all_nc = glob.glob("..filepath/myfile*.nc")

# Variables to skip when opening each file
drop_vars = ['TEMP_wDC_2014', 'ffmcPREV_2014', 'dcPREV_2014', 'dmcPREV_2014',
             'SeasonActive_2014', 'DCf_2014', 'rw_2014', 'CounterSeasonActive_2014',
             'ffmc', 'dc', 'dmc', 'isi', 'bui', 'TEMP', 'RH', 'RAIN', 'WIND']

nc_list = []  # one subsetted dataset per file
for path in all_nc:
    infile = xr.open_dataset(path, drop_variables=drop_vars)
    # Bounding box of the area of interest, in 0 to 360 longitudes
    yt = infile.sel(lon=slice(218.9931, 236.2107), lat=slice(60, 69.64794))
    yt1 = yt.sel(time=slice('2020-01-01', '2060-12-31'))  # subset by date range
    yt2 = yt1.sel(time=yt1.time.dt.month.isin([3, 4, 5, 6, 7, 8, 9]))  # keep March to September
    nc_list.append(yt2)  # append the subset; extend would add only the variable names
```
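As a rough illustration of why this works for multiple variables, the sketch below builds a small synthetic dataset and applies the same kind of selection. The grid, the date range, and the two variables kept (`fwi`, `isi`) are placeholders rather than the real files. It also shows, under those assumptions, how `append` and `extend` differ when the item is an `xarray.Dataset`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for one file (placeholder sizes and values, not the real data).
time = pd.date_range("2019-01-01", "2021-12-31", freq="D")
lat = np.linspace(-87.5, 87.5, 8)    # ascending latitudes
lon = np.arange(0.0, 360.0, 45.0)    # 0 to 360 longitude grid: 0, 45, ..., 315
shape = (time.size, lat.size, lon.size)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        "fwi": (("time", "lat", "lon"), rng.random(shape)),
        "isi": (("time", "lat", "lon"), rng.random(shape)),
    },
    coords={"time": time, "lat": lat, "lon": lon},
)

# Each .sel call subsets every variable in the dataset at once.
sub = ds.sel(time=slice("2020-01-01", "2020-12-31"))               # date range
sub = sub.sel(time=sub.time.dt.month.isin([3, 4, 5, 6, 7, 8, 9]))  # March to September
sub = sub.sel(lon=slice(200, 250), lat=slice(50, 90))              # box in 0 to 360 longitudes

subsets = []
subsets.append(sub)   # append stores the Dataset itself

names = []
names.extend(sub)     # extend iterates over the Dataset, which yields variable names

print(list(sub.data_vars))
print(dict(sub.sizes))
print(names)
```

Each element collected with `append` is still a full `Dataset`, so it could then be written out per file with `Dataset.to_netcdf` or processed further.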
Answered By: user11384727