Python 3.10.6 – Find the element-wise maxima of overlaid xarray datasets

Question:

I’m working with gridded data, specifically netcdf data, trying to find the maximum grid point value for overlaying pixels of each netcdf file in a directory, ignoring null values. If you’re familiar with ArcGIS, this is the same as running the maximum function through Cell Statistics. However, when doing this through xarray, I keep getting errors saying the function I’m calling doesn’t exist or that the data doesn’t have the attribute I assigned it.

I found an example in the xarray docs that calls np.fmax, but I couldn’t get it to work. I also tried xr.fmax and xr.ufuncs.fmax, but they don’t seem to exist.

The goal is to iterate through 10 years of data, saving the maximum of each weekly batch, and then, from the resulting weekly datasets, find the maximum, average, and count of non-NaN values. If anyone knows of other modules that can compute these statistics, it would be much appreciated.

Below is an example of how I’ve been setting things up:

import numpy as np
import xarray as xr
import rioxarray as rio
import os


# Directory that stores all the netcdf files
filepath = 'weekBatch/'


# Putting each file into its own xarray dataset
tot_files = 0
for filename in os.listdir(filepath):
    tot_files += 1

    ds = xr.open_dataset(filepath + filename)

    # MESH is the value I'm trying to do math on
    exec('ds_ge_0_' + str(tot_files) + ' = ds.where(ds.MESH >= 0.0)')
    exec('ds_ge_15_' + str(tot_files) + ' = ds.where(ds.MESH >= 15.0)')

Here’s an example of what a single dataset looks like:

>>> print(ds_ge_0_1.MESH)

<xarray.DataArray 'MESH' (latitude: 3501, longitude: 7001)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * longitude  (longitude) float64 -130.0 -130.0 -130.0 ... -60.02 -60.01 -59.99
  * latitude   (latitude) float64 55.01 54.99 54.98 54.97 ... 20.02 20.01 20.0

I then try to run the following block…

for i in range(tot_files - 1):

    ds1 = exec('ds_ge_0_' + str(i+1))
    ds2 = exec('ds_ge_0_' + str(i+2))

    ds_ge_0_final = np.fmax(ds1, ds2)

and get the following error:

Traceback (most recent call last):
  File "D:\xarray_stuff\DataProcess.py", line 287, in <module>
    ds_ge_0_final = np.fmax(ds1, ds2)
TypeError: '>=' not supported between instances of 'NoneType' and 'NoneType'


Asked By: PLundquist


Answers:

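exec() always returns None, so ds1 and ds2 in your loop are both None, which is why np.fmax raises that TypeError. Instead of creating numbered variables with exec, keep the datasets in lists. Here is a safer way to handle this data: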

import os

import numpy as np
import xarray as xr


# Directory that stores all the netcdf files
filepath = 'weekBatch/'


# Keep each masked dataset in a list instead of a dynamically named variable
ds_ge_0 = []
ds_ge_15 = []
for filename in os.listdir(filepath):
    ds = xr.open_dataset(filepath + filename)

    ds_ge_0.append(ds.where(ds.MESH >= 0.0))
    ds_ge_15.append(ds.where(ds.MESH >= 15.0))

print(ds_ge_0[0].MESH)

# Running element-wise max of all the datasets in ds_ge_0.
# np.fmax returns the non-NaN value when only one of the two inputs is NaN.
ds_ge_0_final = ds_ge_0[0]
for ds in ds_ge_0[1:]:
    ds_ge_0_final = np.fmax(ds_ge_0_final, ds)

# Same running max for the datasets in ds_ge_15
ds_ge_15_final = ds_ge_15[0]
for ds in ds_ge_15[1:]:
    ds_ge_15_final = np.fmax(ds_ge_15_final, ds)
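
The traceback in the question follows from how exec() behaves: it runs a statement for its side effects and returns None, so assigning its result never yields the dataset. The small sketch below shows this with plain Python values. The names value_1 and masked are hypothetical and only there for illustration.

```python
# exec() executes code for its side effects and always returns None.
value_1 = 10
result = exec('value_1')
print(result)  # None

# eval() does return the value of an expression, but building variable
# names from strings is fragile either way.
looked_up = eval('value_1')

# A dict (or list) keyed by the index is usually simpler and safer.
masked = {}  # hypothetical container: index -> object
for i, item in enumerate(['a', 'b', 'c'], start=1):
    masked[i] = item.upper()

print(masked[2])  # B
```

Looking an item up by key or index replaces both exec calls from the question.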
Answered By: Tim Roberts
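
For the broader goal in the question (maximum, mean, and count of non-NaN values across overlaid grids), one possible approach is to stack the grids along a new dimension and use xarray's NaN-skipping reductions. The following is a minimal sketch on synthetic data, not a tested pipeline. The dimension name "file" and the tiny 2x2 grids are illustrative assumptions, and it assumes every file shares the same latitude/longitude coordinates.

```python
import functools

import numpy as np
import xarray as xr

# Synthetic stand-ins for the per-file MESH grids (assumption: identical
# latitude/longitude coordinates in every file).
coords = {"latitude": [20.01, 20.0], "longitude": [-130.0, -129.99]}
dims = ("latitude", "longitude")
grids = [
    xr.DataArray([[1.0, np.nan], [np.nan, np.nan]], coords=coords, dims=dims, name="MESH"),
    xr.DataArray([[3.0, 2.0], [np.nan, np.nan]], coords=coords, dims=dims, name="MESH"),
    xr.DataArray([[np.nan, 4.0], [np.nan, 5.0]], coords=coords, dims=dims, name="MESH"),
]

# Stack the grids along a new dimension; the name "file" is arbitrary.
stacked = xr.concat(grids, dim="file")

# Per-cell statistics that ignore NaN values.
cell_max = stacked.max(dim="file", skipna=True)
cell_mean = stacked.mean(dim="file", skipna=True)
cell_count = stacked.count(dim="file")  # number of non-NaN values per cell

# The pairwise np.fmax reduction gives the same maximum.
running_max = functools.reduce(np.fmax, grids)

print(cell_max.values)
print(cell_mean.values)
print(cell_count.values)
```

Cells that are NaN in every grid stay NaN in the maximum and mean and get a count of 0. With grids of the size shown in the question (3501 x 7001, roughly 200 MB each as float64), stacking many files at once can be memory-hungry, so a running np.fmax loop or a lazily loaded dataset (for example via xr.open_mfdataset) may be preferable.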