Basic addition/subtraction on xarray columns

Question:

I’m trying to perform really basic vector operations on data imported through xarray, but having a really difficult time figuring out how to do it. I’ve got x, y, and z coordinates of two objects, and I’m needing to perform vector addition and subtraction on these as well as compute the cross product. Here’s my attempt:

ds = xr.open_dataset(data_path)
leo = ds[['xLeo','yLeo','zLeo']].to_array()
gps = ds[['xGps','yGps','zGps']].to_array()
example = gps - leo

This produces:

xarray.DataArray (variable: 0, time: 5199)
array([], shape=(0, 5199), dtype=float64)
Coordinates: (2)
Attributes: (0)

I was expecting subtracting two identically-sized arrays to produce a third array of the same size, but seem to have ended up with an empty array. I wondered if the different column names might be having an effect, but couldn’t fix it:

leo = ds[['xLeo','yLeo','zLeo']].to_array().rename({'xLeo':'x','yLeo':'y','zLeo':'z'})

ValueError: cannot rename 'xLeo' because it is not a variable or dimension in this dataset

Could anyone point out where I’m going wrong?

Asked By: user2823789

Answers:

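You’re on the right track, but don’t convert to an array before renaming the variables. When you use to_array, the dataset’s variables are stacked along a new dimension called variable, so xLeo, yLeo, and zLeo become coordinate labels along that dimension. rename can’t find them because they are no longer the names of variables or dimensions. This is also why your subtraction came back empty: xarray aligns operands on their coordinate labels before doing arithmetic, and the two arrays share no labels along variable.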
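To make the empty result concrete, here is a minimal sketch that reproduces it. The dataset is synthetic (random values and a short time dimension stand in for the file opened from data_path in the question), so only the shapes and labels are meaningful.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the question's file: six variables on a shared "time" dim.
n = 5
rng = np.random.default_rng(0)
names = ["xLeo", "yLeo", "zLeo", "xGps", "yGps", "zGps"]
ds = xr.Dataset({name: ("time", rng.normal(size=n)) for name in names})

leo = ds[["xLeo", "yLeo", "zLeo"]].to_array()
gps = ds[["xGps", "yGps", "zGps"]].to_array()

# The old variable names are now coordinate labels along the "variable" dim.
print(list(leo["variable"].values))
print(list(gps["variable"].values))

# Arithmetic aligns on those labels (inner join by default). The two label
# sets do not overlap, so nothing survives along "variable".
diff = gps - leo
print(dict(diff.sizes))
```

The printed sizes should show a length of 0 along variable, which matches the output in the question.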

Instead, rename to align the dataset variables and then subtract:

leo = ds[['xLeo','yLeo','zLeo']].rename({'xLeo':'x','yLeo':'y','zLeo':'z'})
gps = ds[['xGps','yGps','zGps']].rename({'xGps':'x','yGps':'y','zGps':'z'})

# The following performs element-wise subtraction between the
# x, y, and z variables of the two datasets, after broadcasting
# them against one another

leo - gps

If you do want these to be treated as dimensions in an array, you could use to_array after renaming:

# same as above, but returns a DataArray with an extra dim
#     variable (object): [x, y, z]
# instead of a Dataset
leo_arr = leo.to_array()
gps_arr = gps.to_array()

leo_arr - gps_arr
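As a quick check that the Dataset route and the DataArray route give the same numbers, here is a small self-contained sketch. The dataset and its values are made up for illustration; only the variable names follow the question.

```python
import numpy as np
import xarray as xr

# Synthetic data standing in for the question's file (values are made up).
n = 4
t = np.arange(n, dtype=float)
ds = xr.Dataset(
    {
        "xLeo": ("time", t),
        "yLeo": ("time", t + 1),
        "zLeo": ("time", t + 2),
        "xGps": ("time", 10 * t),
        "yGps": ("time", 10 * t + 1),
        "zGps": ("time", 10 * t + 2),
    }
)

# Rename first so both selections share the variable names x, y, z.
leo = ds[["xLeo", "yLeo", "zLeo"]].rename({"xLeo": "x", "yLeo": "y", "zLeo": "z"})
gps = ds[["xGps", "yGps", "zGps"]].rename({"xGps": "x", "yGps": "y", "zGps": "z"})

diff_ds = leo - gps                          # Dataset with variables x, y, z
diff_arr = leo.to_array() - gps.to_array()   # DataArray with a "variable" dim

# Compare the two results one component at a time.
same = all(
    np.allclose(diff_ds[v].values, diff_arr.sel(variable=v).values)
    for v in ["x", "y", "z"]
)
print(sorted(diff_ds.data_vars), diff_arr.dims, same)
```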

Generally, I’d recommend doing math only with DataArrays, especially until you know your way around xarray, because Dataset math can behave unexpectedly if you aren’t careful about how operations are broadcast and mapped across variables. In this case, you’ve filtered each dataset down to exactly the variables you want to subtract, so you know the operation should apply to every variable, and either approach is fine. A downside of to_array is that it has to allocate a new array with the extra dimension, which can be memory intensive if the three arrays are large. The upside is that you are then doing math with DataArrays, which is much more straightforward, especially for an xarray beginner.

Another advantage of converting to arrays is the ability to use operations and reductions over a dimension, e.g. xr.cross:

xr.cross(leo_arr, gps_arr, dim="variable")
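
For a cross product you can verify by hand, here is a sketch with made-up inputs: every leo vector is the unit x vector and every gps vector is the unit y vector, so each cross product should be the unit z vector. The arrays are built directly rather than read from a file, and this assumes an xarray version recent enough to provide xr.cross.

```python
import numpy as np
import xarray as xr

n = 3  # number of time steps (arbitrary)

# Shape (3, n): one row per component, one column per time step.
leo_arr = xr.DataArray(
    np.tile([[1.0], [0.0], [0.0]], (1, n)),
    dims=("variable", "time"),
    coords={"variable": ["x", "y", "z"]},
)
gps_arr = xr.DataArray(
    np.tile([[0.0], [1.0], [0.0]], (1, n)),
    dims=("variable", "time"),
    coords={"variable": ["x", "y", "z"]},
)

# Cross product taken along the component dimension, independently per time step.
cross = xr.cross(leo_arr, gps_arr, dim="variable")
cross = cross.transpose("variable", "time")
print(cross.values)
```

Each column of the printed array should be (0, 0, 1).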
Answered By: Michael Delgado