Basic addition/subtraction on xarray columns
Question:
I’m trying to perform basic vector operations on data imported through xarray, but I’m having a hard time figuring out how to do it. I have x, y, and z coordinates of two objects, and I need to perform vector addition and subtraction on them as well as compute the cross product. Here’s my attempt:
import xarray as xr
ds = xr.open_dataset(data_path)
leo = ds[['xLeo','yLeo','zLeo']].to_array()
gps = ds[['xGps','yGps','zGps']].to_array()
example = gps - leo
This produces:
xarray.DataArray
variable: 0, time: 5199
array([], shape=(0, 5199), dtype=float64)
Coordinates: (2)
Attributes: (0)
I expected that subtracting two identically sized arrays would produce a third array of the same size, but I seem to have ended up with an empty array. I wondered if the different column names might be having an effect, but couldn’t fix it:
leo = ds[['xLeo','yLeo','zLeo']].to_array().rename({'xLeo':'x','yLeo':'y','zLeo':'z'})
ValueError: cannot rename 'xLeo' because it is not a variable or dimension in this dataset
Could anyone point out where I’m going wrong?
Answers:
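You’re on the right track, but don’t convert to an array before renaming the variables. When you use to_array, the variables in the dataset are stacked along a new dimension in the result, so xLeo, yLeo, and zLeo become labels along the new variable dimension, and rename doesn’t operate on them (they are no longer the names of variables or dimensions). This is also why your subtraction came back empty: xarray aligns the two arrays on their coordinate labels before subtracting, and the leo and gps arrays share no labels along variable.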
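To see what happened in the question, here is a small sketch with made-up data standing in for the original file. The dataset contents and the four-step time axis are assumptions for illustration only. It shows that after to_array the old variable names survive as labels along variable, and that the subtraction then has no matching labels to work with:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the asker's file (values are random, purely illustrative).
time = np.arange(4)
rng = np.random.default_rng(0)
names = ["xLeo", "yLeo", "zLeo", "xGps", "yGps", "zGps"]
ds = xr.Dataset(
    {name: ("time", rng.normal(size=time.size)) for name in names},
    coords={"time": time},
)

leo = ds[["xLeo", "yLeo", "zLeo"]].to_array()
gps = ds[["xGps", "yGps", "zGps"]].to_array()

# The old variable names are now coordinate labels on the "variable" dimension.
print(leo["variable"].values)
print(gps["variable"].values)

# Arithmetic keeps only labels present in both operands; here there are none.
diff = gps - leo
print(diff.shape)  # (0, 4)
```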
Instead, rename to align the dataset variables and then subtract:
leo = ds[['xLeo','yLeo','zLeo']].rename({'xLeo':'x','yLeo':'y','zLeo':'z'})
gps = ds[['xGps','yGps','zGps']].rename({'xGps':'x','yGps':'y','zGps':'z'})
The following will now perform element-wise subtraction between the arrays x, y, and z after broadcasting them against one another:
leo - gps
If you do want these to be treated as a dimension in an array, you could use to_array after renaming:
# same as above, but returns a DataArray with an extra dim
# variable (object): [x, y, z]
# instead of a Dataset
leo_arr = leo.to_array()
gps_arr = gps.to_array()
leo_arr - gps_arr
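As a quick sanity check, the two approaches can be compared on made-up data. This is only a sketch: the synthetic dataset is an assumption standing in for the real file, and it subtracts in the gps - leo direction used in the question.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the asker's file (values are random, purely illustrative).
time = np.arange(4)
rng = np.random.default_rng(0)
names = ["xLeo", "yLeo", "zLeo", "xGps", "yGps", "zGps"]
ds = xr.Dataset(
    {name: ("time", rng.normal(size=time.size)) for name in names},
    coords={"time": time},
)

# Rename first so both objects use the same variable names.
leo = ds[["xLeo", "yLeo", "zLeo"]].rename({"xLeo": "x", "yLeo": "y", "zLeo": "z"})
gps = ds[["xGps", "yGps", "zGps"]].rename({"xGps": "x", "yGps": "y", "zGps": "z"})

# Dataset math: variables with matching names are subtracted pairwise.
diff_ds = gps - leo

# DataArray math: same numbers, stacked along a "variable" dimension.
diff_arr = gps.to_array() - leo.to_array()

print(list(diff_ds.data_vars))
print(diff_arr.dims, diff_arr.shape)
```

Both results hold the same values; the difference is only whether x, y, and z are separate variables or entries along one dimension.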
Generally, I’d recommend only doing math with DataArrays, especially until you really know your way around xarray, as Dataset math can behave unexpectedly when you’re not careful about how operations are broadcast and mapped. In this case, you’ve filtered your datasets to include only the specific variables you want to subtract, so you can be sure you want to apply the operation to all of the variables in the datasets. So I think either of these approaches is fine. A downside of to_array is that it has to allocate a new array with the extra dimension, so this can be memory intensive if the three arrays are large. The upside is that you are then doing math with DataArrays, which is much more straightforward, especially for an xarray beginner.
Another advantage of converting to arrays is the ability to use operations and reductions over a dimension, e.g. xr.cross:
xr.cross(leo_arr, gps_arr, dim="variable")
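Here is a minimal, self-contained sketch of that call, assuming an xarray version recent enough to provide xr.cross. The two arrays are built directly from random numbers rather than from the original file, and the comparison against np.cross is only there as a cross-check:

```python
import numpy as np
import xarray as xr

# Made-up position vectors: 3 components (x, y, z) at 4 time steps.
time = np.arange(4)
rng = np.random.default_rng(0)
coords = {"variable": ["x", "y", "z"], "time": time}
leo_arr = xr.DataArray(rng.normal(size=(3, 4)), dims=("variable", "time"), coords=coords)
gps_arr = xr.DataArray(rng.normal(size=(3, 4)), dims=("variable", "time"), coords=coords)

# Cross product taken over the component dimension, one vector per time step.
cross = xr.cross(leo_arr, gps_arr, dim="variable")

# The same computation in plain NumPy, with components along axis 0.
expected = np.cross(leo_arr.values, gps_arr.values, axis=0)
print(np.allclose(cross.transpose("variable", "time").values, expected))
```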