How to add a constant to negative values in array

Question:

Given the xarray DataArray below, I would like to add 10 to all negative values (i.e., -5 becomes 5, -4 becomes 6 … -1 becomes 9) while all other values remain unchanged.

a = xr.DataArray(np.arange(25).reshape(5, 5)-5, dims=("x", "y"))

I tried:

  • a[a<0]=10+a[a<0], but it raises an error: 2-dimensional boolean indexing is not supported.
  • Several attempts with a.where, but it seems that the other argument can only replace the mapped values with a constant rather than with indexed values.

I also considered using numpy as suggested here, but my actual dataset is ~80 GB and loaded with dask, and using numpy crashes my Jupyter console.

Is there any way to achieve this using only xarray?

Update

I updated the code following @SpaceBurger's answer and this. However, my initial example used a DataArray, whereas my actual problem involves a Dataset:

a = xr.DataArray(np.arange(25).reshape(5, 5)-5, dims=("x", "y"))
a = a.to_dataset(name='variable')

Now, if I do this:

a1 = a['variable']
a2 = 10+a1.copy()
a['variable'] = dask.array.where(a['variable'] < 0, a2, a1)

I get this error:

MissingDimensionsError: cannot set variable 'variable' with 2-dimensional data without explicit dimension names. Pass a tuple of (dims, data) instead.

Can anyone suggest a proper syntax?

Asked By: e5k


Answers:

My best guess is based on my meagre understanding of these libraries, and especially on the xarray.Dataset.update section of the xarray docs. It says the argument to xarray.Dataset.update should be a mapping of the form {var name: (tuple of dimension names, array-like)}.

This means that a dataset expects you to name the dimensions the data is attached to. This makes some sense, as you can see the names of the dimensions and coordinates used in your dataset when printing the object (print(a)). Printing the dataset should show the dimension names carried over by the call to a.to_dataset. Let us say they are named coord_x and coord_y. You should be able to set your data variable with

a['variable'] = (('coord_x', 'coord_y'), dask.array.where(a['variable'] < 0, a2, a1))
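
Rather than typing the dimension names by hand, they can be read off the existing variable. A minimal sketch, assuming the small in-memory example from the question; the names ds, var and new_values are illustrative, and np.where stands in for dask.array.where because this toy data is NumPy-backed:

```python
import numpy as np
import xarray as xr

ds = xr.DataArray(
    np.arange(25).reshape(5, 5) - 5, dims=("x", "y")
).to_dataset(name="variable")

var = ds["variable"]
print(var.dims)  # the dimension names to pair with the raw array

# A bare array carries no dimension names, so pass (dims, data) explicitly.
new_values = np.where(var.data < 0, var.data + 10, var.data)
ds["variable"] = (var.dims, new_values)
```

Reusing var.dims keeps the assignment correct even if the dimensions are later renamed.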

Which should be equivalent to

a.update({'variable': (('coord_x', 'coord_y'), dask.array.where(a['variable'] < 0, a2, a1))})

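Or maybe the following, which is easier to read. A Python if/else cannot be applied element-wise to an array (the truth value of an array is ambiguous), so the condition goes through where instead. Note that assign returns a new dataset rather than modifying a in place.

a = a.assign(variable=lambda ds: ds['variable'].where(ds['variable'] >= 0, ds['variable'] + 10))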
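
An element-wise choice inside assign can also be written with the top-level xr.where function. This is only a sketch on the toy data from the question; the names ds, d and updated are illustrative:

```python
import numpy as np
import xarray as xr

ds = xr.DataArray(
    np.arange(25).reshape(5, 5) - 5, dims=("x", "y")
).to_dataset(name="variable")

# xr.where(cond, x, y) picks from x where cond is True and from y elsewhere.
updated = ds.assign(
    variable=lambda d: xr.where(d["variable"] < 0, d["variable"] + 10, d["variable"])
)

# assign returns a new Dataset; the original is left untouched.
print(int(ds["variable"].min()), int(updated["variable"].min()))
```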

So, to summarize, you should be able to do the following:

>>> import dask.array
>>> import numpy as np
>>> import xarray as xr

>>> a = xr.DataArray(np.arange(25).reshape(5, 5)-5, dims=("x", "y")) # 2D data with dims x and y
>>> a = a.to_dataset(name='variable') # Creates a dataset whose variable keeps the x and y dims

>>> print(a) # Check the dimension names (exact layout varies between xarray versions)
<xarray.Dataset>
Dimensions:   (x: 5, y: 5)
Dimensions without coordinates: x, y
Data variables:
    variable  (x, y) int64 -5 -4 -3 -2 -1 0 1 2 ... 13 14 15 16 17 18 19

>>> a1 = a['variable']
>>> a2 = 10+a1.copy()
>>> a['variable'] = (('x', 'y'), dask.array.where(a['variable'] < 0, a2, a1)) # dims must match the names printed above
Answered By: SpaceBurger

xarray’s where method is the way to go here. It keeps the values where the condition is True and takes them from the other argument elsewhere, and other can be anything that can be broadcast against the condition and the original array:

a['variable'] = a['variable'].where(
    a['variable'] >= 0,
    (a['variable'] + 10),
)

This will work fine with dask and will handle your coordinates seamlessly.
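
As a quick check on the toy data from the question, the sketch below chunks the dataset when dask is installed (the chunk size of 2 is an arbitrary assumption) and otherwise runs on plain NumPy; the names ds, var and result are illustrative:

```python
import importlib.util

import numpy as np
import xarray as xr

ds = xr.DataArray(
    np.arange(25).reshape(5, 5) - 5, dims=("x", "y")
).to_dataset(name="variable")

# Chunk only if dask is available, so the sketch also runs without it.
if importlib.util.find_spec("dask") is not None:
    ds = ds.chunk({"x": 2})

var = ds["variable"]
# Keep values where var >= 0, use var + 10 everywhere else.
ds["variable"] = var.where(var >= 0, var + 10)

# With dask-backed data nothing is evaluated until compute() is called.
result = ds["variable"].compute()
print(result.values[0])
```

The assignment is the same in both cases; only the backing array type differs.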

Note that if you call where on a whole Dataset, all of the variables in the dataset will be broadcast against the condition and other. So if you have data variables that don’t include all of these dimensions, they will end up repeated along the missing ones, which is rarely what you want. In general, I recommend doing math and other operations on DataArrays or individual variables, as in my answer.
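
The broadcasting effect can be seen on a small made-up dataset. In this sketch per_x is a hypothetical one-dimensional variable added purely for illustration:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "variable": (("x", "y"), np.arange(25).reshape(5, 5) - 5),
        "per_x": (("x",), np.arange(5)),  # hypothetical 1-D variable
    }
)

# Calling where on the Dataset applies the 2-D condition to every variable.
masked = ds.where(ds["variable"] >= 0)
print(ds["per_x"].dims, masked["per_x"].dims)

# Updating only the variable of interest leaves per_x alone.
var = ds["variable"]
ds["variable"] = var.where(var >= 0, var + 10)
print(ds["per_x"].dims)
```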

Answered By: Michael Delgado