Reversing subdivision of latitude and longitude coordinates in xarray

Question:

I’m using Python to work with geospatial data in xarray, where locations are subdivided into subregions/tiles, similar (but not identical) to the example at the bottom of the xarray documentation page on reshaping and reorganizing data. I want to reverse this subdivision in xarray as efficiently as possible, i.e. without extracting all the data into e.g. pandas, unravelling the tiles dimension there and building a new xarray Dataset from scratch.

The dimensions i and j mark the x and y dimensions of each tile and are just lists of integers. Latitude and longitude are supplied as separate coordinates in the Dataset, but they’re multidimensional because of the tiles, so they can’t easily be swapped with i and j using swap_dims(). Before I explain my approach, here’s a working example, using random data, of what the Dataset looks like:

EDIT: The way I constructed this example isn’t actually what I originally intended. It mistakenly creates a diagonal set of tiles, whereas I aimed for a rectangular grid of tiles. JonasV’s answer adjusts my example to do that (see the comments and JonasV’s answer for more information).
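The diagonal-versus-rectangular distinction can be checked with plain NumPy. This sketch uses hypothetical, scaled-down sizes (3 tiles of 4 values each) and marks which cells of the full grid the tiling loop in the example below copies into a tile, compared with a rectangular tiling that has one tile per (row block, column block) pair:

```python
import numpy as np

# Hypothetical, scaled-down sizes chosen only for readability
number_of_tiles = 3
values_per_tile = 4
n = number_of_tiles * values_per_tile

# Cells of the full n x n grid that the question's loop copies into a tile:
# it applies the same [start:stop] slice to both axes, so tile k only
# covers the k-th block on the diagonal
diagonal = np.zeros((n, n), dtype=bool)
for tile in range(number_of_tiles):
    start, stop = tile * values_per_tile, (tile + 1) * values_per_tile
    diagonal[start:stop, start:stop] = True

# A rectangular tiling needs one tile per (row block, column block) pair,
# i.e. number_of_tiles**2 tiles, to cover every cell
rectangular = np.zeros((n, n), dtype=bool)
for row_tile in range(number_of_tiles):
    for col_tile in range(number_of_tiles):
        rows = slice(row_tile * values_per_tile, (row_tile + 1) * values_per_tile)
        cols = slice(col_tile * values_per_tile, (col_tile + 1) * values_per_tile)
        rectangular[rows, cols] = True

print(diagonal.sum(), "of", diagonal.size, "cells covered by the diagonal tiling")
print(rectangular.sum(), "of", rectangular.size, "cells covered by the rectangular tiling")
```

With the question's actual sizes (5 tiles of 150 values), the diagonal version covers 5 × 150 × 150 = 112,500 of the 750 × 750 = 562,500 grid cells.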

import numpy as np
import xarray as xr

# create latitude and longitude data
number_of_tiles = 5
lon_lat_values_per_tile = 150
latitudes = np.linspace(-90, 90, (number_of_tiles * lon_lat_values_per_tile))
longitudes = np.linspace(-180, 180, (number_of_tiles * lon_lat_values_per_tile))
longitudes, latitudes = np.meshgrid(longitudes, latitudes)
# create latitude and longitude coordinates in the tiled shape
latitudes = np.expand_dims(latitudes, axis=0)
longitudes = np.expand_dims(longitudes, axis=0)

for tile in range(number_of_tiles):
    min_index = int((tile / number_of_tiles) * np.shape(longitudes)[2])
    max_index = int(((tile+1) / number_of_tiles) * np.shape(longitudes)[2])

    if tile == 0:
        longitudes_tiled = longitudes[:, min_index:max_index, min_index:max_index]
        latitudes_tiled = latitudes[:, min_index:max_index, min_index:max_index]
    else: 
        longitudes_tiled = np.concatenate((longitudes_tiled, longitudes[:, min_index:max_index, min_index:max_index]), axis=0)
        latitudes_tiled = np.concatenate((latitudes_tiled, latitudes[:, min_index:max_index, min_index:max_index]), axis=0)

# create dimensions
tiles = np.arange(0, number_of_tiles, 1)
# i indexes axis 1 and j indexes axis 2 of the (tiles, i, j) arrays
i = np.arange(0, np.shape(longitudes_tiled)[1], 1)
j = np.arange(0, np.shape(latitudes_tiled)[2], 1)
# create data variable
random_data = np.random.rand(np.shape(longitudes_tiled)[0], np.shape(longitudes_tiled)[1], np.shape(longitudes_tiled)[2])

# create xarray Dataset
data_xr = xr.Dataset({"random_data": (("tiles", "i", "j"), random_data)},
                      coords={"tiles": tiles,
                              "i": i,
                              "j": j,
                              "longitudes_tiled": (("tiles", "i", "j"), longitudes_tiled),
                              "latitudes_tiled": (("tiles", "i", "j"), latitudes_tiled)})

Which results in the following Dataset:

Dimensions:           (tiles: 5, i: 150, j: 150)
Coordinates:
  * tiles             (tiles) int32 0 1 2 3 4
  * i                 (i) int32 0 1 2 3 4 5 6 7 ... 143 144 145 146 147 148 149
  * j                 (j) int32 0 1 2 3 4 5 6 7 ... 143 144 145 146 147 148 149
    longitudes_tiled  (tiles, i, j) float64 -180.0 -179.5 -179.0 ... 179.5 180.0
    latitudes_tiled   (tiles, i, j) float64 -90.0 -90.0 -90.0 ... 90.0 90.0 90.0
Data variables:
    random_data       (tiles, i, j) float64 0.6815 0.6691 ... 0.9713 0.6347
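The point that the multidimensional latitude coordinate cannot simply be used with swap_dims() can be reproduced on a small stand-in Dataset (the sizes and random values are hypothetical). swap_dims() only accepts a replacement variable that is 1-D along the old dimension, so xarray raises a ValueError here:

```python
import numpy as np
import xarray as xr

# Hypothetical small stand-in for the question's (tiles, i, j) Dataset
n_tiles, n_i, n_j = 2, 3, 4
ds = xr.Dataset(
    {"random_data": (("tiles", "i", "j"), np.random.rand(n_tiles, n_i, n_j))},
    coords={
        "tiles": np.arange(n_tiles),
        "i": np.arange(n_i),
        "j": np.arange(n_j),
        "latitudes_tiled": (("tiles", "i", "j"), np.random.rand(n_tiles, n_i, n_j)),
    },
)

# latitudes_tiled spans (tiles, i, j) rather than just (i,),
# so it is not a valid replacement for the i dimension
try:
    ds.swap_dims({"i": "latitudes_tiled"})
    swap_failed = False
except ValueError as err:
    swap_failed = True
    print("swap_dims refused:", err)
```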

I’ve been using xarray.stack() to combine tiles with either i or j, which creates a 150 x 750 Dataset (instead of 5 x 150 x 150) as follows:

data_xr_stacked = data_xr.stack(tiles_i=("tiles", "i"))
# add longitudes along stacked dimension here
data_xr_stacked.coords["longitude"] = ("tiles_i", longitudes[0, 0, :])
# swap dimensions to longitude
data_xr_stacked = data_xr_stacked.swap_dims({"tiles_i": "longitude"})

(With this example data I’m just re-using the longitudes variable I created above. For the real dataset I’d unravel the longitudes stored in the dataset using the tiles_i variable.)
This is exactly the behaviour I’m after, except that it leaves out latitude. Because tiles is now stacked together with i, I cannot re-use it to do the same with latitude, and if I unstack tiles_i again, it just breaks back up into tiles.
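The effect described above can be seen on a small stand-in Dataset (hypothetical sizes 5 x 3 x 3 instead of 5 x 150 x 150). After stack(), tiles survives only as a level of the tiles_i MultiIndex rather than as a dimension, and unstack() simply splits tiles_i back into tiles and i:

```python
import numpy as np
import xarray as xr

# Hypothetical scaled-down stand-in for the (tiles: 5, i: 150, j: 150) Dataset
n_tiles, n_i, n_j = 5, 3, 3
ds = xr.Dataset(
    {"random_data": (("tiles", "i", "j"), np.random.rand(n_tiles, n_i, n_j))},
    coords={"tiles": np.arange(n_tiles), "i": np.arange(n_i), "j": np.arange(n_j)},
)

stacked = ds.stack(tiles_i=("tiles", "i"))
# j keeps its length; tiles_i has n_tiles * n_i entries
print(dict(stacked.sizes))
# "tiles" is no longer a dimension, so it cannot also be stacked with "j"
print("tiles" in stacked.dims)

# Unstacking tiles_i restores the original tiles and i dimensions
restored = stacked.unstack("tiles_i")
print(dict(restored.sizes))
```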

How do I manipulate this xarray Dataset to end up with a 750 x 750 array, with latitude and longitude replacing tiles, i and j as dimensions?

Asked By: Tobitobitobi


Answers:

Given this information, the following minimal example should be closer to your use case:

import numpy as np
import xarray as xr

# create latitude and longitude data
number_of_tiles = 5
lon_lat_values_per_tile = 150
latitudes = np.linspace(-90, 90, (number_of_tiles * lon_lat_values_per_tile))
longitudes = np.linspace(-180, 180, (number_of_tiles * lon_lat_values_per_tile))
longitudes, latitudes = np.meshgrid(longitudes, latitudes)
# create latitude and longitude coordinates in the tiled shape
latitudes = np.expand_dims(latitudes, axis=0)
longitudes = np.expand_dims(longitudes, axis=0)

for tile in range(number_of_tiles):
    min_index = int((tile / number_of_tiles) * np.shape(longitudes)[2])
    max_index = int(((tile+1) / number_of_tiles) * np.shape(longitudes)[2])

    if tile == 0:
        longitudes_tiled = longitudes[:, min_index:max_index, min_index:max_index]
        latitudes_tiled = latitudes[:, min_index:max_index, min_index:max_index]
    else: 
        longitudes_tiled = np.concatenate((longitudes_tiled, longitudes[:, min_index:max_index, min_index:max_index]), axis=0)
        latitudes_tiled = np.concatenate((latitudes_tiled, latitudes[:, min_index:max_index, min_index:max_index]), axis=0)

# create dimensions
tiles = np.arange(0, number_of_tiles, 1)
# i indexes axis 1 and j indexes axis 2 of the (tiles, i, j) arrays
i = np.arange(0, np.shape(longitudes_tiled)[1], 1)
j = np.arange(0, np.shape(latitudes_tiled)[2], 1)
# create data variable with separate tile dimensions for latitude and longitude
random_data = np.random.rand(np.shape(longitudes_tiled)[0], np.shape(longitudes_tiled)[0], np.shape(longitudes_tiled)[1], np.shape(longitudes_tiled)[2])

# create xarray Dataset
data_xr = xr.Dataset({"random_data": (("tiles_lat", "tiles_lon", "i", "j"), random_data)},
                      coords={"tiles_lat": tiles,
                              "tiles_lon": tiles,
                              "i": i,
                              "j": j,
                              "longitudes_tiled": (("tiles_lon", "i", "j"), longitudes_tiled),
                              "latitudes_tiled": (("tiles_lat", "i", "j"), latitudes_tiled)})

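With this layout, you can convert it to a 750×750 array by stacking each tile dimension together with the in-tile dimension along which its coordinate varies. Here latitude varies along i and longitude along j (that is what np.meshgrid(longitudes, latitudes) produces), so something like this should work:

data_xr.stack(tiles_i=("tiles_lat", "i"), tiles_j=("tiles_lon", "j"))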
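To go one step further and end up with latitude and longitude as dimension coordinates, a sketch along these lines may help. It builds the same latitude and longitude tiles as the answer's loop (using np.repeat instead of slicing), stacks tiles_lat with i and tiles_lon with j, then replaces the resulting MultiIndexes with plain 1-D coordinates. The stacked dimension names latitude/longitude are illustrative choices, and the drop_vars step assumes a reasonably recent xarray release (one with explicit index support). The final NumPy transpose/reshape is only a cross-check of the data layout:

```python
import numpy as np
import xarray as xr

# Rectangular tiling: tile (tiles_lat, tiles_lon) holds an n x n block
# of a (number_of_tiles * n) x (number_of_tiles * n) grid
number_of_tiles = 5
n = 150  # values per tile along each axis
lat_1d = np.linspace(-90, 90, number_of_tiles * n)
lon_1d = np.linspace(-180, 180, number_of_tiles * n)

# As with np.meshgrid(lon, lat), latitude varies along i and longitude along j
latitudes_tiled = np.repeat(lat_1d.reshape(number_of_tiles, n, 1), n, axis=2)
longitudes_tiled = np.repeat(lon_1d.reshape(number_of_tiles, 1, n), n, axis=1)
random_data = np.random.rand(number_of_tiles, number_of_tiles, n, n)

data_xr = xr.Dataset(
    {"random_data": (("tiles_lat", "tiles_lon", "i", "j"), random_data)},
    coords={
        "tiles_lat": np.arange(number_of_tiles),
        "tiles_lon": np.arange(number_of_tiles),
        "i": np.arange(n),
        "j": np.arange(n),
        "latitudes_tiled": (("tiles_lat", "i", "j"), latitudes_tiled),
        "longitudes_tiled": (("tiles_lon", "i", "j"), longitudes_tiled),
    },
)

# Merge each tile dimension with the in-tile dimension its coordinate varies along
stacked = data_xr.stack(latitude=("tiles_lat", "i"), longitude=("tiles_lon", "j"))

# The stacked tiled coordinates are 2-D but constant along the other axis,
# so one column/row of each gives the 1-D latitude and longitude values
lat_values = stacked["latitudes_tiled"].isel(longitude=0).values
lon_values = stacked["longitudes_tiled"].isel(latitude=0).values

# Drop the MultiIndexes (together with all their level coordinates) and the
# redundant 2-D coordinates, then attach plain 1-D dimension coordinates
result = (
    stacked.drop_vars(
        ["latitude", "tiles_lat", "i", "longitude", "tiles_lon", "j",
         "latitudes_tiled", "longitudes_tiled"]
    )
    .assign_coords(latitude=lat_values, longitude=lon_values)
    .transpose("latitude", "longitude")
)

# Cross-check with pure NumPy: (tiles_lat, tiles_lon, i, j)
# -> (tiles_lat, i, tiles_lon, j) -> (750, 750)
reassembled = random_data.transpose(0, 2, 1, 3).reshape(number_of_tiles * n, number_of_tiles * n)
print(result)
```

Stacking keeps everything inside xarray; the NumPy reshape shows that, for a rectangular tiling, reversing the subdivision amounts to a transpose followed by a reshape.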

Answered By: JonasV