How to create and return a Zarr file from an xarray Dataset?

Question:

How would I go about creating and returning a file new_zarr.zarr from an xarray Dataset?

I know xarray.Dataset.to_zarr() exists but this returns a ZarrStore and I must return a bytes-like object.

I have tried using the tempfile module but am unsure how to proceed. How would I write an xarray.Dataset to a bytes-like object that can be downloaded as a .zarr file?

Asked By: A B

Answers:

Zarr supports multiple storage backends (DirectoryStore, ZipStore, etc.). If you are looking for a single file object, it sounds like the ZipStore is what you want.

import xarray as xr
import zarr

ds = xr.tutorial.open_dataset('air_temperature')
store = zarr.storage.ZipStore('./new_zarr.zip')
ds.to_zarr(store)
store.close()  # a ZipStore must be closed explicitly so the zip file is finalized

The zip file can be thought of as a single-file Zarr store, and it can be downloaded (or moved around) as one unit.
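If the dataset has already been written as a directory store (for example into a temporary directory, as the question attempts with tempfile), one possible way to get a bytes-like object is to zip that directory in memory. The sketch below uses only the standard library. The helper name zip_directory_to_bytes and the stand-in files are hypothetical, and it assumes the store's files should sit at the root of the archive rather than under a top-level folder.

```python
import io
import os
import tempfile
import zipfile


def zip_directory_to_bytes(directory):
    """Pack every file under `directory` into an in-memory zip and return its bytes."""
    buffer = io.BytesIO()
    # ZIP_STORED (no extra compression), on the assumption that the Zarr chunks
    # were already compressed when the store was written.
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                full_path = os.path.join(root, name)
                # Keep paths relative to the store root, with forward slashes,
                # so keys such as ".zgroup" end up at the top level of the zip.
                arcname = os.path.relpath(full_path, directory).replace(os.sep, "/")
                zf.write(full_path, arcname)
    return buffer.getvalue()


# Stand-in for a directory produced by ds.to_zarr(store_dir);
# the file names and contents below are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    store_dir = os.path.join(tmp, "new_zarr.zarr")
    os.makedirs(os.path.join(store_dir, "air"))
    with open(os.path.join(store_dir, ".zgroup"), "w") as f:
        f.write('{"zarr_format": 2}')
    with open(os.path.join(store_dir, "air", "0.0.0"), "wb") as f:
        f.write(b"\x00" * 16)
    payload = zip_directory_to_bytes(store_dir)
```

In real use the stand-in files would be replaced by a call to ds.to_zarr(store_dir), and payload would be what the download endpoint returns.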


Update 1

If you want to do this all in memory, you could extend zarr.ZipStore to allow passing in a BytesIO object:

import io
import os
import zipfile
from threading import RLock

import zarr


class MyZipStore(zarr.ZipStore):

    def __init__(self, path, compression=zipfile.ZIP_STORED, allowZip64=True, mode='a',
                 dimension_separator=None):

        # store properties
        if isinstance(path, str):  # this is the only change needed to make this work
            path = os.path.abspath(path)
        self.path = path
        self.compression = compression
        self.allowZip64 = allowZip64
        self.mode = mode
        self._dimension_separator = dimension_separator

        # Current understanding is that zipfile module in stdlib is not thread-safe,
        # and so locking is required for both read and write. However, this has not
        # been investigated in detail, perhaps no lock is needed if mode='r'.
        self.mutex = RLock()

        # open zip file
        self.zf = zipfile.ZipFile(path, mode=mode, compression=compression,
                                  allowZip64=allowZip64)

Then you can create the zip file in memory:

zip_buffer = io.BytesIO()

store = MyZipStore(zip_buffer)

ds.to_zarr(store)
store.close()  # finalizes the archive (writes the zip central directory into zip_buffer)

You’ll notice that the zip_buffer contains a valid zip file:

zip_buffer.seek(0)  # rewind to the start of the buffer before reading
zip_buffer.read(10)
b'PK\x03\x04\x14\x00\x00\x00\x00\x00'

(PK\x03\x04 is the Zip file magic number)
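The same check can be done programmatically. The following is a minimal, standard-library-only sketch that builds a tiny in-memory zip (the single member is made up and merely stands in for a real store), then confirms the leading signature and lets zipfile validate the archive:

```python
import io
import zipfile

# Signature that begins the first local file header of a non-empty zip archive.
LOCAL_FILE_HEADER = b"PK\x03\x04"

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    # Made-up member standing in for the contents of a real Zarr store.
    zf.writestr(".zgroup", '{"zarr_format": 2}')

buffer.seek(0)  # rewind before reading the first bytes
starts_with_magic = buffer.read(4) == LOCAL_FILE_HEADER
is_zip = zipfile.is_zipfile(buffer)          # checks for a valid end-of-archive record
members = zipfile.ZipFile(buffer).namelist() # lists the stored keys

print(starts_with_magic, is_zip, members)
```

Checking zipfile.is_zipfile as well as the leading bytes is useful because an archive that was never closed can start with the right signature and still be unreadable.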

Answered By: jhamman