How can I create a multiindex matrix from a 4D xarray dataset?

Question:

I currently have a 4D dataset ds in xarray that looks like this:

<xarray.Dataset>
Dimensions:  (lat: 60, lon: 78, time: 216, pres: 395)
Coordinates:
  * lat      (lat) float32 0.5 1.5 2.5 3.5 4.5 5.5 ... 55.5 56.5 57.5 58.5 59.5
  * lon      (lon) float32 -45.5 -44.5 -43.5 -42.5 ... -69.5 -75.5 -74.5 -76.5
  * time     (time) float32 7.32e+05 7.32e+05 7.32e+05 ... 7.385e+05 7.385e+05
  * pres     (pres) float64 2.5 7.5 12.5 17.5 ... 1.962e+03 1.968e+03 1.972e+03
Data variables:
    var       (pres, lat, lon, time) float64 2.03e+03 2.03e+03 ... nan nan>

My goal is to turn it into a pandas df that looks like this:

id   time  pres param  20.5-70.5  20.5-71.5  20.5-72.5
0     0     0   var       2085       2073       2057
1     0     1   var       2114       2156       2054
2     0     2   var       2039       2006       2179
3     1     0   var       2199       2144       2033
4     1     1   var       2056       2102       2191
5     1     2   var       2062       2033       2052
6     2     0   var       2001       2153       2170
7     2     1   var       2187       2120       2100
8     2     2   var       2138       2076       2002

where I have some sort of multi-index with time and pres, a param column (because I will likely have more than one variable at a time), and each pixel (so paired lat-lon) as a column header, so that for each pixel column I have the var value corresponding to the time and pres. I need this format for the next part of my analysis that includes some vectorization.

I tried a few things, including stacked = ds.stack(coordinates=["lat", "lon"]) which I think is the start of what I want to do, to then do stacked.to_dataframe() but then the latter unstacks my paired coordinates. I think I’m missing something here but I’m not too sure how to go about this?

Any help is greatly appreciated!

Thanks

Asked By: ldlg

||

Answers:

(Since you don’t provide a sample, you should adapt this example)

Use stack/unstack to reshape your dataset:

import xarray as xr
import pandas as pd

ds = xr.tutorial.load_dataset('air_temperature')
df = ds.to_dataframe().rename_axis(columns='param').stack('param').unstack(['lat', 'lon'])
df.columns = [f"{lat}-{lon}" for lat, lon in df.columns]

Output:

>>> df
                           75.0-200.0  75.0-202.5  75.0-205.0  75.0-207.5  ...  15.0-322.5  15.0-325.0  15.0-327.5  15.0-330.0
time                param                                                  ...                                                
2013-01-01 00:00:00 air    241.199997  242.500000  243.500000  244.000000  ...  297.600006  296.899994  296.790009  296.600006
2013-01-01 06:00:00 air    242.099991  242.699997  243.099991  243.389999  ...  296.899994  296.399994  296.399994  296.600006
2013-01-01 12:00:00 air    242.299988  242.199997  242.299988  242.500000  ...  297.600006  297.000000  297.000000  296.790009
2013-01-01 18:00:00 air    241.889999  241.799988  241.799988  242.099991  ...  298.199982  297.790009  298.000000  297.899994
2013-01-02 00:00:00 air    243.199997  243.099991  243.099991  243.299988  ...  297.699982  297.100006  297.399994  297.399994
...                               ...         ...         ...         ...  ...         ...         ...         ...         ...
2014-12-30 18:00:00 air    243.089996  243.389999  243.689987  243.789993  ...  297.989990  297.389984  296.889984  296.089996
2014-12-31 00:00:00 air    242.489990  242.389999  242.189987  241.689987  ...  297.290009  296.589996  295.989990  295.489990
2014-12-31 06:00:00 air    243.489990  242.989990  242.089996  240.689987  ...  297.089996  296.089996  295.790009  295.790009
2014-12-31 12:00:00 air    245.789993  244.789993  243.489990  241.889999  ...  296.589996  295.690002  295.489990  295.190002
2014-12-31 18:00:00 air    245.089996  244.289993  243.289993  242.189987  ...  297.190002  296.489990  296.190002  295.690002

[2920 rows x 1325 columns]

>>> ds
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7
Answered By: Corralien