Find max value for each year in 3D array NetCDF file (way to use Pandas or xarray?)

Question:

I am trying to make some maps from data in several NetCDF files. Each file contains 5 years of data, stored in a 3D array of shape (14608, 145, 192) (time, lat, lon).

I would like the maximum value for each year at each coordinate, so in the end I’ll have an output array of shape (5, 145, 192): one value per year for each lat/lon point.

It has been suggested I try using pandas, specifically DataFrame and DatetimeIndex, but I couldn’t find a way to use it for anything greater than a 2D array. Xarray was also suggested, but I haven’t used xarray before and wouldn’t know where to start.

Edit 1: Sample Data

Here is a simplified version of what I’ve been trying to do with pandas. It was at this point that I realized DataFrame doesn’t work for a 3D array.

import numpy as np
import pandas as pd

fake = np.random.randint(2, 30, size = (14608,145,192))
index = pd.date_range(start = '1985-1-1 01:30:00', end = '1989-12-31 22:30:00' , freq='3H')

df = pd.DataFrame(data = fake, index = index)

Edit 2: Fixed Listed Array Shape

To clarify, I actually want an array with shape (5, 145, 192) as the output. I wrote it wrong because originally I was splitting the 3D array into 5 separate arrays, finding the max of each, and then stacking them again into one array, which ended up with a shape of (5, 145, 192).

I want to simplify the code and skip the tedious step of breaking the array apart by hand that I was doing before.

Asked By: Alex Morrison


Answers:

You can use Panel here (note that Panel has since been removed from pandas, so this requires an older release):

df = pd.Panel(fake).to_frame()
df.columns=index
df
Out[1065]: 
             1985-01-01 01:30:00  1985-01-01 04:30:00  1985-01-01 07:30:00
major minor                                                               
0     0                       28                    7                   22
      1                        9                   10                   11
      2                        8                   15                    7
      3                       19                   18                    2
      4                       14                   16                   24
      5                        6                   26                   13
      6                       28                   16                   11

#....
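
For readers on a current pandas release, where Panel is unavailable, the per-year maximum can also be computed without reshaping to 2D. The following is only a sketch under stated assumptions: it uses plain NumPy masking with the year taken from the DatetimeIndex, a small 4 x 5 stand-in grid instead of (145, 192), and the names rng, years, unique_years and yearly_max are illustrative rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Same 3-hourly time axis as in the question (1985-1989, 14608 steps).
index = pd.date_range(start='1985-01-01 01:30:00',
                      end='1989-12-31 22:30:00',
                      freq=pd.Timedelta(hours=3))

# Assumption: a small 4 x 5 grid stands in for (145, 192) to keep the demo light.
rng = np.random.default_rng(0)
fake = rng.integers(2, 30, size=(len(index), 4, 5))

# Year of every time step, and the distinct years in ascending order.
years = index.year.to_numpy()
unique_years = np.unique(years)

# For each year, select its time steps with a boolean mask and reduce over time.
yearly_max = np.stack([fake[years == y].max(axis=0) for y in unique_years])

print(yearly_max.shape)  # (n_years, n_lat, n_lon)
```

With the full-size array the same code should give the (5, 145, 192) result asked for, at the cost of one pass over the data per year.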
Answered By: BENY

Here’s how you could approach this using Xarray:

import xarray as xr

# open one of your files
ds = xr.open_dataset('path/to/your/ncfile.nc')

# find maximum for a specific year (1990 in this example)
ds_ymax = ds.sel(time=slice('1990-01-01', '1990-12-31')).max('time')

# plot a single variable ('temperature' in this example)
ds_ymax['temperature'].plot()

While that covers the basics of what you’re trying to do, there are a few other common workflow things I figured I should mention:

  1. Open multiple files at once. Xarray provides an open_mfdataset function that allows for quick concatenation of multiple files at once:

    ds = xr.open_mfdataset('path/to/your/ncfiles/*nc')  # note the use of the wildcard
    
  2. Use resample to calculate annual maximum values. In my example above, I manually selected a single year’s worth of data, but it is possible to do this programmatically using resample or groupby:

    # using resample ('AS' == annual starting Jan-1)
    ds_ymax = ds.resample(time='AS').max('time')
    
    # using groupby
    ds_ymax = ds.groupby('time.year').max('time')
    
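
To see what the groupby version returns without needing a NetCDF file, here is a minimal sketch on synthetic data. It assumes a small 4 x 5 grid, made-up lat/lon coordinate values, and the variable name 'temperature' from the example above; none of these come from the asker's actual files.

```python
import numpy as np
import pandas as pd
import xarray as xr

# 3-hourly time axis covering 1985-1989, as in the question.
time = pd.date_range('1985-01-01 01:30:00', '1989-12-31 22:30:00',
                     freq=pd.Timedelta(hours=3))

# Assumption: a tiny hypothetical grid instead of the real (145, 192) one.
lat = np.linspace(-90.0, 90.0, 4)
lon = np.linspace(0.0, 288.0, 5)

rng = np.random.default_rng(0)
data = rng.integers(2, 30, size=(time.size, lat.size, lon.size))

da = xr.DataArray(data,
                  coords={'time': time, 'lat': lat, 'lon': lon},
                  dims=('time', 'lat', 'lon'),
                  name='temperature')

# One maximum per calendar year at every grid point.
yearly = da.groupby('time.year').max('time')

# Put the new 'year' dimension first and pull out a plain NumPy array.
yearly_np = yearly.transpose('year', 'lat', 'lon').values
print(yearly_np.shape)
```

The result carries a 'year' coordinate in place of 'time', so individual years can be picked out with yearly.sel(year=1987) before plotting.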

Finally, you mentioned not knowing where to start with xarray. Take a look at the documentation: http://xarray.pydata.org/en/latest/index.html

Answered By: jhamman

This is not a direct Python solution, but if you want the annual maximum (i.e. one value for each grid point per year), you can compute it from the command line with cdo:

cdo yearmax in.nc out.nc 

You can use these functions from within Python by using the cdo package, installed with:

pip install cdo

Further details here: https://code.mpimet.mpg.de/projects/cdo/embedded/index.html

Answered By: Adrian Tompkins