Python: Replacing values in a netCDF file using netCDF4

Question:

I have a netcdf file with several values < 0. I would like to replace all of them with a single value (say -1). How do I do that using netCDF4? I am reading in the file like this:

import netCDF4

dset      = netCDF4.Dataset('test.nc')
dset[dset.variables['var'] < 0] = -1
Asked By: user308827


Answers:

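If you want to change the values stored in the netCDF variable itself, open the file in 'r+' mode, modify the array and assign it back to the variable. Indexing the result of dset['var'][:] only changes an in-memory copy:

import netCDF4

dset = netCDF4.Dataset('test.nc', 'r+')  # open for reading and writing

var = dset['var']
data = var[:]          # read the values into a NumPy (masked) array
data[data < 0] = -1    # replace every negative value with -1
var[:] = data          # write the modified array back to the variable

dset.close()  # flush the changes to disk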

If you don’t want to write back to disk, go ahead and just get the numpy array and slice/assign to it:

data = dset['var'][:]  # data is a NumPy (masked) array
data[data < 0] = -1
Answered By: jhamman

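For me, the previous answer does not work; I solved it with:

import netCDF4

dset = netCDF4.Dataset('test.nc', 'r+')
data = dset.variables['var'][:]   # read the values into memory
data[data < 0] = -1               # ... your changes ...
dset.variables['var'][:] = data   # assign the result back to the variable
dset.close()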
Answered By: Adrien Bax

Solution 1: Python xarray

This solution uses xarray to read and write the netCDF file, and xarray's where function to conditionally reset the values.

import xarray as xr

ds = xr.open_dataset('test.nc')
ds['var'] = xr.where(ds['var'] < 0, -1, ds['var'])  # negatives -> -1, others unchanged
ds.to_netcdf('modified_test.nc')  # write the result to a new netCDF file
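A closely related option is the DataArray.where method, which keeps the values where the condition is True and substitutes the second argument everywhere else. Here is a sketch on a small in-memory dataset standing in for test.nc (the dimension names, values and units attribute are made up for illustration). Note that the condition is inverted compared with xr.where: it describes the values to keep.

```python
import numpy as np
import xarray as xr

# Hypothetical in-memory dataset standing in for test.nc.
ds = xr.Dataset({"var": (("y", "x"), np.array([[-2.0, 0.5], [3.0, -0.1]]))})
ds["var"].attrs["units"] = "degC"

# Keep values >= 0; every other value (i.e. the negatives) becomes -1.
ds["var"] = ds["var"].where(ds["var"] >= 0, -1)

print(ds["var"].values)  # negatives replaced by -1
```

With a file-backed dataset, the same line would be followed by ds.to_netcdf(...) as in the solution above.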

Solution 2: NCO from the command line

I know the OP wants a Python solution, but in case anyone just wants to do this quickly from the command line, it can also be done with NCO:

ncap2 -s 'where(x<0.) x=-1;' input.nc -O output.nc

as per this post: setting values below a threshold to the threshold in a netcdf file

Solution 3: CDO from the command line

CDO also has an expression operator (expr) that lets you do this in a one-liner from the command line:

cdo -expr,'var = ((var < 0)) ? -1 : var' infile.nc outfile.nc

There is also a cdo package for Python that lets you use this functionality directly from Python, without resorting to system calls.

Answered By: Adrian Tompkins

To apply an equation conditionally, instead of only assigning a constant, I added a conditional loop over a variable of shape (month, lats, lons), based on the code by @jhamman:

import time

import netCDF4 as nc
import numpy as np

Tmin = -1.7
Tmax = 4.9
perc = (Tmax - Tmin) / 100  # 1% of the Tmin..Tmax range

lats = np.arange(0, 384, 1)
lons = np.arange(0, 768, 1)
months = [0, 1]
dset = nc.Dataset('path/file.nc', 'r+')
var = dset['var']

start = time.time()
orig = var[:]            # snapshot of the original values
data = orig.copy()
data[orig < Tmin] = 100  # values below Tmin become 100
step1 = time.time()
print('Step1 took: ' + str(step1 - start))
data[orig > Tmax] = 0    # values above Tmax become 0
var[:] = data            # write both replacements back to the file
step2 = time.time()
print('Step2 took: ' + str(step2 - step1))

# Iterate over each dimension to alter individual values according to the
# equation new_value = 100 - ((old_value + 1.8) / perc). The condition is
# checked on the original values, so cells already set to 0 are skipped.
for m in months:
    newstart = time.time()
    for i in lats:
        step3 = time.time()
        print('month lats lat layer ' + str(i) + ' took: ' + str(step3 - newstart) + 's')
        for j in lons:
            if Tmin < orig[m, i, j] < Tmax:
                var[m, i, j] = 100 - ((var[m, i, j] + 1.8) / perc)

    end = time.time()
    print('One full month took: ' + str(end - newstart) + 's')

dset.close()

The problem, however, is that the code is very slow:

Step1 took: 0.0343s
Step2 took: 0.0253s
month lats lat layer: 0.4064s
One full month took 250.8082s

This is to be expected given the nested loops. Does anyone have an idea how to speed this up? Are the iterations really necessary for this goal?

Answered By: Linda
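The nested loops are slow mainly because every var[m, i, j] access is a separate read or write through netCDF4. The same three rules can usually be applied to the whole array at once with NumPy boolean masks. This is a sketch, not a drop-in replacement: the helper name rescale and the upper-case constant names are hypothetical, all masks are computed from the original values so the rules cannot overwrite each other, masked (fill-value) cells are left alone, and values exactly equal to Tmin or Tmax stay unchanged, as in the loop above.

```python
import numpy as np

# Constants from the question; the names are illustrative.
TMIN = -1.7
TMAX = 4.9
PERC = (TMAX - TMIN) / 100  # 1% of the Tmin..Tmax range


def rescale(values):
    """Hypothetical vectorised replacement for the per-cell loop."""
    data = np.ma.asarray(values, dtype=float)
    # Build every mask from the original values; masked cells count as False.
    below = (data < TMIN).filled(False)
    above = (data > TMAX).filled(False)
    inside = ((data > TMIN) & (data < TMAX)).filled(False)

    out = data.copy()  # cells equal to TMIN or TMAX keep their value
    out[below] = 100
    out[above] = 0
    out[inside] = 100 - ((data[inside] + 1.8) / PERC)
    return out


# Applying it to the file (shown as a comment, not executed here):
#     import netCDF4
#     with netCDF4.Dataset('path/file.nc', 'r+') as dset:
#         dset['var'][:] = rescale(dset['var'][:])
```

Reading and writing the whole variable once replaces roughly 384 × 768 single-cell reads and writes per month, which is where most of the time in the loop version goes.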