Issues resampling raster to the resolution of another raster

Question:

I am trying to take a population raster and resample and reproject it to match the shape and resolution of a precipitation raster.

Data Links:

Population Data: https://figshare.com/ndownloader/files/10257111

Precipitation Data: https://www.ncei.noaa.gov/data/nclimgrid-monthly/access/nclimgrid_prcp.nc

The Population Data is a series of rasters, one per decade, for 5 different population models covering the continental US. If you pick just one of the rasters, I can work out the rest (I have combined them into a multiband raster anyway). For example, the pop_m4_2010 raster would do. The resolution is 1 × 1 km, and the projection is Albers Equal Area Conic NAD 83 (ESRI:102003).

The Precipitation Data is a NetCDF file of monthly precipitation for the continental US. The resolution is 5 × 5 km and the projection is WGS84 (EPSG:4326).
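Because the precipitation grid is in geographic coordinates, its "5 × 5 km" cells do not all have the same area. The answer below reports its resolution as 0.04166667 degrees (1/24°). The sketch below assumes a spherical Earth of radius 6371 km, an approximation, since GIS tools normally use the WGS84 ellipsoid. It computes the area of such a cell at a few latitudes, using the hypothetical helper `lonlat_cell_area_km2`. The result suggests that each precipitation cell covers roughly 14 to 20 km² across the CONUS latitude range, which means it should hold about that many 1 km² population cells' worth of people.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean spherical radius; an approximation of WGS84


def lonlat_cell_area_km2(lat_south_deg, dlat_deg, dlon_deg):
    """Area of a lon/lat grid cell on a sphere.

    A = R^2 * dlon * (sin(lat_north) - sin(lat_south)), with angles in radians.
    """
    lat_s = math.radians(lat_south_deg)
    lat_n = math.radians(lat_south_deg + dlat_deg)
    return EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg) * (math.sin(lat_n) - math.sin(lat_s))


res = 1 / 24  # about 0.0416667 degrees, the grid spacing of the precipitation data
for lat in (25, 40, 49):
    print(lat, round(lonlat_cell_area_km2(lat, res, res), 1))
```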

I converted the NetCDF file to GeoTIFF using the following code:

import xarray as xr
import rioxarray  # registers the .rio accessor on xarray objects

prcp_file = xr.open_dataset('nclimgrid_prcp.nc')
prp = prcp_file['prcp']  # has a time dimension: the GeoTIFF gets one band per month
prp = prp.rio.set_spatial_dims(x_dim='lon', y_dim='lat')
prp.rio.write_crs("epsg:4326", inplace=True)
prp.rio.to_raster('prp_raster.tiff')

I also used QGIS to open the population files (Add Raster Layer, navigate into the downloaded folder for pop_m4_2010, and select the "w001001.adf" file). When I do this in a WGS84 project, QGIS appears to reproject the layer automatically, but I am new to this, so I am unsure whether that is correct.

From this point I have tried several things to resample the population raster to match the 5 × 5 km resolution of the precipitation raster:

  1. In QGIS Processing Toolbox GRASS r.resample
  2. In QGIS Processing Toolbox Raster Layer Zonal Statistics
  3. In Python: honestly, I have lost track of all the forum posts and tutorials I have followed on gdal.Warp, rasterio.warp, affine transformations, rio.reproject_match, etc. Below are a few of my code attempts.

Many of these appear to run fine (rio.reproject_match in particular seemed simple and effective), but none of them gives the intended result. When I check the resampled population raster by running zonal statistics over a county shapefile, the summed population per county is either 0 or wildly inaccurate.
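The following toy model shows why a total can collapse. It assumes the grids share a CRS and line up on an integer factor, and it is not GDAL's exact algorithm. Resamplers that keep per-cell values, such as nearest neighbour or averaging, leave each coarse cell holding about what one fine cell held. The total therefore drops by roughly the number of source cells per target cell. Only a sum preserves the count. The function `downsample` and the uniform grid are hypothetical.

```python
import numpy as np


def downsample(counts, factor, method):
    """Reduce a 2D count grid by an integer factor (same CRS, aligned grids)."""
    ny, nx = counts.shape
    if method == "nearest":
        # keep one source cell per output cell (here the block centre)
        return counts[factor // 2::factor, factor // 2::factor]
    blocks = counts.reshape(ny // factor, factor, nx // factor, factor)
    if method == "mean":
        return blocks.mean(axis=(1, 3))
    if method == "sum":
        return blocks.sum(axis=(1, 3))
    raise ValueError(f"unknown method: {method}")


# hypothetical 10 x 10 grid of 1 km cells holding 1 person each
pop = np.ones((10, 10))
for method in ("nearest", "mean", "sum"):
    print(method, downsample(pop, 5, method).sum())
```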

What am I doing wrong?

Reproject_Match:

import rioxarray # for the extension to load
import xarray

import matplotlib.pyplot as plt

%matplotlib inline

def print_raster(raster):
    print(
        f"shape: {raster.rio.shape}\n"
        f"resolution: {raster.rio.resolution()}\n"
        f"bounds: {raster.rio.bounds()}\n"
        f"sum: {raster.sum().item()}\n"
        f"CRS: {raster.rio.crs}\n"
    )

xds = rioxarray.open_rasterio('pop_m4_2010.tif')
xds_match = rioxarray.open_rasterio('prp_raster.tiff')

fig, axes = plt.subplots(ncols=2, figsize=(12,4))
xds.plot(ax=axes[0])
xds_match.plot(ax=axes[1])
plt.draw()

print("Original Raster:\n----------------\n")
print_raster(xds)
print("Raster to Match:\n----------------\n")
print_raster(xds_match)

xds_repr_match = xds.rio.reproject_match(xds_match)

print("Reprojected Raster:\n-------------------\n")
print_raster(xds_repr_match)
print("Raster to Match:\n----------------\n")
print_raster(xds_match)

xds_repr_match.rio.to_raster("reproj_pop.tif")
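There are two things to keep in mind when reading the totals printed above. First, rioxarray's `reproject_match` uses nearest-neighbour resampling unless a `resampling` argument is passed. Second, `print_raster` sums every cell. Unless the file is opened with `rioxarray.open_rasterio(..., masked=True)`, any nodata fill value is included in that sum. The numpy sketch below uses a hypothetical fill value of -9999 (check the file's actual nodata value) and a hypothetical helper `masked_total`. It shows how a single fill cell can swamp a total.

```python
import numpy as np


def masked_total(values, nodata):
    """Sum a raster array while ignoring nodata cells and NaNs."""
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    if nodata is not None:
        valid &= values != nodata
    return values[valid].sum()


NODATA = -9999.0  # hypothetical fill value; read the real one from the file
grid = np.array([[10.0, 20.0], [NODATA, 5.0]])
print(grid.sum())                  # naive total, polluted by the fill value
print(masked_total(grid, NODATA))  # 35.0
```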

Another attempt with rasterio.warp:

import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling

#open source raster
srcRst =rasterio.open('pop_m4_2010.tif') 
print("source raster crs:")
print(srcRst.crs)

dstCrs = 'EPSG:4326'  # the {'init': 'EPSG:4326'} form is deprecated
print("destination raster crs:")
print(dstCrs)

#calculate transform array and shape of reprojected raster
transform, width, height = calculate_default_transform(
        srcRst.crs, dstCrs, srcRst.width, srcRst.height, *srcRst.bounds)

print("transform array of source raster")
print(srcRst.transform)

print("transform array of destination raster")
print(transform)

# work out the metadata for the destination raster
kwargs = srcRst.meta.copy()
kwargs.update({
        'crs': dstCrs,
        'transform': transform,
        'width': width,
        'height': height
    })

#open destination raster
dstRst = rasterio.open('pop_m4_2010_reproj4326.tif', 'w', **kwargs)

#reproject and save raster band data
for i in range(1, srcRst.count + 1):
    reproject(
        source=rasterio.band(srcRst, i),
        destination=rasterio.band(dstRst, i),
        #src_transform=srcRst.transform,
        src_crs=srcRst.crs,
        #dst_transform=transform,
        dst_crs=dstCrs,
        resampling=Resampling.bilinear)
    print(i)
    
#close destination raster
dstRst.close()

And here is a second attempt with rasterio:

import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling

prcp = rasterio.open('prp_raster.tiff', mode='r')

with rasterio.open('pop_m4_2010.tif') as dataset:
    # resample data to the target shape (this stays in the source CRS; nothing is reprojected)
    data = dataset.read(out_shape=(dataset.count, prcp.height, prcp.width), resampling=Resampling.bilinear)

    # scale the image transform to the new pixel size
    transform = dataset.transform * dataset.transform.scale((dataset.width / data.shape[-1]),
                                                            (dataset.height / data.shape[-2]))

    # Register GDAL format drivers and configuration options with a
    # context manager.
    with rasterio.Env():

        profile = dataset.profile  # `src` was undefined here

        # the output must carry the new shape and transform, not the source ones
        profile.update(
            dtype=rasterio.float32,
            count=data.shape[0],
            width=data.shape[-1],
            height=data.shape[-2],
            transform=transform,
            compress='lzw')

        with rasterio.open('pop_m4_2010_resampledtoprcp.tif', 'w', **profile) as dst:
            dst.write(data.astype(rasterio.float32))

prcp.close()

Asked By: KingInioch


Answers:

This is how you can do that with R.

library(terra)
pop <- rast("USA_HistoricalPopulationDataset/pop_m5_2010")
wth <- rast("nclimgrid_prcp.nc")

wpop <- project(pop, wth, "sum")
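The "sum" method aggregates counts instead of interpolating them. The 1D sketch below illustrates the idea only. It is not terra's or GDAL's actual implementation. It assumes each source count is spread uniformly over its cell and is split among target cells in proportion to overlap, so the total is preserved whenever the target grid covers the source grid. The function name and data are hypothetical.

```python
import numpy as np


def overlap_sum_1d(src_edges, src_counts, dst_edges):
    """Redistribute counts from source bins to destination bins by overlap fraction.

    Assumes each source count is spread uniformly across its bin.
    """
    src_edges = np.asarray(src_edges, dtype=float)
    dst_edges = np.asarray(dst_edges, dtype=float)
    out = np.zeros(len(dst_edges) - 1)
    for i, count in enumerate(src_counts):
        s0, s1 = src_edges[i], src_edges[i + 1]
        for j in range(len(out)):
            d0, d1 = dst_edges[j], dst_edges[j + 1]
            overlap = max(0.0, min(s1, d1) - max(s0, d0))
            out[j] += count * overlap / (s1 - s0)
    return out


# hypothetical: five 1-unit source cells regridded onto two 2.5-unit target cells
result = overlap_sum_1d([0, 1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [0, 2.5, 5])
print(result, result.sum())
```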

Inspect the results.

wpop
#class       : SpatRaster 
#dimensions  : 596, 1385, 1  (nrow, ncol, nlyr)
#resolution  : 0.04166666, 0.04166667  (x, y)
#extent      : -124.7083, -67, 24.5417, 49.37503  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source(s)   : memory
#name        : pop_m5_2010 
#min value   :         0.0 
#max value   :    423506.7 

global(pop, "sum", na.rm=TRUE)
#                  sum
#pop_m5_2010 306620886

global(wpop, "sum", na.rm=TRUE)
#                  sum
#pop_m5_2010 306620761
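The two totals above differ by only 125 people out of about 307 million. A quick arithmetic check:

```python
original = 306620886   # global(pop, "sum") above
projected = 306620761  # global(wpop, "sum") above

diff = original - projected
rel = diff / original
print(diff, f"{rel:.1e}")
```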

You can save the results to a file with something like this:

writeRaster(wpop, "pop.tif")

And you could do this in one step for all population data like this:

ff <- list.files("USA_HistoricalPopulationDataset", pattern="0$", full.names=TRUE)
apop <- rast(ff)
wapop <- project(apop, wth, "sum")

The population numbers you are getting are probably wrong because you are using bilinear interpolation when projecting (warping). That is not appropriate for (population) count data. You could first transform it to population density, warp, and transform back. I do that below, getting a result that is similar to what you get with the more direct approach that I have shown above.

csp <- cellSize(pop)
csw <- cellSize(wth[[1]])
popdens <- pop / csp
popdens <- project(popdens, wth, "bilinear")
popcount <- popdens * csw

popcount
#class       : SpatRaster 
#dimensions  : 596, 1385, 1  (nrow, ncol, nlyr)
#resolution  : 0.04166666, 0.04166667  (x, y)
#extent      : -124.7083, -67, 24.5417, 49.37503  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source(s)   : memory
#name        : pop_m5_2010 
#min value   :         0.0 
#max value   :    393982.5 

global(popcount, "sum", na.rm=TRUE)
#                  sum
#pop_m5_2010 304906042
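The density route gives 304906042, about 0.56% below the 306620886 from the direct sum. The toy sketch below shows one way such a gap can arise, with plain block averaging standing in for bilinear interpolation. Converting counts to density, averaging, and multiplying by the target cell area recovers the total exactly when the average is area-weighted. When the average ignores that the source cells differ in area, as lon/lat cells do across latitudes, it recovers the total only approximately. All names, counts, and areas here are hypothetical.

```python
import numpy as np


def block_reduce(a, factor, func):
    """Apply func over non-overlapping factor x factor blocks of a 2D array."""
    ny, nx = a.shape
    blocks = a.reshape(ny // factor, factor, nx // factor, factor)
    return func(blocks, axis=(1, 3))


def counts_via_density(counts, src_area, factor, area_weighted):
    """Count -> density -> block-averaged density -> count."""
    density = counts / src_area
    dst_area = block_reduce(src_area, factor, np.sum)
    if area_weighted:
        mean_density = block_reduce(density * src_area, factor, np.sum) / dst_area
    else:
        # plain average, ignoring that source cells differ in area
        mean_density = block_reduce(density, factor, np.mean)
    return mean_density * dst_area


counts = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical counts
# hypothetical cell areas shrinking row by row, as lon/lat cells do toward the pole
src_area = np.repeat(np.array([[1.0], [0.9], [0.8], [0.7]]), 4, axis=1)

print(counts.sum())
print(counts_via_density(counts, src_area, 2, area_weighted=True).sum())
print(counts_via_density(counts, src_area, 2, area_weighted=False).sum())
```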
Answered By: Robert Hijmans