Create a raster from points (gpd.geodataframe object) in Python 3.6
Question:
I want to create a raster file (.tif) from a points file using a geopandas.geodataframe.GeoDataFrame object.
My dataframe has two columns: [geometry] and [Value]. The goal is to make a 10 m resolution raster in which the pixel at each [geometry] point holds the corresponding [Value].
My dataset is:
(index) | geometry | Value
0 | POINT (520595.000 5720335.000) | 536.678345
1 | POINT (520605.000 5720335.000) | 637.052185
2 | POINT (520615.000 5720335.000) | 1230.553955
3 | POINT (520625.000 5720335.000) | 944.970642
4 | POINT (520635.000 5720335.000) | 1094.613281
5 | POINT (520645.000 5720335.000) | 1123.185181
6 | POINT (520655.000 5720335.000) | 849.37634
7 | POINT (520665.000 5720335.000) | 1333.459839
8 | POINT (520675.000 5720335.000) | 492.866608
9 | POINT (520685.000 5720335.000) | 960.957214
10 | POINT (520695.000 5720335.000) | 539.401978
11 | POINT (520705.000 5720335.000) | 573.015625
12 | POINT (520715.000 5720335.000) | 970.386536
13 | POINT (520725.000 5720335.000) | 390.315094
14 | POINT (520735.000 5720335.000) | 642.036865
I know from earlier attempts that make_geocube (from geocube.api.core import make_geocube) would do this, but library constraints in my environment prevent me from using it.
Any ideas?
Answers:
Assign x and y columns, convert to xarray, then export to tiff using rioxarray:
# import rioxarray before converting to xarray
# so that the .rio accessor is registered
import rioxarray

# assuming your GeoDataFrame is called `gdf`;
# the coordinates live on the geometry column, not on the frame itself
gdf["x"] = gdf.geometry.x
gdf["y"] = gdf.geometry.y

da = (
    gdf.set_index(["y", "x"])
    .Value
    .to_xarray()
)
da.rio.to_raster("myfile.tif")
For this to work, the points must form a full regular grid, with every x value paired with every y value. If they are instead just a collection of arbitrary points, converting to xarray with x and y as perpendicular dimensions will explode your memory, and the result will be almost entirely NaNs.
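A quick way to check that condition before converting might look like the sketch below. It only needs numpy; `is_full_regular_grid` is a hypothetical helper name, and the 10 m spacing is taken from the question. With a GeoDataFrame you would pass `gdf.geometry.x` and `gdf.geometry.y`.

```python
import numpy as np


def is_full_regular_grid(x, y, res):
    """Return True if the (x, y) pairs cover every cell of a regular grid exactly once."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xs = np.unique(x)
    ys = np.unique(y)
    # Every unique x must be paired with every unique y, with no duplicates
    pairs = np.unique(np.column_stack([x, y]), axis=0)
    if len(pairs) != len(x) or len(x) != len(xs) * len(ys):
        return False
    # Neighbouring coordinates must be exactly one cell apart
    return bool(np.allclose(np.diff(xs), res) and np.allclose(np.diff(ys), res))


# Coordinates of the 15 sample points from the question (a single row of pixels)
x = np.arange(520595.0, 520745.0, 10.0)
y = np.full_like(x, 5720335.0)
grid_ok = is_full_regular_grid(x, y, res=10)

# Dropping one interior point leaves a hole, so the check fails
grid_with_hole = is_full_regular_grid(np.delete(x, 3), np.delete(y, 3), res=10)
```

If the check fails, one of the approaches that bin points into pixels is probably the safer route.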
If you don’t expect to have multiple points contained inside single pixels, then you could use rasterio.features.rasterize for this. Keep in mind, however, that if you do have multiple points falling within single pixels, then rasterio.features.rasterize can only either:
- burn in a single hardcoded value,
- add the overlapping point values together, or
- simply use the last seen point value
In the case of rasterizing categorical values this should suffice, but for rasterizing continuous data you may instead want to take the average value per pixel.
For that, you could do something like the following:
import numpy as np
from typing import Union


def rasterize_points(
    points: np.ndarray, res: Union[int, float], bbox: tuple
) -> np.ndarray:
    """Rasterize points into a grid with a given resolution.

    Args:
        points (np.ndarray): Points to rasterize, with columns (x, y, value) (for
            geographic coordinates, use (lon, lat, value))
        res (Union[int, float]): Resolution of the grid
        bbox (tuple): Bounding box of the grid, as (xmin, ymin, xmax, ymax)

    Returns:
        np.ndarray: Rasterized grid
    """
    width = int((bbox[2] - bbox[0]) / res)
    height = int((bbox[3] - bbox[1]) / res)
    rast = np.zeros((height, width), dtype=np.float32)
    count_array = np.zeros_like(rast)
    for x, y, value in points:
        col = int((x - bbox[0]) / res)
        row = int((bbox[3] - y) / res)
        # Skip points whose index falls outside the grid; otherwise they
        # would raise an IndexError or wrap around via negative indexing
        if not (0 <= row < height and 0 <= col < width):
            continue
        rast[row, col] += value
        count_array[row, col] += 1
    # Avoid division by zero
    count_array[count_array == 0] = 1
    # Calculate the average
    rast = rast / count_array
    return rast
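If the Python loop turns out to be too slow, a vectorized variant along these lines might help. This is only a sketch under the same assumptions (points as (x, y, value) rows, bbox as (xmin, ymin, xmax, ymax)); `rasterize_points_mean` is a hypothetical name, and unlike the loop version it marks pixels without points as NaN rather than 0. The sample uses the first three points from the question, with the bounding box padded by half a pixel so that the points sit at pixel centres.

```python
import numpy as np


def rasterize_points_mean(points, res, bbox):
    """Average point values per pixel; pixels with no points become NaN."""
    xmin, ymin, xmax, ymax = bbox
    width = int(round((xmax - xmin) / res))
    height = int(round((ymax - ymin) / res))
    points = np.asarray(points, dtype=float)
    cols = np.floor((points[:, 0] - xmin) / res).astype(int)
    rows = np.floor((ymax - points[:, 1]) / res).astype(int)
    # Keep only points that fall inside the grid
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    rows, cols, values = rows[inside], cols[inside], points[inside, 2]
    total = np.zeros((height, width))
    count = np.zeros((height, width))
    # np.add.at accumulates correctly when several points share a pixel
    np.add.at(total, (rows, cols), values)
    np.add.at(count, (rows, cols), 1)
    with np.errstate(invalid="ignore"):
        return np.where(count > 0, total / count, np.nan)


res = 10
pts = np.array([
    (520595.0, 5720335.0, 536.678345),
    (520605.0, 5720335.0, 637.052185),
    (520615.0, 5720335.0, 1230.553955),
])
# Pad the point extent by half a pixel on every side
bbox = (
    pts[:, 0].min() - res / 2,
    pts[:, 1].min() - res / 2,
    pts[:, 0].max() + res / 2,
    pts[:, 1].max() + res / 2,
)
grid = rasterize_points_mean(pts, res, bbox)
```

Without the half-pixel padding, a single row of points like this one would give a bounding box of zero height and therefore an empty grid.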
The above example isn’t very optimized, but it gets the job done. Note that it only returns a 2D numpy array, in which pixels that received no points are left at 0. You’ll want to incorporate the relevant coordinates and other geospatial info (e.g. CRS and transform) using rasterio or rioxarray.