Faster way of polygon intersection with shapely
Question:
I have a large number of polygons (~100000) and try to find a smart way of calculating their intersecting area with a regular grid cells.
Currently, I am creating the polygons and the grid cells using shapely (based on their corner coordinates). Then, using a simple for-loop I go through each polygon and compare it to nearby grid cells.
Just a small example to illustrate the polygons/grid cells.
from shapely.geometry import box, Polygon
# Example polygon
xy = [[130.21001, 27.200001], [129.52, 27.34], [129.45, 27.1], [130.13, 26.950001]]
polygon_shape = Polygon(xy)
# Example grid cell
gridcell_shape = box(129.5, -27.0, 129.75, 27.25)
# The intersection
polygon_shape.intersection(gridcell_shape).area
(BTW: the grid cells have the dimensions 0.25×0.25 and the polygons 1×1 at max)
Actually this is quite fast for an individual polygon/grid cell combo with around 0.003 seconds. However, running this code on a huge amount of polygons (each one could intersect dozens of grid cells) takes around 15+ minutes (up to 30+ min depending on the number of intersecting grid cells) on my machine which is not acceptable. Unfortunately, I have no idea how it is possible to write a code for polygon intersection to get the area of overlap. Do you have any tips? Is there an alternative to shapely?
Answers:
Consider using Rtree to help identify which grid cells that a polygon may intersect. This way, you can remove the for loop used with the array of lat/lons, which is probably the slow part.
Structure your code something like this:
from shapely.ops import cascaded_union
from rtree import index
idx = index.Index()
# Populate R-tree index with bounds of grid cells
for pos, cell in enumerate(grid_cells):
# assuming cell is a shapely object
idx.insert(pos, cell.bounds)
# Loop through each Shapely polygon
for poly in polygons:
# Merge cells that have overlapping bounding boxes
merged_cells = cascaded_union([grid_cells[pos] for pos in idx.intersection(poly.bounds)])
# Now do actual intersection
print(poly.intersection(merged_cells).area)
Since 2013/2014 Shapely has
STRtree. I have used it and it seems to work well.
Here is a snippet from the docstring:
STRtree is an R-tree that is created using the Sort-Tile-Recursive
algorithm. STRtree takes a sequence of geometry objects as initialization
parameter. After initialization the query method can be used to make a
spatial query over those objects.
>>> from shapely.geometry import Polygon
>>> from shapely.strtree import STRtree
>>> polys = [Polygon(((0, 0), (1, 0), (1, 1))), Polygon(((0, 1), (0, 0), (1, 0))), Polygon(((100, 100), (101, 100), (101, 101)))]
>>> s = STRtree(polys)
>>> query_geom = Polygon([(-1, -1), (2, 0), (2, 2), (-1, 2)])
>>> result = s.query(query_geom)
>>> polys[0] in result
True
Here is another version of the answer but with more control over IoU
```
def merge_intersecting_polygons(list_of_polygons, image_width, image_height):
"""
merge intersecting polygons with shapely library
merge only if Intersection over Union is greater than 0.5
speed up with STRTree
"""
# create shapely polygons
shapely_polygons = []
for polygon in list_of_polygons:
shapely_polygons.append(Polygon(polygon))
# create STRTree
tree = STRtree(shapely_polygons)
# merge polygons
merged_polygons = []
for i, polygon in enumerate(shapely_polygons):
# find intersecting polygons
intersecting_polygons = tree.query(polygon)
# merge intersecting polygons
for intersecting_polygon in intersecting_polygons:
if polygon != intersecting_polygon:
# compute intersection over union
intersection = polygon.intersection(intersecting_polygon).area
union = polygon.union(intersecting_polygon).area
iou = intersection/union
if iou > 0.5:
# merge polygons
polygon = polygon.union(intersecting_polygon)
# add merged polygon to list
merged_polygons.append(polygon)
# remove duplicates
merged_polygons = list(set(merged_polygons))
# convert shapely polygons to list of polygons
list_of_polygons = []
for polygon in merged_polygons:
list_of_polygons.append(np.array(polygon.exterior.coords).tolist())
return list_of_polygons
```
I have a large number of polygons (~100000) and try to find a smart way of calculating their intersecting area with a regular grid cells.
Currently, I am creating the polygons and the grid cells using shapely (based on their corner coordinates). Then, using a simple for-loop I go through each polygon and compare it to nearby grid cells.
Just a small example to illustrate the polygons/grid cells.
from shapely.geometry import box, Polygon
# Example polygon
xy = [[130.21001, 27.200001], [129.52, 27.34], [129.45, 27.1], [130.13, 26.950001]]
polygon_shape = Polygon(xy)
# Example grid cell
gridcell_shape = box(129.5, -27.0, 129.75, 27.25)
# The intersection
polygon_shape.intersection(gridcell_shape).area
(BTW: the grid cells have the dimensions 0.25×0.25 and the polygons 1×1 at max)
Actually this is quite fast for an individual polygon/grid cell combo with around 0.003 seconds. However, running this code on a huge amount of polygons (each one could intersect dozens of grid cells) takes around 15+ minutes (up to 30+ min depending on the number of intersecting grid cells) on my machine which is not acceptable. Unfortunately, I have no idea how it is possible to write a code for polygon intersection to get the area of overlap. Do you have any tips? Is there an alternative to shapely?
Consider using Rtree to help identify which grid cells that a polygon may intersect. This way, you can remove the for loop used with the array of lat/lons, which is probably the slow part.
Structure your code something like this:
from shapely.ops import cascaded_union
from rtree import index
idx = index.Index()
# Populate R-tree index with bounds of grid cells
for pos, cell in enumerate(grid_cells):
# assuming cell is a shapely object
idx.insert(pos, cell.bounds)
# Loop through each Shapely polygon
for poly in polygons:
# Merge cells that have overlapping bounding boxes
merged_cells = cascaded_union([grid_cells[pos] for pos in idx.intersection(poly.bounds)])
# Now do actual intersection
print(poly.intersection(merged_cells).area)
Since 2013/2014 Shapely has
STRtree. I have used it and it seems to work well.
Here is a snippet from the docstring:
STRtree is an R-tree that is created using the Sort-Tile-Recursive
algorithm. STRtree takes a sequence of geometry objects as initialization
parameter. After initialization the query method can be used to make a
spatial query over those objects.
>>> from shapely.geometry import Polygon
>>> from shapely.strtree import STRtree
>>> polys = [Polygon(((0, 0), (1, 0), (1, 1))), Polygon(((0, 1), (0, 0), (1, 0))), Polygon(((100, 100), (101, 100), (101, 101)))]
>>> s = STRtree(polys)
>>> query_geom = Polygon([(-1, -1), (2, 0), (2, 2), (-1, 2)])
>>> result = s.query(query_geom)
>>> polys[0] in result
True
Here is another version of the answer but with more control over IoU
```
def merge_intersecting_polygons(list_of_polygons, image_width, image_height):
"""
merge intersecting polygons with shapely library
merge only if Intersection over Union is greater than 0.5
speed up with STRTree
"""
# create shapely polygons
shapely_polygons = []
for polygon in list_of_polygons:
shapely_polygons.append(Polygon(polygon))
# create STRTree
tree = STRtree(shapely_polygons)
# merge polygons
merged_polygons = []
for i, polygon in enumerate(shapely_polygons):
# find intersecting polygons
intersecting_polygons = tree.query(polygon)
# merge intersecting polygons
for intersecting_polygon in intersecting_polygons:
if polygon != intersecting_polygon:
# compute intersection over union
intersection = polygon.intersection(intersecting_polygon).area
union = polygon.union(intersecting_polygon).area
iou = intersection/union
if iou > 0.5:
# merge polygons
polygon = polygon.union(intersecting_polygon)
# add merged polygon to list
merged_polygons.append(polygon)
# remove duplicates
merged_polygons = list(set(merged_polygons))
# convert shapely polygons to list of polygons
list_of_polygons = []
for polygon in merged_polygons:
list_of_polygons.append(np.array(polygon.exterior.coords).tolist())
return list_of_polygons
```