Efficiently merge GeoDataFrames if a Polygon from one contains a Point from the second

Question:

I have two GeoDataFrames.
gdf_point:

       Unnamed: 0   latitude  longitude                  geometry
0               0  50.410203   7.236583  POINT (7.23658 50.41020)
1               1  51.303545   7.263082  POINT (7.26308 51.30354)
2               2  50.114965   8.672785  POINT (8.67278 50.11496)

and gdf_poly:

       Unnamed: 0  Id                                       geometry
0               0  301286  POLYGON ((9.67079 49.86762, 9.67079 49.86987, ...
1               1  302258  POLYGON ((9.67137 54.75650, 9.67137 54.75874, ...
2               2  302548  POLYGON ((9.66808 48.21535, 9.66808 48.21760, ...

I want to check whether each point in gdf_point is contained in any of the polygons of gdf_poly; if it is, I want the Id of that polygon added to the corresponding row of gdf_point.

Here is my current code:

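COUNTER = 0

def f(x, gdf_poly, df_new_point):
    # Brute force: every point is tested against every polygon, which is why this is slow
    global COUNTER  # running row position in gdf_point

    for row in gdf_poly.itertuples():
        geom = row.geometry
        poly_id = row.Id  # renamed from `id` to avoid shadowing the built-in
        if geom.contains(x):
            print('True')
            # Assumes gdf_point has a default 0..n-1 RangeIndex
            df_new_point.loc[COUNTER, 'Id'] = poly_id

    COUNTER = COUNTER + 1

# Alias, not a copy: writing to df_new_point also modifies gdf_point
df_new_point = gdf_point
gdf_point['geometry'].apply(lambda x: f(x, gdf_poly, df_new_point))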

This works and does what I want it to do. The problem is that it's far too slow: it takes about 50 minutes to process 10k rows (multithreading is a future option), and I want it to handle several million rows. There must be a better and faster way to do this. Thanks for your help.

Asked By: PeterTschmoik


Answers:

To merge two dataframes on their geometries (rather than on column or index values), use one of geopandas’s spatial joins. The geopandas docs have a whole section on them; it’s great, so give it a read!
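As a minimal sketch of what "merging on geometries" means, assuming geopandas >= 0.10 (where the sjoin method and its predicate keyword exist) and shapely, here is a toy point-in-polygon join. The data, column names, and Ids are hypothetical and do not come from the question's files.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data in planar coordinates (no CRS): two unit squares, three points.
polys = gpd.GeoDataFrame({"Id": [10, 20]}, geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)])
points = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
)

# Rows are paired by a spatial predicate rather than by a shared key column.
# With the default how="inner", point "c" (inside no square) is dropped.
joined = points.sjoin(polys, predicate="within")

# index_right records which row of `polys` each point was matched to.
print(joined[["name", "index_right", "Id"]])
```

The index_right column plays the role that a key column would play in an ordinary pandas merge.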

There are two workhorse spatial join functions in geopandas:

  • GeoDataFrame.sjoin joins two dataframes based on a binary predicate evaluated between pairs of geometries from the two dataframes: one of intersects, contains, within, touches, crosses, or overlaps. You can choose a left, right, or inner join with the how keyword argument.

  • GeoDataFrame.sjoin_nearest joins two dataframes based on which geometry in one dataframe is closest to each element in the other. As with sjoin, the how argument accepts left, right, and inner. Additionally, sjoin_nearest takes two arguments that sjoin does not:

    • max_distance: specifies a maximum search radius for matching geometries. Setting it can improve performance considerably in some cases, so it is highly recommended that you use it whenever you can.

    • distance_col: If set, the resultant GeoDataFrame will include a column with this name containing the computed distances between an input geometry and the nearest geometry.
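A small sketch of sjoin_nearest using both extra arguments, on hypothetical planar points with no CRS, so distances are plain coordinate units. With a geographic CRS such as EPSG:4326, geopandas warns that sjoin_nearest results are likely incorrect, so in real use you would project to a suitable CRS first. This assumes geopandas >= 0.10 with shapely 2.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical planar data (no CRS).
stations = gpd.GeoDataFrame({"station": ["s1", "s2"]}, geometry=[Point(0, 0), Point(10, 0)])
sites = gpd.GeoDataFrame(
    {"site": ["p", "q", "r"]},
    geometry=[Point(1, 0), Point(9, 0), Point(0, 20)],
)

# Nearest station for each site, searching no further than 3 units.
# how="left" keeps site "r" (nothing within 3 units) with a missing station.
nearest = sites.sjoin_nearest(stations, how="left", max_distance=3, distance_col="dist")
print(nearest[["site", "station", "dist"]])
```

Sites p and q each match a station at distance 1, while r has nothing within the search radius and is kept only because of how="left".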

You can use either the top-level functions geopandas.sjoin and geopandas.sjoin_nearest or the methods geopandas.GeoDataFrame.sjoin and geopandas.GeoDataFrame.sjoin_nearest. Note, however, that the docs warn that the top-level functions may be deprecated at some point in the future, and they recommend the GeoDataFrame methods.
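To illustrate the two spellings, here is a sketch on hypothetical toy data. In current geopandas the method simply forwards to the top-level function, so both calls should produce the same result.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data (planar coordinates, no CRS).
polys = gpd.GeoDataFrame({"Id": [1]}, geometry=[box(0, 0, 1, 1)])
points = gpd.GeoDataFrame({"name": ["a", "b"]}, geometry=[Point(0.5, 0.5), Point(2, 2)])

# The same left join written both ways; the method is the recommended spelling.
via_function = gpd.sjoin(points, polys, how="left", predicate="within")
via_method = points.sjoin(polys, how="left", predicate="within")

# Compare the non-geometry columns (name, index_right, Id).
same = via_function.drop(columns="geometry").equals(via_method.drop(columns="geometry"))
print(same)  # expected: True
```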

So in your case:

merged = gdf_poly.sjoin(gdf_point, predicate="contains")

will do the trick, though if you want to match polygons where the point falls exactly on the boundary, you may want to consider predicate="intersects".
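The one-liner above returns one row per matching polygon/point pair, indexed by gdf_poly and carrying the polygon geometry. If you would rather keep the question's layout (one row per point with the polygon Id attached), a sketch along these lines flips the join. The rectangles are hypothetical stand-ins for gdf_poly (the real geometries are truncated in the question), and the fourth point is an added edge case that sits exactly on a polygon edge. Because sjoin finds candidate pairs through a spatial index instead of looping over every polygon for every point, it should scale much better than the row-by-row loop, though the actual speed-up depends on your data.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical polygons standing in for gdf_poly (Ids reused from the question).
gdf_poly = gpd.GeoDataFrame(
    {"Id": [301286, 302548]},
    geometry=[box(7.0, 50.0, 7.5, 50.5), box(8.5, 50.0, 9.0, 50.5)],
    crs="EPSG:4326",
)
# The question's three points, plus one point exactly on the first polygon's right edge.
lon = [7.236583, 7.263082, 8.672785, 7.5]
lat = [50.410203, 51.303545, 50.114965, 50.25]
gdf_point = gpd.GeoDataFrame(
    {"latitude": lat, "longitude": lon},
    geometry=gpd.points_from_xy(lon, lat),
    crs="EPSG:4326",  # both frames should share the same CRS
)

# Point-side join: one row per point, with the containing polygon's Id attached.
# how="left" keeps points that fall in no polygon (their Id is NaN).
# "within" mirrors "contains" from the polygon side, so the edge point is not matched.
polys = gdf_poly[["Id", "geometry"]]
result = gdf_point.sjoin(polys, how="left", predicate="within")

# "intersects" also matches points lying exactly on a polygon boundary.
result_edge = gdf_point.sjoin(polys, how="left", predicate="intersects")

# If polygons overlap, a point can match several; keep the first match per point.
result = result[~result.index.duplicated(keep="first")].drop(columns="index_right")
print(result[["latitude", "longitude", "Id"]])
```

Drop the deduplication step if you want to see every polygon a point falls into.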

Answered By: Michael Delgado