Merging two GeoDataFrames with sjoin runs indefinitely

Question:

I have two GeoPandas GeoDataFrames:

gdfTraffic_df – contains polygons in the geometry column
number of rows = 1,916,560

gdfAlerts_df – contains points in the geometry column
number of rows = 632,259

I am trying to join the two GeoDataFrames into a new one that keeps only the rows where a polygon contains a point.
I ran this code:

merged = gdfTraffic_df.sjoin(gdf, predicate="contains")

Unfortunately, it looks like it never stops running.

I expected to get a new GeoDataFrame containing each polygon together with the points it contains.

Asked By: Nati Elkayam

||

Answers:

Make sure you have either pygeos >= 0.8 or shapely >= 2.0 installed. These are optional dependencies which dramatically improve performance for large operations like this.

# conda
conda install pygeos --channel conda-forge
# pip
pip install pygeos

If you are thinking about installing them, I’d give the optional dependencies installation guide a read to make sure this advice is still up to date, and to check the list of known downsides, as there is at least one thing that won’t work with the pygeos speed up (CRS transforms for 3D objects).
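Before installing anything, it can help to see what is already in your environment. Here is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is just something made up for this example, not part of geopandas:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions relevant to sjoin performance.
for name in ("geopandas", "shapely", "pygeos"):
    print(name, installed_version(name) or "not installed")
```

Compare the printed versions against the minimums mentioned above (pygeos >= 0.8 or shapely >= 2.0).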

If you have both of these, you can check that pygeos is enabled with:

import geopandas
geopandas.options.use_pygeos  # should be True

If it is and it’s still taking a long time, then I’d go make some cookies or something. It’s a large operation you have there.

Answered By: Michael Delgado

I solved it; posting here for anyone who runs into this in the future.
I divided the data into weeks and ran the join on one small group of dates at a time, and it worked!
Thanks.
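For anyone wanting to try the same approach, here is a rough sketch of a week-by-week join. This is not the original code, which was not posted. It assumes the polygon frame has a datetime column, here given the hypothetical name date, and the function name sjoin_in_chunks is also made up for illustration. Only the polygon frame is split, so the combined result should contain the same rows as a single big sjoin, just computed in smaller pieces.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box


def sjoin_in_chunks(polygons, points, date_col="date", freq="W"):
    """Join polygons to the points they contain, one time period at a time.

    `date_col` is an assumed datetime column on `polygons`; rows with a
    missing date are dropped by groupby and will not appear in the result.
    """
    periods = polygons[date_col].dt.to_period(freq)  # weekly buckets by default
    parts = []
    for _, poly_chunk in polygons.groupby(periods):
        # Each chunk is a small GeoDataFrame, so each sjoin call stays cheap.
        parts.append(poly_chunk.sjoin(points, predicate="contains"))
    return pd.concat(parts)


# Tiny synthetic example (made-up data, only to show the call).
polys = gpd.GeoDataFrame(
    {"date": pd.to_datetime(["2023-01-02", "2023-01-10", "2023-01-20"])},
    geometry=[box(0, 0, 1, 1), box(2, 2, 3, 3), box(10, 10, 11, 11)],
    crs="EPSG:4326",
)
pts = gpd.GeoDataFrame(
    {"alert_id": [1, 2]},
    geometry=[Point(0.5, 0.5), Point(2.5, 2.5)],
    crs="EPSG:4326",
)
merged = sjoin_in_chunks(polys, pts)
```

Processing one chunk at a time also makes it easy to print progress or write each partial result to disk, so a long run can be monitored or resumed.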

Answered By: Nati Elkayam