Trying to merge two GeoDataFrames with sjoin, but it seems to run forever
Question:
I have two GeoPandas GeoDataFrames:
gdfTraffic_df – contains polygons in the geometry column
number of rows = 1,916,560
gdfAlerts_df – contains points in the geometry column
number of rows = 632,259
I'm trying to merge the two GeoDataFrames into a new one that keeps only the polygons that contain a point.
I ran this code:
merged = gdfTraffic_df.sjoin(gdfAlerts_df, predicate="contains")
Unfortunately, it looks like it never stops running.
I expected to get a new GeoDataFrame with each polygon and the points it contains.
Answers:
Make sure you have either PyGEOS >= 0.8 or Shapely >= 2.0 installed. These are optional dependencies which dramatically improve performance for large operations like this.
# conda
conda install pygeos --channel conda-forge
# pip
pip install pygeos
If you are thinking about installing them, I’d give the optional dependencies installation guide a read to make sure this advice is still up to date, and to check the list of known downsides, as there is at least one thing that won’t work with the pygeos speed up (CRS transforms for 3D objects).
If you have both of these, you can check that pygeos is enabled with:
geopandas.options.use_pygeos # should be True
If it is and it’s still taking a long time, then I’d go make some cookies or something. It’s a large operation you have there.
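If you'd rather check from Python what is actually installed, here is a rough sketch. It assumes that `use_pygeos` may be missing on newer GeoPandas releases (as far as I know, the ones that dropped PyGEOS support), so it reads the option defensively. The names `shapely_major`, `use_pygeos` and `fast_path` are just for illustration.

```python
import geopandas as gpd
import shapely

# Shapely 2.0+ includes the vectorized GEOS operations PyGEOS used to provide.
shapely_major = int(shapely.__version__.split(".")[0])

# Assumption: the option may not exist on this GeoPandas version,
# so fall back to None instead of raising AttributeError.
use_pygeos = getattr(gpd.options, "use_pygeos", None)

# Rough heuristic only: either speed-up should make sjoin much faster.
fast_path = shapely_major >= 2 or bool(use_pygeos)

print("geopandas:", gpd.__version__)
print("shapely:", shapely.__version__)
print("use_pygeos option:", use_pygeos)
print("vectorized backend likely available:", fast_path)
```

`geopandas.show_versions()` prints a fuller report of the installed dependencies if you need more detail.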
I solved it. For anyone who runs into this in the future:
I divided the data into weeks and ran the join on small groups of dates, and it worked!
Thanks.
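For reference, splitting by week could look roughly like this. It is only a sketch: it assumes both GeoDataFrames have a date column (here the hypothetical name `date`), and the helper `sjoin_by_week` plus the tiny demo data are made up for illustration. Note that it only matches polygons with points from the same week, so it is not equivalent to one big join unless that is what you want.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box


def sjoin_by_week(polygons, points, date_col="date"):
    """Join polygons to the points they contain, one calendar week at a time."""
    # Label every row with the week its date falls in (`date_col` is an assumed column).
    poly_week = pd.to_datetime(polygons[date_col]).dt.to_period("W")
    point_week = pd.to_datetime(points[date_col]).dt.to_period("W")

    parts = []
    for week in poly_week.unique():
        poly_chunk = polygons[poly_week == week]
        point_chunk = points[point_week == week]
        if point_chunk.empty:
            continue  # no points this week, so nothing can match
        parts.append(poly_chunk.sjoin(point_chunk, predicate="contains"))

    if not parts:
        # No week had both polygons and points; return an empty frame.
        return polygons.iloc[:0]
    return pd.concat(parts)


# Tiny made-up data standing in for the real frames.
gdfTraffic_df = gpd.GeoDataFrame(
    {"date": ["2023-01-02", "2023-01-10"]},
    geometry=[box(0, 0, 1, 1), box(10, 10, 11, 11)],
)
gdfAlerts_df = gpd.GeoDataFrame(
    {"date": ["2023-01-03", "2023-01-03", "2023-01-11"]},
    geometry=[Point(0.5, 0.5), Point(10.5, 10.5), Point(0.5, 0.5)],
)

merged = sjoin_by_week(gdfTraffic_df, gdfAlerts_df)
print(merged)
```

Because both frames share the `date` column name, the result carries `date_left` and `date_right`. Each weekly join only touches a small slice of the data, which is why it finishes where the single large join did not.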