How would you use Dask to recursively find neighbouring polygons in a dask-geopandas GeoDataFrame?

Question:

I am new to Dask.

I’ve been trying to get it to do the following task:

I have two GeoDataFrames and a set:

# main_chunk and combined_chunks are GeoDataFrames of tessellated cell polygons

main_chunk = gpd.read_parquet(f"./out/singapore/tess_chunk_{int(n1)}.pq")
combined_chunks = main_chunk + adjacent_chunks  # pseudocode: the main chunk plus its adjacent chunks

# This is the set of uIDs in the main chunk
main_chunk_ids = set(main_chunk['uID'])

I have been trying to expand the main chunk via queen contiguity to order 3 in two stages: first, iterate through the uIDs of all cells in the main chunk, finding all neighbouring cells in the combined chunk and adding each uID that is not already in main_chunk_ids; then run this function recursively on each neighbour until order 3 is reached.

This is the non-Dask version that works:

def neigh_look(cell, main_chunk_ids, order):
    # Neighbours are all cells that are not disjoint from this cell
    neighbours = combined_chunks[~combined_chunks.geometry.disjoint(cell.geometry)]
    for index, neighbour in neighbours.iterrows():
        if neighbour["uID"] not in main_chunk_ids:
            main_chunk_ids.add(neighbour["uID"])

            if order < 3:
                # The recursive call mutates main_chunk_ids in place
                neigh_look(neighbour, main_chunk_ids, order + 1)

    return main_chunk_ids

I’ve been trying to Dask-ify this code but am flailing; this is what I have so far, which crashed Python:

%%time

queen_out = {}

def neigh_look(cell, main_chunk_ids, order):
    neighbours = combined_chunks_dask[~combined_chunks_dask.geometry.disjoint(cell.geometry)]
    for index, neighbour in neighbours.iterrows():
        if neighbour["uID"] not in main_chunk_ids:
            main_chunk_ids.add(neighbour["uID"])

        if order < 3:
            main_chunk_ids.union(neigh_look(neighbour, main_chunk_ids, order + 1))

    gc.collect()

    return main_chunk_ids

for n1 in tqdm(range(1), total=1):
    main_chunk = gpd.read_parquet(f"./out/singapore/tess_chunk_{int(n1)}.pq")
    combined_chunks = main_chunk

    main_chunk_ids = set(main_chunk['uID'])
    queen_cells = main_chunk_ids

    for n2 in w.neighbors[n1]:
        neigh_chunk = gpd.read_parquet(f"./out/singapore/tess_chunk_{int(n2)}.pq")
        combined_chunks = combined_chunks.append(neigh_chunk)

    combined_chunks_dask = dgpd.from_geopandas(combined_chunks, npartitions=16)

    queen_area_delayed = []
    for index, row in main_chunk.iterrows():
        queen_area_delayed.append(delayed(neigh_look)(row, main_chunk_ids, 0))

        if index % 1000 == 0:
            gc.collect() # trigger garbage collection

    queen_area = dask.compute(*queen_area_delayed)
    queen_out[n1] = queen_area

Any help will be appreciated!

Asked By: Reuben


Answers:

I fixed it by abandoning recursion altogether.

EDIT:

I just made Dask iterate over the polygons one at a time instead of dealing with recursion.
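
For anyone after a concrete starting point, below is a minimal sketch of that iterative approach (the name expand_queen is illustrative, not from the original code). It replaces the recursion with a frontier of cells that is expanded once per contiguity order, using geopandas.sjoin with the intersects predicate, which is the complement of the .disjoint test used in the question. It assumes the same combined_chunks GeoDataFrame and uID column as above.

import geopandas as gpd

def expand_queen(main_chunk, combined_chunks, max_order=3):
    """Collect the uIDs of all cells within max_order queen-contiguity
    steps of main_chunk, without recursion."""
    main_chunk_ids = set(main_chunk["uID"])
    frontier = main_chunk  # cells whose neighbours still need looking up

    for order in range(max_order):
        # One vectorised spatial join per order: every cell in
        # combined_chunks that touches or overlaps the current frontier.
        hits = gpd.sjoin(combined_chunks, frontier, predicate="intersects")
        # "uID" appears in both frames, so sjoin suffixes the left one
        new_ids = set(hits["uID_left"]) - main_chunk_ids
        if not new_ids:
            break  # nothing new at this order, so nothing further out either
        main_chunk_ids |= new_ids
        frontier = combined_chunks[combined_chunks["uID"].isin(new_ids)]

    return main_chunk_ids

Because each order is a single spatial join over the whole frontier rather than a Python-level recursion per cell, this is also straightforward to parallelise across chunk files, for example by wrapping expand_queen in dask.delayed and computing one task per chunk.
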

Answered By: Reuben