GeoPandas: Apply function to row multiple times

Question

I have a GeoDataFrame with the following columns. The node_locations column is a dictionary that maps OSM node IDs to their coordinates.

{
    "geometry": LineString(LINESTRING (8.6320625 49.3500941, 8.632062 49.3501782)),
    "node_locations": {75539413: {"lat": 52.5749342, "lon": 13.3008981},
                       75539407: {"lat": 52.5746156, "lon": 13.3029441},
                       75539412: {"lat": 52.5751579, "lon": 13.3012622}
    ...
}

My goal is to split all intersecting lines, but only if the intersection point exists in the node_locations column. For example, in the picture only the lines meeting at the green dot should be split, because the green dot is a point in node_locations. The red one does not appear there, so the lines should not be split at it.

As I have a lot of data, I want to use apply() to make it more performant. I created a function split_intersecting_ways that is applied to each row and determines all intersecting geometries. Then I use another apply that calls split_intersecting_geometries on all these intersecting rows and passes the row from the first apply as an argument to split the geometry. This new split geometry should be used in the next iteration. As there can be multiple intersecting geometries where I should split, the original geometry should be split iteratively, using the previously split GeometryCollection as input for the next iteration.

def split_intersecting_geometries(intersecting, row):
    if intersecting.name != row.name and intersecting.geometry.type != 'GeometryCollection':
        intersection = row.geometry.intersection(intersecting.geometry)
        if intersection.type == 'Point':
            lon, lat = intersection.coords.xy
            for key, value in row.node_locations.items():
                if lat[0] == value["lat"] and lon[0] == value["lon"]:
                    return split(row.geometry, intersecting.geometry) # Creates a GeometryCollection with splitted lines
    return row.geometry

def split_intersecting_ways(row, data):
    intersecting_rows = data[data.geometry.intersects(row.geometry)]
    data['geometry'] = intersecting_rows.apply(split_intersecting_geometries, args=(row,), axis=1)
    return data['geometry']

edges['geometry'] = edges.apply(split_intersecting_ways, args=(edges,), axis=1)

After some iterations I get the error Columns must be same length as key. How can I fix this?

Asked By: Kewitschka

Answer 1

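The error Columns must be same length as key means that the value being assigned does not have the same number of columns as the key it is assigned to, typically because a DataFrame with several columns is assigned to the single ‘geometry’ column. This is likely what happens here: split_intersecting_ways returns the whole data['geometry'] Series for every row, so the outer apply(..., axis=1) builds a DataFrame rather than one geometry per row. Separately, assigning the shorter result of the inner apply to data['geometry'] leaves the rows that do not intersect without a geometry.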
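
As an illustration of how this error can arise with apply(axis=1), here is a minimal pandas-only sketch. The toy DataFrame and the variable names (expanded, per_row, assignment_error) are made up for the example and are not part of your code; the exact error message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"geometry": ["a", "b", "c"]})

# Returning a Series from the row function makes apply(axis=1) expand
# the result into a DataFrame with one column per Series entry.
expanded = df.apply(lambda row: df["geometry"], axis=1)
print(type(expanded).__name__, expanded.shape)

# Assigning that multi-column DataFrame to a single column fails with a
# ValueError (the exact message depends on the pandas version).
try:
    df["geometry"] = expanded
    assignment_error = None
except ValueError as exc:
    assignment_error = exc
print(repr(assignment_error))

# Returning exactly one value per row yields a Series, which assigns cleanly.
per_row = df.apply(lambda row: row["geometry"].upper(), axis=1)
df["geometry"] = per_row
```

The practical rule is that a function passed to apply(axis=1) should return exactly one value per row if its result is going to be assigned to a single column.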

You can fix this by writing the new geometry only to the rows that were actually split, and keeping track of those rows. You could try this instead:

from shapely.ops import split

def split_intersecting_geometries(intersecting, row):
    # Skip the row itself and rows whose geometry is already a GeometryCollection
    if intersecting.name != row.name and intersecting.geometry.geom_type != 'GeometryCollection':
        intersection = row.geometry.intersection(intersecting.geometry)
        if intersection.geom_type == 'Point':
            lon, lat = intersection.coords.xy
            # Split only if the intersection point is listed in this row's node_locations
            for key, value in row.node_locations.items():
                if lat[0] == value["lat"] and lon[0] == value["lon"]:
                    return split(row.geometry, intersecting.geometry) # Creates a GeometryCollection with the split lines
    return None  # nothing to split

def split_intersecting_ways(row, data):
    modified_indexes = []
    intersecting_rows = data[data.geometry.intersects(row.geometry)]
    for idx, intersecting in intersecting_rows.iterrows():
        new_geometry = split_intersecting_geometries(intersecting, row)
        if new_geometry is not None:
            data.at[idx, 'geometry'] = new_geometry
            modified_indexes.append(idx)
    return modified_indexes

modified_indexes = edges.apply(split_intersecting_ways, args=(edges,), axis=1)
# flatten the lists of modified indexes and drop duplicates
modified_indexes = list(dict.fromkeys(index for sublist in modified_indexes for index in sublist))
# reassign only the modified rows to the 'geometry' column
edges.loc[modified_indexes, 'geometry'] = edges.loc[modified_indexes, 'geometry']
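
If you would rather not modify the frame while apply() is iterating over it, a different arrangement is to let the row function return exactly one geometry per row and assign the resulting column afterwards. The following is only a sketch of that idea, not the code above: the names is_known_node and split_at_known_nodes, the tolerance tol (used instead of exact float equality) and the toy data are assumptions for illustration. It collects every intersection point that appears in node_locations and splits the line once with a MultiPoint, rather than splitting iteratively.

```python
import geopandas as gpd
from shapely.geometry import LineString, MultiPoint
from shapely.ops import split


def is_known_node(point, node_locations, tol=1e-9):
    """Return True if `point` matches a node in `node_locations` (tol is an assumed tolerance)."""
    return any(
        abs(point.x - node["lon"]) <= tol and abs(point.y - node["lat"]) <= tol
        for node in node_locations.values()
    )


def split_at_known_nodes(row, data):
    """Return row.geometry split at every intersection point listed in row.node_locations."""
    split_points = []
    candidates = data[data.geometry.intersects(row.geometry)]
    for idx, other in candidates.geometry.items():
        if idx == row.name:
            continue  # a line always intersects itself
        intersection = row.geometry.intersection(other)
        if intersection.geom_type == "Point" and is_known_node(intersection, row.node_locations):
            split_points.append(intersection)
    if not split_points:
        return row.geometry  # unchanged, so there is still one value per row
    return split(row.geometry, MultiPoint(split_points))


# Toy data: lines 0 and 1 cross at the shared node (1, 0);
# line 2 crosses line 0 at (0.5, 0), which is not a node.
shared_node = {1: {"lat": 0.0, "lon": 1.0}}
edges = gpd.GeoDataFrame(
    {"node_locations": [shared_node, shared_node, {}]},
    geometry=[
        LineString([(0, 0), (2, 0)]),
        LineString([(1, -1), (1, 1)]),
        LineString([(0.5, -1), (0.5, 1)]),
    ],
)

# Compute all new geometries first, then assign the resulting column in one step.
split_geometries = edges.apply(split_at_known_nodes, args=(edges,), axis=1)
edges["geometry"] = split_geometries
```

Because every call returns a single geometry, the result of apply() has one entry per row and can be assigned to the 'geometry' column directly.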

Also, you could consider improving performance by using a Dask DataFrame instead of a pandas DataFrame:

import dask.dataframe as dd
from shapely.ops import split

# convert your pandas dataframe to dask dataframe
edges = dd.from_pandas(edges, npartitions=8)

def split_intersecting_geometries(intersecting, row):
    if intersecting.name != row.name and intersecting.geometry.geom_type != 'GeometryCollection':
        intersection = row.geometry.intersection(intersecting.geometry)
        if intersection.geom_type == 'Point':
            lon, lat = intersection.coords.xy
            for key, value in row.node_locations.items():
                if lat[0] == value["lat"] and lon[0] == value["lon"]:
                    return split(row.geometry, intersecting.geometry) # Creates a GeometryCollection with the split lines
    return None

def split_intersecting_ways(row, data):
    modified_indexes = []
    intersecting_rows = data[data.geometry.intersects(row.geometry)]
    for idx, intersecting in intersecting_rows.iterrows():
        new_geometry = split_intersecting_geometries(intersecting, row)
        if new_geometry is not None:
            data.at[idx, 'geometry'] = new_geometry
            modified_indexes.append(idx)
    return modified_indexes

# the function returns lists of indexes, so the meta dtype is 'object' rather than 'f8'
modified_indexes = edges.apply(split_intersecting_ways, args=(edges,), axis=1, meta=('geometry', 'object')).compute()
# flatten the list of modified indexes
modified_indexes = [index for sublist in modified_indexes for index in sublist]
# reassign only the modified rows to the 'geometry' column
edges.loc[modified_indexes, 'geometry'] = edges.loc[modified_indexes, 'geometry'].compute()
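
Independently of Dask, the lookup of intersecting rows may also benefit from the spatial index that GeoPandas provides, since data.geometry.intersects(row.geometry) tests every geometry for every row. The sketch below assumes a GeoPandas version whose sindex.query accepts a predicate argument; the helper name intersecting_rows and the toy data are made up for the example.

```python
import geopandas as gpd
from shapely.geometry import LineString

edges = gpd.GeoDataFrame(
    geometry=[
        LineString([(0, 0), (2, 0)]),
        LineString([(1, -1), (1, 1)]),
        LineString([(5, 5), (6, 6)]),
    ]
)


def intersecting_rows(row, data):
    """Return the rows of `data` whose geometry intersects row.geometry, using the spatial index."""
    # query() returns integer positions (not index labels), hence iloc.
    positions = data.sindex.query(row.geometry, predicate="intersects")
    return data.iloc[positions]


neighbours = intersecting_rows(edges.iloc[0], edges)
```

The returned frame still contains the row itself, so the check against row.name is needed just as before.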

Answered By: Serge de Gosson de Varennes
