Creating a DataFrame from DBSCAN clustering results

Question:

I have clustered geospatial data with the DBSCAN algorithm. The project and the code are described in more detail here: https://notebook.community/gboeing/urban-data-science/15-Spatial-Cluster-Analysis/cluster-analysis

I would like to build a dataframe that contains, for each cluster:

  • the area of the cluster, calculated as (lat_max - lat_min) * (lon_max - lon_min)

  • the number of points belonging to the cluster

So far I have added a column to the original dataset that holds the cluster label of each coordinate:

# A single assignment is enough; looping over the clusters would only
# re-assign the same column on every iteration.
df['cluster'] = pd.Series(cluster_labels, index=df.index)

Is there a simple way to do this?

Answers:

You can group by the cluster label and aggregate the minimum and maximum of each coordinate, something like:

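import pandas as pd

# Toy data standing in for the clustered points.
df = pd.DataFrame({
    'cluster': [0, 1, 2],
    'pts': [5, 6, 10],
    'lat': [45, 47, 45],
    'lon': [24, 23, 20],
})

# One row per cluster with the bounding box of its coordinates.
df = df.groupby('cluster').agg(
    min_lat=('lat', 'min'),
    max_lat=('lat', 'max'),
    min_lon=('lon', 'min'),
    max_lon=('lon', 'max'),
)

# Bounding-box area in square degrees. With the toy data above each cluster
# has a single row, so min == max and every area comes out as 0.
df["area"] = (df["max_lat"] - df["min_lat"]) * (df["max_lon"] - df["min_lon"])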
Answered By: Dimitrius
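
The answer above computes the bounding-box area but not the number of points per cluster, which the question also asks for. A minimal sketch of both in one aggregation follows. It assumes one row per point and a 'cluster' column holding the DBSCAN labels. The sample coordinates and the names points, summary and n_points are made up for illustration. The filter on -1 assumes the labels come from scikit-learn's DBSCAN, which marks noise points with -1.

```python
import pandas as pd

# Hypothetical sample: one row per point, 'cluster' holds the DBSCAN label.
# The last row is labelled -1 (noise in scikit-learn's DBSCAN).
points = pd.DataFrame({
    'cluster': [0, 0, 0, 1, 1, -1],
    'lat': [45.0, 45.5, 46.0, 47.0, 47.2, 50.0],
    'lon': [24.0, 24.5, 25.0, 23.0, 23.4, 10.0],
})

# Drop noise points, then compute the point count and bounding box per cluster.
summary = (
    points[points['cluster'] != -1]
    .groupby('cluster')
    .agg(
        n_points=('lat', 'count'),  # number of non-null lat values
        min_lat=('lat', 'min'),
        max_lat=('lat', 'max'),
        min_lon=('lon', 'min'),
        max_lon=('lon', 'max'),
    )
)

# Bounding-box area as defined in the question (units: square degrees).
summary['area'] = (
    (summary['max_lat'] - summary['min_lat'])
    * (summary['max_lon'] - summary['min_lon'])
)

print(summary)
```

Note that this area is in square degrees, not a physical area. If you need square kilometres you would have to project the coordinates first.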