Sum column data while merging zip_code polygons to MultiPolygons in geopandas

Question:

I'm working with Python in a Jupyter notebook, and I have the following dataset:

    +-------+------------+----------+---------------------------------------------------+
    |  zip  | population |   area#  |                     polygon                       |
    +-------+------------+----------+---------------------------------------------------+
    | 12345 | 50         | 55       | POLYGON ((-55.66788 40.04416, -55.66790 40.044... |
    | 12346 | 100        | 55       | POLYGON ((-55.54666 40.40131, -55.54678 40.400... |
    | .     | .          | .        | .                                                 |
    | .     | .          | .        | .                                                 |
    | 98765 | 236667     | 155      | POLYGON ((-155.04682 78.53585, -155.04680 78.5..  |
    +-------+------------+----------+---------------------------------------------------+

Where the polygon column is a geopandas.GeoSeries and each geometry element is a shapely.geometry.polygon.Polygon.

I transformed the dataset into a geodataframe:

from geopandas import GeoDataFrame
dataset = GeoDataFrame(dataset)

And used the set_geometry function to assign the geometry column:

dataset = dataset.set_geometry("polygon")

Everything seems to be working fine and I am able to plot heatmaps using this GeoDataFrame.

The issue I am having is that I am trying to create a dataset that groups the population per area, but I also have to group the polygons, and I have not managed to do that.

The final dataset should look like this: all the zip polygons with the same area# should be collapsed into a single row with a MultiPolygon geometry and the total of the population values:


    +------------+----------+--------------------------------------------------------+
    | population |   area#  |                        polygon                         |
    +------------+----------+--------------------------------------------------------+
    | 150        | 55       | MULTIPOLYGON ((-55.66788 40.04416, -55.66790 40.044... |
    | .          | .        | .                                                      |
    | .          | .        | .                                                      |
    | .          | .        | .                                                      |
    | 236667     | 155      | MULTIPOLYGON ((-155.04682 78.53585, -155.04680 78.5..  |
    +------------+----------+--------------------------------------------------------+

I don’t need to follow the steps I outlined above; they are just the steps I found here on Stack Overflow. I am OK with doing something else from scratch.

Asked By: artodito


Answers:

The geopandas spatial equivalent of a pandas .groupby().aggregate() operation is dissolve. Take a look through the docs; they’re really helpful.

One key argument to note is the aggfunc argument. From the docs:

The aggfunc = argument defaults to ‘first’ which means that the first row of attributes values found in the dissolve routine will be assigned to the resultant dissolved geodataframe. However it also accepts other summary statistic options as allowed by pandas.groupby including:

  • ‘first’
  • ‘last’
  • ‘min’
  • ‘max’
  • ‘sum’
  • ‘mean’
  • ‘median’
  • function
  • string function name
  • list of functions and/or function names, e.g. [np.sum, ‘mean’]
  • dict of axis labels -> functions, function names or list of such.
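
To see what the default means in practice, here is a small sketch on a made-up frame (the column names "group" and "value" and the box geometries are assumptions for illustration, not part of the question's data). It compares the default aggfunc with 'max':

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data: two rows in group "a", one row in group "b".
gdf = gpd.GeoDataFrame(
    {"group": ["a", "a", "b"], "value": [1, 2, 10]},
    geometry=[box(0, 0, 1, 1), box(3, 3, 4, 4), box(8, 8, 9, 9)],
)

# Default aggfunc="first": keeps the attribute values of the first row in each group.
first = gdf.dissolve("group")

# aggfunc="max": keeps the largest value found in each group instead.
largest = gdf.dissolve("group", aggfunc="max")

print(first["value"])
print(largest["value"])
```

In both cases the geometries of each group are merged the same way; only the attribute columns are affected by aggfunc.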

If you’re looking to group on area, and sum the populations within each area, as well as unify the polygons, you can use aggfunc={"population": "sum"}, e.g.:

aggregated = dataset.dissolve("area#", aggfunc={"population": "sum"})
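
For a quick check of what that call returns, here is a minimal, self-contained sketch. The square polygons and the helper square() are invented stand-ins for the real zip code shapes; only the column names come from the question:

```python
import geopandas as gpd
from shapely.geometry import Polygon


def square(x, y, size=1.0):
    """Hypothetical helper: an axis-aligned square standing in for a zip polygon."""
    return Polygon([(x, y), (x + size, y), (x + size, y + size), (x, y + size)])


# Toy stand-in for the question's dataset (geometries are made up).
toy = gpd.GeoDataFrame(
    {
        "zip": ["12345", "12346", "98765"],
        "population": [50, 100, 236667],
        "area#": [55, 55, 155],
        "polygon": [square(0, 0), square(5, 5), square(20, 20)],
    },
    geometry="polygon",
)

# Group by area#, merge the geometries, and sum the population per area.
aggregated = toy.dissolve("area#", aggfunc={"population": "sum"})

print(aggregated["population"])
print(aggregated.geometry.geom_type)
```

Here area# becomes the index of the result; calling aggregated.reset_index() turns it back into a regular column. Note that the two squares in area 55 are disjoint, so they merge into a MultiPolygon. Polygons that touch or overlap would instead be unioned into a single Polygon.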
Answered By: Michael Delgado