Group a multi-indexed pandas dataframe by one of its levels?

Question:

Is it possible to group a pandas DataFrame that has a two-level MultiIndex by one of the index levels?

The only way I know of is to call reset_index on the MultiIndex and then set the index again. I am sure there is a better way, and I want to know how.

Asked By: silencer


Answers:

Yes, use the level parameter of groupby. Example:

In [26]: s

first  second  third
bar    doo     one      0.404705
               two      0.577046
baz    bee     one     -1.715002
               two     -1.039268
foo    bop     one     -0.370647
               two     -1.157892
qux    bop     one     -1.344312
               two      0.844885
dtype: float64

In [27]: s.groupby(level=['first','second']).sum()

first  second
bar    doo       0.981751
baz    bee      -2.754270
foo    bop      -1.528539
qux    bop      -0.499427
dtype: float64
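
Since the question asks about grouping by a single level, here is a minimal sketch of that case. The Series and its values are made up for illustration and are not the data shown above; only the level names are reused.

```python
import pandas as pd

# Hypothetical data: a Series with a three-level MultiIndex.
index = pd.MultiIndex.from_tuples(
    [
        ("bar", "doo", "one"),
        ("bar", "doo", "two"),
        ("baz", "bee", "one"),
        ("baz", "bee", "two"),
    ],
    names=["first", "second", "third"],
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Group by one level only; the other levels are aggregated away.
by_first = s.groupby(level="first").sum()
by_third = s.groupby(level="third").sum()

print(by_first)
print(by_third)
```

Passing a single name (or a single position) to level should be enough; a list is only needed when grouping by several levels at once.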
Answered By: elyase

If the DataFrame already has a MultiIndex, you can refer to the levels by their integer position instead of by name:

df = df.groupby(level=[0,1]).size()
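
As a rough check that positions and names refer to the same levels, the sketch below builds a small hypothetical frame (the data and level names are assumptions, not from the answer) and compares both spellings. Note that size() returns a Series of group sizes rather than a DataFrame.

```python
import pandas as pd

# Hypothetical frame with a two-level MultiIndex.
df = pd.DataFrame(
    {"data": [0, 1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y")],
        names=["first", "second"],
    ),
)

# Levels addressed by position ...
sizes_by_position = df.groupby(level=[0, 1]).size()
# ... and by name; both should produce the same grouping.
sizes_by_name = df.groupby(level=["first", "second"]).size()

print(sizes_by_position)
print(sizes_by_position.equals(sizes_by_name))
```

Positions are handy when the index levels are unnamed; names tend to be easier to read when they exist.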
Answered By: AtanuCSE

In recent versions of pandas, you can group by multi-index level names similar to columns (i.e. without the level keyword), allowing you to use both simultaneously.

>>> import pandas as pd
>>> pd.__version__
'1.0.5'
>>> df = pd.DataFrame({
...     'first': ['a', 'a', 'a', 'b', 'b', 'b'],
...     'second': ['x', 'y', 'x', 'z', 'y', 'z'],
...     'column': ['k', 'k', 'l', 'l', 'm', 'n'],
...     'data': [0, 1, 2, 3, 4, 5],
... }).set_index(['first', 'second'])
>>> df.groupby('first').sum()
       data
first      
a         3
b        12
>>> df.groupby(['second', 'column']).sum()
               data
second column      
x      k          0
       l          2
y      k          1
       m          4
z      l          3
       n          5

The column names and index level names you group by must not collide. If a column and an index level share the same name, you will get a ValueError when trying to group by that name.
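
The collision can be sketched as follows. The frame is hypothetical and deliberately gives a column the same name as an index level; the sketch assumes a pandas version where the ambiguity raises ValueError, as described above. Passing level explicitly is one way to sidestep the ambiguity.

```python
import pandas as pd

# Hypothetical frame where "first" is both a column and an index level.
df = pd.DataFrame(
    {"first": [10, 20, 30], "data": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("a", "x"), ("a", "y"), ("b", "x")],
        names=["first", "second"],
    ),
)

# Grouping by the bare name is ambiguous.
try:
    df.groupby("first").sum()
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

# The level keyword states explicitly that the index level is meant.
unambiguous = df.groupby(level="first")["data"].sum()

print(error_message)
print(unambiguous)
```

Renaming either the column or the index level is another option if you want to keep the shorter groupby('first') form.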

Answered By: HoosierDaddy