Summing over a multiindex level in a pandas series
Question:
I would like to sum (marginalize) over one level of a series with a 3-level multiindex to produce a series with a 2-level multiindex. For example, if I have the following:
import pandas as pd
ind = [tuple(x) for x in ['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']]
mi = pd.MultiIndex.from_tuples(ind)
data = pd.Series([264, 13, 29, 8, 152, 7, 15, 1], index=mi)
A  B  C    264
      c     13
   b  C     29
      c      8
a  B  C    152
      c      7
   b  C     15
      c      1
I would like to sum over the third level (C/c) to produce the following output:
A  B    277
   b     37
a  B    159
   b     16
What is the best way in Pandas to do this?
Answers:
If you know you always want to group by the first two levels (and so sum out the third), then this is pretty easy:
In [27]: data.groupby(level=[0, 1]).sum()
Out[27]:
A  B    277
   b     37
a  B    159
   b     16
dtype: int64
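The same idea can be written so that you name the level to sum out rather than the levels to keep. This is only a minimal sketch: sum_over_level is a hypothetical helper (not a pandas function), and it assumes the level is given as a non-negative integer position.

```python
import pandas as pd

def sum_over_level(series, level):
    """Sum out one index level (hypothetical helper, not part of pandas).

    `level` is assumed to be a non-negative integer position.
    """
    # Group by every level except the one being summed out.
    keep = [i for i in range(series.index.nlevels) if i != level]
    return series.groupby(level=keep).sum()

ind = [tuple(x) for x in ['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']]
mi = pd.MultiIndex.from_tuples(ind)
data = pd.Series([264, 13, 29, 8, 152, 7, 15, 1], index=mi)

result = sum_over_level(data, 2)  # same as data.groupby(level=[0, 1]).sum()
print(result)
```

Passing 0 or 1 instead of 2 sums out the first or second level in the same way.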
Another possibility is to unstack the Series into a DataFrame and sum horizontally.
data.unstack().sum(axis=1)
A  B    277
   b     37
a  B    159
   b     16
dtype: int64
The level to unstack on must be the level (or levels) whose values are to be summed up. So, for example, the following two are equivalent.
x = data.unstack(level=0).sum(axis=1)
y = data.groupby(level=[1,2]).sum()
x.equals(y) # True
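One difference between the two approaches shows up when some index combinations are missing: unstack fills the gaps with NaN, so the sums come back as floats, while groupby keeps the integer dtype. The sketch below assumes the last row of the example data is dropped to create such a gap. The variable names are illustrative only.

```python
import pandas as pd

ind = [tuple(x) for x in ['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']]
mi = pd.MultiIndex.from_tuples(ind)
data = pd.Series([264, 13, 29, 8, 152, 7, 15, 1], index=mi)

# Drop the last row so that the ('a', 'b', 'c') combination is missing.
partial = data.iloc[:-1]

# unstack() defaults to the innermost level; the missing cell becomes NaN,
# which sum() skips by default, but the result is upcast to float.
via_unstack = partial.unstack().sum(axis=1)

# Filling the gap with 0 while unstacking avoids introducing NaN.
via_unstack_filled = partial.unstack(fill_value=0).sum(axis=1)

# groupby never creates the missing cell in the first place.
via_groupby = partial.groupby(level=[0, 1]).sum()

print(via_unstack)
print(via_unstack_filled)
print(via_groupby)
```

All three give the same totals. Only the dtype of the result may differ.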