pandas: slice a MultiIndex by range of secondary index
Question:
I have a series with a MultiIndex like this:
import numpy as np
import pandas as pd
buckets = np.repeat(['a','b','c'], [3,5,1])
sequence = [0,1,5,0,1,2,4,50,0]
s = pd.Series(
    np.random.randn(len(sequence)),
    index=pd.MultiIndex.from_tuples(zip(buckets, sequence))
)
# In [6]: s
# Out[6]:
# a 0 -1.106047
# 1 1.665214
# 5 0.279190
# b 0 0.326364
# 1 0.900439
# 2 -0.653940
# 4 0.082270
# 50 -0.255482
# c 0 -0.091730
I’d like to get the s['b'] values where the second index ('sequence') is between 2 and 10.
Slicing on the first index works fine:
s['a':'b']
# Out[109]:
# bucket value
# a 0 1.828176
# 1 0.160496
# 5 0.401985
# b 0 -1.514268
# 1 -0.973915
# 2 1.285553
# 4 -0.194625
# 5 -0.144112
But not on the second, at least by what seems to be the two most obvious ways:
1) This returns the elements at positions 1 through 4 and has nothing to do with the index values:
s['b'][1:10]
# In [61]: s['b'][1:10]
# Out[61]:
# 1 0.900439
# 2 -0.653940
# 4 0.082270
# 50 -0.255482
However, if I reverse the levels so that the first index is an integer and the second is a string, it works:
In [26]: s
Out[26]:
0 a -0.126299
1 a 1.810928
5 a 0.571873
0 b -0.116108
1 b -0.712184
2 b -1.771264
4 b 0.148961
50 b 0.089683
0 c -0.582578
In [25]: s[0]['a':'b']
Out[25]:
a -0.126299
b -0.116108
Answers:
Not sure if this is ideal, but it works by selecting the matching index tuples:
In [59]: s.index
Out[59]:
MultiIndex
[('a', 0) ('a', 1) ('a', 5) ('b', 0) ('b', 1) ('b', 2) ('b', 4)
('b', 50) ('c', 0)]
In [77]: s[(tpl for tpl in s.index if 2<=tpl[1]<=10 and tpl[0]=='b')]
Out[77]:
b 2 -0.586568
4 1.559988
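A vectorised variant of the same idea is to build a boolean mask from the index levels with get_level_values. This is only a sketch: the fixed values (np.arange instead of random numbers) and the names level0, level1, mask and masked are my own, chosen so the result is easy to check.

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]

# Fixed values instead of random ones (an assumption for illustration).
s = pd.Series(
    np.arange(len(sequence), dtype=float),
    index=pd.MultiIndex.from_arrays([buckets, sequence]),
)

# One flat array of labels per level, aligned with the rows of s.
level0 = s.index.get_level_values(0)
level1 = s.index.get_level_values(1)

# True only for rows in bucket 'b' whose second-level label is in [2, 10].
mask = (level0 == 'b') & (level1 >= 2) & (level1 <= 10)
masked = s[mask]
print(masked)
```

Unlike label slicing, a mask like this does not care whether the index is sorted.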
EDIT: hayden’s solution is the way to go.
As Robbie-Clarken answers, since 0.14 you can pass a slice in the tuple you pass to loc:
In [11]: s.loc[('b', slice(2, 10))]
Out[11]:
b 2 -0.65394
4 0.08227
dtype: float64
Indeed, you can pass a slice for each level:
In [12]: s.loc[(slice('a', 'b'), slice(2, 10))]
Out[12]:
a 5 0.27919
b 2 -0.65394
4 0.08227
dtype: float64
Note: the slice is inclusive, so both endpoints are kept when they exist in the index.
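To make the endpoint behaviour concrete, here is a small sketch with fixed integer values (my own choice, not the data from the question). It assumes the index is sorted, as it is here; as far as I know, label slicing on an unsorted MultiIndex raises an error until you call sort_index().

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([
    ('a', 0), ('a', 1), ('a', 5),
    ('b', 0), ('b', 1), ('b', 2), ('b', 4), ('b', 50),
    ('c', 0),
])
# Values are just the row positions 0..8, so results are easy to read.
s = pd.Series(range(len(index)), index=index)

# Both endpoints exist as labels and both are kept.
both_ends = s.loc[('b', slice(2, 4))]

# The upper endpoint 10 is not a label; it only acts as a bound.
upper_bound = s.loc[('b', slice(2, 10))]

# Open-ended slice: every 'b' row whose second-level label is at least 2.
open_ended = s.loc[('b', slice(2, None))]

print(both_ends, upper_bound, open_ended, sep='\n\n')
```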
Old answer:
You can also do this using:
s.ix[1:10, "b"]
(It’s good practice to do this in a single ix/loc/iloc call, since this version allows assignment.)
This answer was written before iloc (position/integer location) was introduced in early 2013; iloc may be preferred in this case. It was created to remove the ambiguity from integer-indexed pandas objects and to be more descriptive: “I’m slicing on position”.
s["b"].iloc[1:10]
That said, I kinda disagree with the docs that ix is the “most robust and consistent way”. It’s not; the most consistent way is to describe what you’re doing:
- use loc for labels
- use iloc for position
- use ix for both (if you really have to; note that ix was deprecated in pandas 0.20 and removed in 1.0)
Remember the zen of python:
explicit is better than implicit
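A minimal sketch of the loc/iloc distinction on the 'b' bucket alone. The Series b and its values are made up for illustration; only the index labels come from the question.

```python
import pandas as pd

# Stand-in for s['b']: integer labels 0, 1, 2, 4, 50 with made-up values.
b = pd.Series([10.0, 11.0, 12.0, 14.0, 150.0], index=[0, 1, 2, 4, 50])

# Position-based: rows at positions 1..9 (end excluded), labels are ignored.
by_position = b.iloc[1:10]

# Label-based: rows whose label lies between 2 and 10, both ends included.
by_label = b.loc[2:10]

print(by_position)
print(by_label)
```

With an integer index the two calls look almost identical, which is exactly why spelling out loc or iloc helps.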
The best way I can think of is to use ‘select’ in this case, although the docs even say that “This method should be used only when there is no more direct way.”
In [116]: s
Out[116]:
a 0 1.724372
1 0.305923
5 1.780811
b 0 -0.556650
1 0.207783
4 -0.177901
50 0.289365
0 1.168115
In [117]: s.select(lambda x: x[0] == 'b' and 2 <= x[1] <= 10)
Out[117]: b 4 -0.177901
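As far as I know, select was deprecated in pandas 0.21 and removed in 1.0, so on a current version the same predicate has to be applied to the index by hand. A rough equivalent, assuming fixed values and a hypothetical helper name (in_b_between_2_and_10) of my own:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples([
    ('a', 0), ('a', 1), ('a', 5),
    ('b', 0), ('b', 1), ('b', 2), ('b', 4), ('b', 50),
    ('c', 0),
])
s = pd.Series(np.arange(len(index), dtype=float), index=index)


def in_b_between_2_and_10(key):
    """Same test as the lambda passed to select: key is an index tuple."""
    bucket, seq = key
    return bucket == 'b' and 2 <= seq <= 10


# Evaluate the predicate once per index tuple, then use it as a boolean mask.
mask = np.array([in_b_between_2_and_10(key) for key in s.index], dtype=bool)
selected = s.loc[mask]
print(selected)
```

This runs the predicate in a Python loop, so it is the slow-but-flexible option; slicing with loc is the more direct way when the condition is a simple range.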
As of pandas 0.14.0 it is possible to slice multi-indexed objects by providing .loc a tuple containing slice objects:
In [2]: s.loc[('b', slice(2, 10))]
Out[2]:
b 2 -1.206052
4 -0.735682
dtype: float64
Since pandas 0.15.0 this works:
s.loc['b', 2:10]
Output:
b 2 -0.503023
4 0.704880
dtype: float64
With a DataFrame it’s slightly different:
df.loc(axis=0)['b', 2:10]
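For completeness, a sketch of the DataFrame case with three spellings that I would expect to select the same rows. The frame df, its 'value' column and the fixed values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples([
    ('a', 0), ('a', 1), ('a', 5),
    ('b', 0), ('b', 1), ('b', 2), ('b', 4), ('b', 50),
    ('c', 0),
])
df = pd.DataFrame({'value': np.arange(len(index), dtype=float)}, index=index)

# 1) Tell loc that the whole key refers to the row axis.
via_axis = df.loc(axis=0)['b', 2:10]

# 2) Explicit tuple of row selectors, plus ':' for all columns.
via_tuple = df.loc[('b', slice(2, 10)), :]

# 3) pd.IndexSlice allows the 2:10 syntax inside the row selector.
idx = pd.IndexSlice
via_indexslice = df.loc[idx['b', 2:10], :]

print(via_axis)
```

The extra care is needed because, on a DataFrame, a bare df.loc['b', 2:10] would read the second item as a column selector.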