In pandas, how do I get multiple slices of a MultiIndexed DataFrame at a time?
Question:
In pandas, I’m familiar with how to slice a MultiIndex with a list to get multiple values, like so:
(Pdb) df = pd.DataFrame({"A": range(0,10), "B": -1, "C": range(20,30), "D": range(30,40), "E":range(40,50)}).set_index(["A", "B", "C"])
(Pdb) df
           D   E
A B  C
0 -1 20  30  40
1 -1 21  31  41
2 -1 22  32  42
3 -1 23  33  43
4 -1 24  34  44
5 -1 25  35  45
6 -1 26  36  46
7 -1 27  37  47
8 -1 28  38  48
9 -1 29  39  49
(Pdb) df.loc[ [0,1,2]]
           D   E
A B  C
0 -1 20  30  40
1 -1 21  31  41
2 -1 22  32  42
But how can I do this for multiple levels at a time?
(Pdb) df.loc[ [0,1,2], -1]
*** KeyError: -1
Or ideally:
(Pdb) df.loc[ [0,1,2], [-1]]
*** KeyError: "None of [Int64Index([-1], dtype='int64')] are in the [columns]"
Answers:
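Wrap the row selectors in a tuple, with one element per index level, and pass a column selector as well (here : for all columns). Without the explicit column selector, pandas treats the second element as a column label, which is why -1 raised a KeyError: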
#               A      B    v-- all columns
>>> df.loc[([0, 1, 2], -1), :]
           D   E
A B  C
0 -1 20  30  40
1 -1 21  31  41
2 -1 22  32  42
#               A         all B          C         v-- all columns
>>> df.loc[([0, 1, 2], slice(None), [20, 22, 24]), :]
           D   E
A B  C
0 -1 20  30  40
2 -1 22  32  42
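The same selections can also be written with pd.IndexSlice, which lets a plain : stand in for slice(None). The following is a minimal sketch built on the frame from the question; the names idx, by_a_and_b, by_a_and_c, and by_mask are chosen here for illustration only, and the boolean-mask variant is an alternative technique, not part of the answer above.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    {"A": range(0, 10), "B": -1, "C": range(20, 30),
     "D": range(30, 40), "E": range(40, 50)}
).set_index(["A", "B", "C"])

idx = pd.IndexSlice  # illustrative alias

# A in {0, 1, 2} and B == -1; the trailing ", :" selects all columns.
by_a_and_b = df.loc[idx[[0, 1, 2], -1], :]

# A in {0, 1, 2}, any B, C in {20, 22, 24}; ":" replaces slice(None).
by_a_and_c = df.loc[idx[[0, 1, 2], :, [20, 22, 24]], :]

# Alternative: build a boolean mask from the index levels by name.
a_vals = df.index.get_level_values("A")
c_vals = df.index.get_level_values("C")
by_mask = df[a_vals.isin([0, 1, 2]) & c_vals.isin([20, 22, 24])]

print(by_a_and_b)
print(by_a_and_c)
```

The mask form is longer, but it refers to levels by name rather than by position, which may be easier to read when only one or two of many levels are filtered.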
More information: see "MultiIndex / advanced indexing" in the pandas user guide.