Using .loc with a MultiIndex in pandas
Question:
Does anyone know if it is possible to use the DataFrame.loc method to select from a MultiIndex? I have the following DataFrame and would like to be able to access the values located in the Dwell column, at the indices ('at', 1), ('at', 3), ('at', 5), and so on (non-sequential).
I’d love to be able to do something like data.loc[['at',[1,3,5]], 'Dwell'], similar to the data.loc[[1,3,5], 'Dwell'] syntax for a regular index (which returns a 3-member series of Dwell values).
My purpose is to select an arbitrary subset of the data, perform some analysis only on that subset, and then update the new values with the results of the analysis. I plan on using the same syntax to set new values for these data, so chaining selectors wouldn’t really work in this case.
Here is a slice of the DataFrame I’m working with:
          Char  Dwell  Flight  ND_Offset  Offset
QGram
at    0      a    100     120   0.000000       0
      1      t    180       0   0.108363       5
      2      a    100     120   0.000000       0
      3      t    180       0   0.108363       5
      4      a     20     180   0.000000       0
      5      t     80     120   0.108363       5
      6      a     20     180   0.000000       0
      7      t     80     120   0.108363       5
      8      a     20     180   0.000000       0
      9      t     80     120   0.108363       5
      10     a    120     180   0.000000       0
Answers:
Try the cross-section indexing:
In [68]: df.xs('at', level='QGram', drop_level=False).loc[[1,4]]
Out[68]:
         Char  Dwell  Flight  ND_Offset  Offset
QGram
at    1     t    180       0   0.108363       5
      4     a     20     180   0.000000       0
If you are on version 0.14 or later, you can simply pass a tuple to .loc as below:
df.loc[('at', [1,3,4]), 'Dwell']
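As a minimal sketch of the tuple form, the snippet below rebuilds a small frame shaped like the one in the question and selects rows 1, 3 and 5 under 'at'. The construction of the frame (the from_product call and the trimmed column set) is an assumption made for illustration, not the asker's actual data-loading code.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: level "QGram" plus an unnamed position level.
index = pd.MultiIndex.from_product([["at"], range(6)], names=["QGram", None])
df = pd.DataFrame(
    {"Char": list("atatat"), "Dwell": [100, 180, 100, 180, 20, 80]},
    index=index,
)

# The tuple addresses the index levels in order; the inner list picks labels on the second level.
dwell = df.loc[("at", [1, 3, 5]), "Dwell"]
print(dwell.tolist())  # [180, 180, 80]
```

The result is a Series that keeps both index levels, so it lines up with the original frame.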
In general, MultiIndex keys take the form of tuples. For example:
In [6]: df.loc[('at', 1),'Dwell']
Out[6]: 180
In your case, you would have to pass a list of tuples. For example, the following works as you would expect:
In [7]: df.loc[ [('at', 1),('at', 3),('at', 5)], 'Dwell']
Out[7]:
         Dwell
QGram
at    1    180
at    3    180
at    5     80
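Since the question also asks about writing results back, here is a rough sketch of the full round trip with a list of tuples: select, compute, assign through the same key. The frame and the "analysis" (capping Dwell at 100) are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame.
index = pd.MultiIndex.from_product([["at"], range(6)], names=["QGram", None])
df = pd.DataFrame({"Dwell": [100, 180, 100, 180, 20, 80]}, index=index)

# Build the row keys programmatically: one full tuple per row.
keys = [("at", i) for i in (1, 3, 5)]

selected = df.loc[keys, "Dwell"]

# Placeholder analysis: cap the selected values at 100.
result = selected.clip(upper=100)

# Assign through the same .loc key, so the original frame is updated in place
# rather than a temporary copy produced by chained indexing.
df.loc[keys, "Dwell"] = result.to_numpy()
print(df["Dwell"].tolist())  # [100, 100, 100, 100, 20, 80]
```

Passing the values as a plain array sidesteps index alignment; assigning the Series itself should also work because it carries the same index labels.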
.loc is your best friend with a MultiIndex. However, you must understand how .loc works on one: the key carries one entry per index level, in level order, such as:
df.loc['indexValue1','indexValue2','indexValue3']
However, as you may imagine, this can be a pain when you don’t know what all the other values are, so for those levels we can of course use ':' instead:
df.loc[:,'value1','value2',:]
Hope this helps!
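For leaving some levels open, the pandas documentation spells the row key as an explicit tuple and also names the columns, so the key cannot be misread as a column selection. A small sketch on a made-up three-level index (the level names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical three-level index.
index = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y"], [1, 2]], names=["first", "second", "third"]
)
df = pd.DataFrame({"value": range(8)}, index=index)

# Fix the second level to "y" and leave the other two levels open.
# slice(None) is what ":" means inside a tuple.
by_tuple = df.loc[(slice(None), "y", slice(None)), :]

# pd.IndexSlice builds the same key with the familiar colon syntax.
idx = pd.IndexSlice
by_slice = df.loc[idx[:, "y", :], :]

print(by_tuple["value"].tolist())  # [2, 3, 6, 7]
```

Both forms expect a sorted index, which from_product gives here; on unsorted data, calling sort_index() first is the usual remedy.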
I have had the same problem.
df[('colindex1','colindex2')][('rowindex1','rowindex2', 'rowindex3')] : no problem
df.loc[('rowindex1','rowindex2', 'rowindex3')],[('colindex1','colindex2')] : gives the following error message:
KeyError: "None of [Index(['rowindex1', 'rowindex2',\n 'rowindex3'],\n dtype='object')] are in the [index]"
I tried putting my index tuple inside a list and the result was OK:
df.loc[[('rowindex1','rowindex2', 'rowindex3')],[('colindex1','colindex2')]]
I don’t know why. Maybe because some unshown "\n" characters are added to the index?
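One way to see the difference between the key shapes is to try them on a tiny frame. This is only a sketch with made-up labels matching the placeholders above. A tuple is read as one key spanning the levels, a list of tuples as a list of such keys, and a flat list as several separate labels, which is the shape shown inside Index([...]) in the error message.

```python
import pandas as pd

# Hypothetical frame with a three-level row index and a two-level column index.
rows = pd.MultiIndex.from_tuples([("rowindex1", "rowindex2", "rowindex3")])
cols = pd.MultiIndex.from_tuples([("colindex1", "colindex2")])
df = pd.DataFrame([[0]], index=rows, columns=cols)

row_key = ("rowindex1", "rowindex2", "rowindex3")
col_key = ("colindex1", "colindex2")

# One tuple per axis: a single cell comes back.
cell = df.loc[row_key, col_key]

# Lists of tuples: a DataFrame comes back, here with one row and one column.
block = df.loc[[row_key], [col_key]]

# A flat list is treated as three separate labels, which are not all present.
try:
    df.loc[list(row_key), [col_key]]
    flat_list_failed = False
except KeyError:
    flat_list_failed = True

print(cell, block.shape, flat_list_failed)
```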