Select only one index of multiindex DataFrame
Question:
I am trying to create a new DataFrame using only one index from a multi-indexed DataFrame.
                     A         B         C
first second
bar   one     0.895717  0.410835 -1.413681
      two     0.805244  0.813850  1.607920
baz   one    -1.206412  0.132003  1.024180
      two     2.565646 -0.827317  0.569605
foo   one     1.431256 -0.076467  0.875906
      two     1.340309 -1.187678 -2.211372
qux   one    -1.170299  1.130127  0.974466
      two    -0.226169 -1.436737 -2.006747
Ideally, I would like something like this:
In: df.ix[level="first"]
and:
Out:
              A         B         C
first
bar    0.895717  0.410835 -1.413681
       0.805244  0.813850  1.607920
baz   -1.206412  0.132003  1.024180
       2.565646 -0.827317  0.569605
foo    1.431256 -0.076467  0.875906
       1.340309 -1.187678 -2.211372
qux   -1.170299  1.130127  0.974466
      -0.226169 -1.436737 -2.006747
Essentially, I want to drop every level of the MultiIndex except the level "first". Is there an easy way to do this?
Answers:
One way is to simply rebind df.index to the desired level of the MultiIndex. You can do this by specifying the name of the level you want to keep:
df.index = df.index.get_level_values('first')
or by using the level's integer position:
df.index = df.index.get_level_values(0)
All other levels of the MultiIndex disappear.
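A minimal, self-contained sketch of this approach, rebuilding a frame shaped like the question's (the random values are placeholders, not the question's data):

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the question's: rows indexed by (first, second)
index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=['A', 'B', 'C'])

# Keep only the 'first' level; get_level_values returns a flat Index named 'first'
df.index = df.index.get_level_values('first')
print(df.index.tolist())
```

The labels now repeat (each outer label appears once per former inner label), which is exactly the layout the question asks for.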
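Another option is the df.xs method, which selects a cross-section at a given level. The examples below use a Series whose three index levels are named First, Second and Third (the setup is given further down):
In [88]: df.xs('bar', level='First')
Out[88]:
Second  Third
one     A       -2.315312
        B        0.497769
        C        0.108523
two     A       -0.778303
        B       -1.555389
        C       -2.625022
dtype: float64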
It also works with several levels at once:
In [89]: df.xs(('bar', 'A'), level=('First', 'Third'))
Out[89]:
Second
one   -2.315312
two   -0.778303
dtype: float64
The setup for these examples:
import pandas as pd
import numpy as np

arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
# unstack() turns the frame into a Series indexed by (first, second, row label)
df = df.unstack()
# Name the three index levels; this must come after unstack(), when there are 3 levels
df.index.names = ['First', 'Second', 'Third']
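For comparison, here is a sketch of the same call applied directly to a DataFrame shaped like the question's (random placeholder values). Note that xs keeps only the rows matching one label, so it suits "give me the rows for this label" rather than "keep every row but flatten the index":

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=['A', 'B', 'C'])

# Cross-section: rows whose 'second' label is 'one'; the 'second' level is dropped
ones = df.xs('one', level='second')

# drop_level=False keeps the full two-level index on the selected rows
ones_full = df.xs('one', level='second', drop_level=False)
```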
I used get_level_values(0) to get the first level of the index produced by a groupby on several columns. I needed it to build a DataFrame containing the aggregated value together with the decoded value of each encoded key, looked up in a dictionary. Here the first level holds the "airline_enc" values of the groupby:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

def getAirlineByGrouped(grouped, dictGeneric):
    # Look up the airline name for each encoded key in the first index level
    mylist = []
    for key in grouped.index.get_level_values(0):
        item = dictGeneric.get(key)
        mylist.append(item)
    return mylist

# df and results are my own airline-review DataFrames (not shown here)
encoder = LabelEncoder()
df['airline_enc'] = encoder.fit_transform(df['airline'])
# Gives {'airline': {encoded_value: airline_name, ...}}
dictAirline = df[['airline_enc', 'airline']].set_index('airline_enc').to_dict()
grouped = results.groupby(['airline_enc', 'rating'])['recommended'].count()
# print(grouped)
airlines = getAirlineByGrouped(grouped, dictAirline['airline'])
result_df = pd.DataFrame({'index': grouped.index.get_level_values(0),
                          'value': grouped.values,
                          'airline': airlines})
result_df.plot(x='airline', y='value')
plt.xticks(rotation=90)
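A pandas-only sketch of the same idea, with two swaps worth flagging: pd.factorize stands in for sklearn's LabelEncoder, and Index.map replaces the explicit lookup loop. The toy rows (airline names, ratings) are hypothetical, invented only so the example runs:

```python
import pandas as pd

# Hypothetical toy data standing in for the airline reviews
reviews = pd.DataFrame({
    'airline': ['Alpha Air', 'Beta Jet', 'Alpha Air', 'Beta Jet', 'Alpha Air'],
    'rating': [5, 3, 5, 4, 2],
    'recommended': ['yes', 'no', 'yes', 'yes', 'no'],
})

# Encode airline names as integers (codes follow order of first appearance)
codes, uniques = pd.factorize(reviews['airline'])
reviews['airline_enc'] = codes
code_to_name = dict(enumerate(uniques))

grouped = reviews.groupby(['airline_enc', 'rating'])['recommended'].count()

# Decode the first index level in one vectorized call instead of a Python loop
first_level = grouped.index.get_level_values(0)
result_df = pd.DataFrame({
    'index': first_level,
    'value': grouped.values,
    'airline': first_level.map(code_to_name),
})
```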
Alternatively, you could use the pandas.DataFrame.droplevel method. The only downside in your example is that the index values are no longer unique:
In: df.droplevel('second')
Out:
              A         B         C
first
bar    0.895717  0.410835 -1.413681
bar    0.805244  0.813850  1.607920
baz   -1.206412  0.132003  1.024180
baz    2.565646 -0.827317  0.569605
foo    1.431256 -0.076467  0.875906
foo    1.340309 -1.187678 -2.211372
qux   -1.170299  1.130127  0.974466
qux   -0.226169 -1.436737 -2.006747
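A short sketch of what the non-unique index means in practice, again on a frame shaped like the question's with random placeholder values: a label lookup after droplevel returns every matching row rather than a single one.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=['A', 'B', 'C'])

flat = df.droplevel('second')

# Labels repeat, so .loc with one label returns a 2-row DataFrame
print(flat.index.is_unique)   # False
print(flat.loc['bar'].shape)  # (2, 3)
```

The resulting index matches what df.index.get_level_values('first') produces, so droplevel and rebinding df.index end up with the same layout here.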