Combine two pandas index slices
Question:
How can two pandas.IndexSlice s be combined into one?
Set up of the problem:
import pandas as pd
import numpy as np
idx = pd.IndexSlice
cols = pd.MultiIndex.from_product([['A', 'B', 'C'], ['x', 'y'], ['a', 'b']])
df = pd.DataFrame(np.arange(len(cols)*2).reshape((2, len(cols))), columns=cols)
df:
A B C
x y x y x y
a b a b a b a b a b a b
0 0 1 2 3 4 5 6 7 8 9 10 11
1 12 13 14 15 16 17 18 19 20 21 22 23
How can the two slices idx['A', 'y', :]
and idx[['B', 'C'], 'x', :]
, be combined to show in one dataframe?
Separately they are:
df.loc[:, idx['A', 'y',:]]
A
y
a b
0 2 3
1 14 15
df.loc[:, idx[['B', 'C'], 'x', :]]
B C
x x
a b a b
0 4 5 8 9
1 16 17 20 21
Simply combining them as a list does not play nicely:
df.loc[:, [idx['A', 'y',:], idx[['B', 'C'], 'x',:]]]
....
TypeError: unhashable type: 'slice'
My current solution is incredibly clunky, but gives the sub df that I’m looking for:
df.loc[:, df.loc[:, idx['A', 'y', :]].columns.to_list() + df.loc[:,
idx[['B', 'C'], 'x', :]].columns.to_list()]
A B C
y x x
a b a b a b
0 2 3 4 5 8 9
1 14 15 16 17 20 21
However this doesn’t work when one of the slices is just a series (as expected), which is less fun:
df.loc[:, df.loc[:, idx['A', 'y', 'a']].columns.to_list() + df.loc[:,
idx[['B', 'C'], 'x', :]].columns.to_list()]
...
AttributeError: 'Series' object has no attribute 'columns'
Are there any better alternatives to what I’m currently doing that would ideally work with dataframe slices and series slices?
Answers:
General solution is join together both slice:
a = df.loc[:, idx['A', 'y', 'a']]
b = df.loc[:, idx[['B', 'C'], 'x', :]]
df = pd.concat([a, b], axis=1)
print (df)
A B C
y x x
a a b a b
0 2 4 5 8 9
1 14 16 17 20 21
One option is with pyjanitor select_columns to select via tuples:
# pip install pyjanitor
import pandas as pd
df.select_columns(('A','y'), ('B','x'), ('C','x'))
A B C
y x x
a b a b a b
0 2 3 4 5 8 9
1 14 15 16 17 20 21
How can two pandas.IndexSlice s be combined into one?
Set up of the problem:
import pandas as pd
import numpy as np
idx = pd.IndexSlice
cols = pd.MultiIndex.from_product([['A', 'B', 'C'], ['x', 'y'], ['a', 'b']])
df = pd.DataFrame(np.arange(len(cols)*2).reshape((2, len(cols))), columns=cols)
df:
A B C
x y x y x y
a b a b a b a b a b a b
0 0 1 2 3 4 5 6 7 8 9 10 11
1 12 13 14 15 16 17 18 19 20 21 22 23
How can the two slices idx['A', 'y', :]
and idx[['B', 'C'], 'x', :]
, be combined to show in one dataframe?
Separately they are:
df.loc[:, idx['A', 'y',:]]
A
y
a b
0 2 3
1 14 15
df.loc[:, idx[['B', 'C'], 'x', :]]
B C
x x
a b a b
0 4 5 8 9
1 16 17 20 21
Simply combining them as a list does not play nicely:
df.loc[:, [idx['A', 'y',:], idx[['B', 'C'], 'x',:]]]
....
TypeError: unhashable type: 'slice'
My current solution is incredibly clunky, but gives the sub df that I’m looking for:
df.loc[:, df.loc[:, idx['A', 'y', :]].columns.to_list() + df.loc[:,
idx[['B', 'C'], 'x', :]].columns.to_list()]
A B C
y x x
a b a b a b
0 2 3 4 5 8 9
1 14 15 16 17 20 21
However this doesn’t work when one of the slices is just a series (as expected), which is less fun:
df.loc[:, df.loc[:, idx['A', 'y', 'a']].columns.to_list() + df.loc[:,
idx[['B', 'C'], 'x', :]].columns.to_list()]
...
AttributeError: 'Series' object has no attribute 'columns'
Are there any better alternatives to what I’m currently doing that would ideally work with dataframe slices and series slices?
General solution is join together both slice:
a = df.loc[:, idx['A', 'y', 'a']]
b = df.loc[:, idx[['B', 'C'], 'x', :]]
df = pd.concat([a, b], axis=1)
print (df)
A B C
y x x
a a b a b
0 2 4 5 8 9
1 14 16 17 20 21
One option is with pyjanitor select_columns to select via tuples:
# pip install pyjanitor
import pandas as pd
df.select_columns(('A','y'), ('B','x'), ('C','x'))
A B C
y x x
a b a b a b
0 2 3 4 5 8 9
1 14 15 16 17 20 21