Change deepest level of MultiIndex and append level

Question:

Consider the following DataFrame:

import numpy as np
import pandas as pd

arrays1 = [
    ["A", "A", "A", "B", "B", "B"],
    ["foo", "bar", "baz", "foo", "bar", "baz"],
]
tuples1 = list(zip(*arrays1))
index_values1 = pd.MultiIndex.from_tuples(tuples1)
df1 = pd.DataFrame(np.ones((6, 6)), index=index_values1, columns=index_values1)

print(df1)

         A              B          
       foo  bar  baz  foo  bar  baz
A foo  1.0  1.0  1.0  1.0  1.0  1.0
  bar  1.0  1.0  1.0  1.0  1.0  1.0
  baz  1.0  1.0  1.0  1.0  1.0  1.0
B foo  1.0  1.0  1.0  1.0  1.0  1.0
  bar  1.0  1.0  1.0  1.0  1.0  1.0
  baz  1.0  1.0  1.0  1.0  1.0  1.0
  

Say I want to replace the deepest level of either indices, columns, or both; as well as add a level, according to the following mapping:

d_idx = {
    'foo': ('qux', 'one'),
    'bar': ('quux', 'two'),
    'baz': ('corge', 'three'),
    }
    

Such that I get either:

                 A              B          
               foo  bar  baz  foo  bar  baz
A qux   one    1.0  1.0  1.0  1.0  1.0  1.0
  quux  two    1.0  1.0  1.0  1.0  1.0  1.0
  corge three  1.0  1.0  1.0  1.0  1.0  1.0
B qux   one    1.0  1.0  1.0  1.0  1.0  1.0
  quux  two    1.0  1.0  1.0  1.0  1.0  1.0
  corge three  1.0  1.0  1.0  1.0  1.0  1.0
  

or

         A               B           
       qux quux corge  qux quux corge
       one  two three  one  two three
A foo  1.0  1.0   1.0  1.0  1.0   1.0
  bar  1.0  1.0   1.0  1.0  1.0   1.0
  baz  1.0  1.0   1.0  1.0  1.0   1.0
B foo  1.0  1.0   1.0  1.0  1.0   1.0
  bar  1.0  1.0   1.0  1.0  1.0   1.0
  baz  1.0  1.0   1.0  1.0  1.0   1.0
  

or

                 A               B           
               qux quux corge  qux quux corge
               one  two three  one  two three
A qux   one    1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0
B qux   one    1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0

I have tried a number of ways, which seem to work but don’t seem elegant to me. In particular, I want to make sure that this mapping is robust (e.g. no reordering due to unstacking indices etc.).

My first approach was by constructing a correspondance matrix:

     qux quux corge
     one  two three
foo  1.0  0.0   0.0
bar  0.0  1.0   0.0
baz  0.0  0.0   1.0

The advantage of this approach is that the mapping from one set of indices to the other is robust, by taking the dot product with this matrix and an unstacked df1. However by unstacking, the indices are implicitly sorted. This means I need to somehow retain the original order of the index before being able to take the dot product. While probably doable, this doesn’t seem elegant to me.

df1_u = df1.unstack(level=0)

    A   A   A   A   A   A   B   B   B   B   B   B
    foo foo bar bar baz baz foo foo bar bar baz baz
    A   B   A   B   A   B   A   B   A   B   A   B
bar 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
baz 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
foo 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

Next, I tried changing the indices in a for-loop:

l_idx_old = list(df1.index)
l_idx_new = []

for t_idx_old in l_idx_old:
    idx0, idx1_old = t_idx_old
    idx1_new, idx2_new = d_idx[idx1_old]
    print(idx1_old, idx1_new, idx2_new)
    t_idx_new = (idx0, idx1_new, idx2_new)
    l_idx_new.append(t_idx_new)

df1.index = pd.MultiIndex.from_tuples(l_idx_new)

This works, however the last line is not robust, as there is no check whether the indices are assigned correctly.

In short, is there a robust and elegant way to carry out this intended mutation? Any help is much appreciated.

Asked By: BdB

||

Answers:

If mapping one level of MultiIndex get new MultiIndex, so possible append to existing by DataFrame.set_index with remove mapped level by DataFrame.droplevel:

d_idx = {
    'foo': ('qux', 'one'),
    'bar': ('quux', 'two'),
    'baz': ('corge', 'three'),
    }

df1 = df1.set_index(df1.index.get_level_values(1).map(d_idx), append=True).droplevel(1)
print (df1)
                 A              B          
               foo  bar  baz  foo  bar  baz
A qux   one    1.0  1.0  1.0  1.0  1.0  1.0
  quux  two    1.0  1.0  1.0  1.0  1.0  1.0
  corge three  1.0  1.0  1.0  1.0  1.0  1.0
B qux   one    1.0  1.0  1.0  1.0  1.0  1.0
  quux  two    1.0  1.0  1.0  1.0  1.0  1.0
  corge three  1.0  1.0  1.0  1.0  1.0  1.0

For columns is it possible with helper DataFrame and same principe:

new = df1.columns.get_level_values(1).map(d_idx)

df1.columns = df1.columns.to_frame().set_index(new, append=True).droplevel(1).index

print (df1)

         A               B           
       qux quux corge  qux quux corge
       one  two three  one  two three
A foo  1.0  1.0   1.0  1.0  1.0   1.0
  bar  1.0  1.0   1.0  1.0  1.0   1.0
  baz  1.0  1.0   1.0  1.0  1.0   1.0
B foo  1.0  1.0   1.0  1.0  1.0   1.0
  bar  1.0  1.0   1.0  1.0  1.0   1.0
  baz  1.0  1.0   1.0  1.0  1.0   1.0
Answered By: jezrael
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.