Pandas: Multilevel column names


pandas has support for multi-level column names:

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b','c'],'bar':np.random.rand(3)})
>>> x = x.set_index(['instance','foo']).transpose()
>>> x.columns
[(u'first', u'a'), (u'first', u'b'), (u'first', u'c')]
>>> x
instance     first                    
foo              a         b         c
bar       0.102885  0.937838  0.907467

This feature is very useful since it allows multiple versions of the same dataframe to be appended 'horizontally', with the first level of the column names (instance in my example) distinguishing the instances.

Imagine I already have a dataframe like this:

                 a         b         c
bar       0.102885  0.937838  0.907467

Is there a nice way to add another level to the column names, similar to this for the row index:

x['instance'] = 'first'
Asked By: LondonRob



Try this:
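
A minimal sketch of one way to do this, assuming the tuple-based approach that a later answer credits to this one: pair each existing column name with a new outer label and pass the list to pd.MultiIndex.from_tuples. The frame and the label 'first' are illustrative, not taken from this answer.

```python
import pandas as pd

# Illustrative frame with flat column names, like the one in the question.
df = pd.DataFrame({'a': [0.1], 'b': [0.9], 'c': [0.5]}, index=['bar'])

# Pair every existing column name with the new outer label.
# 'first' is a hypothetical label chosen for this example.
df.columns = pd.MultiIndex.from_tuples([('first', col) for col in df.columns])
```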



Answered By: user3377361

You can use concat. Give it a dictionary of dataframes where the key is the new column level you want to add.

In [46]: d = {}

In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[10, 0.89, 0.98, 0.31],
                                               [20, 0.34, 0.78, 0.34]]).set_index('idx')

In [48]: pd.concat(d, axis=1)
    first_level
              a     b     c
10         0.89  0.98  0.31
20         0.34  0.78  0.34

The same technique places several dataframes side by side, each under its own top-level label.

In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[10, 0.29, 0.63, 0.99],
                                                [20, 0.23, 0.26, 0.98]]).set_index('idx')

In [50]: pd.concat(d, axis=1)
    first_level             second_level
              a     b     c            a     b     c
10         0.89  0.98  0.31         0.29  0.63  0.99
20         0.34  0.78  0.34         0.23  0.26  0.98
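
For a single existing dataframe, the same idea can probably be written without a dictionary by passing keys to pd.concat. This is a sketch, assuming the level names 'instance' and 'foo' from the question and a hypothetical label 'first':

```python
import pandas as pd

df = pd.DataFrame({'a': [0.89, 0.34], 'b': [0.98, 0.78], 'c': [0.31, 0.34]},
                  index=pd.Index([10, 20], name='idx'))

# keys supplies the new outer label; names labels the resulting column levels.
# 'first', 'instance' and 'foo' are illustrative choices, not fixed by pandas.
wide = pd.concat([df], keys=['first'], names=['instance', 'foo'], axis=1)
```

Because keys takes a list, more frames can be added later by extending both lists in step.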
Answered By: Carl

Here is a function that builds the list of tuples expected by pd.MultiIndex.from_tuples() a bit more generically. Got the idea from @user3377361.

def create_tuple_for_for_columns(df_a, multi_level_col):
    """
    Create a list of column tuples that pd.MultiIndex.from_tuples() can turn into multi-level columns.

    :param df_a: pandas DataFrame whose existing columns become the second (inner) level
    :param multi_level_col: label for the new first (outer) level
    :return: list of tuples of the form (multi_level_col, original_column_name)
    """
    temp_columns = []
    for item in df_a.columns:
        temp_columns.append((multi_level_col, item))
    return temp_columns

It can be used like this:

import pandas as pd

df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns = create_tuple_for_for_columns(df, 'c')
df.columns = pd.MultiIndex.from_tuples(columns)
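
Once the columns are tuples, they can be addressed by outer label or by full tuple. A small sketch, writing out the tuples for the same two-column frame by hand so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df.columns = pd.MultiIndex.from_tuples([('c', 'a'), ('c', 'b')])

# Selecting the outer label drops that level and returns the inner columns.
inner = df['c']

# A full tuple addresses a single column.
col_b = df[('c', 'b')]
```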
Answered By: Charl

There is no need to create a list of tuples.

Use pd.MultiIndex.from_product(iterables):

import pandas as pd
import numpy as np

df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
df.columns = pd.MultiIndex.from_product([["new_label"], df.columns])

Resultant DataFrame:

  new_label
          a         b         c
0   0.25999  0.337535  0.333568
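
The levels can also be named when the index is built. A minimal sketch, assuming the level names 'instance' and 'foo' from the question and a hypothetical label 'first':

```python
import numpy as np
import pandas as pd

df = pd.Series(np.random.rand(3), index=["a", "b", "c"]).to_frame().T

# names labels the two column levels; 'instance', 'foo' and 'first'
# are illustrative values borrowed from the question.
df.columns = pd.MultiIndex.from_product(
    [["first"], df.columns], names=["instance", "foo"]
)
```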

Pull request from Jan 25, 2014

Answered By: Ian Zurutuza

A lot of these solutions seem just a bit more complex than they need to be.

I prefer to make things look as simple and intuitive as possible when speed isn’t absolutely necessary. I think this solution accomplishes that.
Tested in versions of pandas as early as 0.22.0.

Simply create a DataFrame (ignoring the column names at first) and then set columns equal to a nested list of column names, one inner list per level.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])

In [3]: df
   0  1  2  3
0  1  1  1  1
1  2  2  2  2

In [4]: df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]

In [5]: df
   a  c  e  g
   b  d  f  h
0  1  1  1  1
1  2  2  2  2
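
The nested list is read as one array per level, paired position by position, which is what pd.MultiIndex.from_arrays does explicitly. A sketch of the explicit form follows; the level names 'upper' and 'lower' are hypothetical and only there to show the names argument:

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])

# One array per level; entries at the same position form one column label.
# 'upper' and 'lower' are made-up level names for this example.
df.columns = pd.MultiIndex.from_arrays(
    [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']], names=['upper', 'lower']
)
```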
Answered By: Keith
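
import numpy as np
import pandas as pd

# Tuples of (outer label, inner label) for the rows and for the columns
x = [('G1','a'),("G1",'b'),("G2",'a'),('G2','b')]
y = [('K1','l'),("K1",'m'),("K2",'l'),('K2','m'),("K3",'l'),('K3','m')]
row_list = pd.MultiIndex.from_tuples(x)
col_list = pd.MultiIndex.from_tuples(y)

# 4 x 6 frame of random integers in [2, 5) with multi-level rows and columns
A = pd.DataFrame(np.random.randint(2,5,(4,6)), index=row_list, columns=col_list)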

This is a simple way to create multi-level columns and multi-level rows in one step.
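
A frame built this way can be sliced by outer label on either axis. A short sketch that rebuilds the same layout and selects from it (the variable names rows, cols, g1_rows, k2_cols and all_l are illustrative):

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_tuples([('G1', 'a'), ('G1', 'b'), ('G2', 'a'), ('G2', 'b')])
cols = pd.MultiIndex.from_tuples([('K1', 'l'), ('K1', 'm'), ('K2', 'l'),
                                  ('K2', 'm'), ('K3', 'l'), ('K3', 'm')])
A = pd.DataFrame(np.random.randint(2, 5, (4, 6)), index=rows, columns=cols)

# Rows under the outer row label 'G1' (the outer level is dropped).
g1_rows = A.loc['G1']

# Columns under the outer column label 'K2'.
k2_cols = A['K2']

# Every column whose inner label is 'l', across K1, K2 and K3.
all_l = A.xs('l', axis=1, level=1)
```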


Answered By: Raj_Ame09