Pandas: Multilevel column names
Question:
pandas
has support for multi-level column names:
>>> x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b','c'],'bar':rand(3)})
>>> x = x.set_index(['instance','foo']).transpose()
>>> x.columns
MultiIndex
[(u'first', u'a'), (u'first', u'b'), (u'first', u'c')]
>>> x
instance first
foo a b c
bar 0.102885 0.937838 0.907467
This feature is very useful since it allows multiple versions of the same dataframe to be appended ‘horizontally’ with the 1st level of the column names (in my example instance
) distinguishing the instances.
Imagine I already have a dataframe like this:
a b c
bar 0.102885 0.937838 0.907467
Is there a nice way to add another level to the column names, similar to this for row index:
x['instance'] = 'first'
x.set_level('instance',append=True)
Answers:
Try this:
df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns=[('c','a'),('c','b')]
df.columns=pd.MultiIndex.from_tuples(columns)
You can use concat
. Give it a dictionary of dataframes where the key is the new column level you want to add.
In [46]: d = {}
In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[10, 0.89, 0.98, 0.31],
[20, 0.34, 0.78, 0.34]]).set_index('idx')
In [48]: pd.concat(d, axis=1)
Out[48]:
first_level
a b c
idx
10 0.89 0.98 0.31
20 0.34 0.78 0.34
You can use the same technique to create multiple levels.
In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[10, 0.29, 0.63, 0.99],
[20, 0.23, 0.26, 0.98]]).set_index('idx')
In [50]: pd.concat(d, axis=1)
Out[50]:
first_level second_level
a b c a b c
idx
10 0.89 0.98 0.31 0.29 0.63 0.99
20 0.34 0.78 0.34 0.23 0.26 0.98
Here is a function that can help you create the tuple, that can be used by pd.MultiIndex.from_tuples(), a bit more generically. Got the idea from @user3377361.
def create_tuple_for_for_columns(df_a, multi_level_col):
"""
Create a columns tuple that can be pandas MultiIndex to create multi level column
:param df_a: pandas dataframe containing the columns that must form the first level of the multi index
:param multi_level_col: name of second level column
:return: tuple containing (second_level_col, firs_level_cols)
"""
temp_columns = []
for item in df_a.columns:
temp_columns.append((multi_level_col, item))
return temp_columns
It can be used like this:
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns = create_tuple_for_for_columns(df, 'c')
df.columns = pd.MultiIndex.from_tuples(columns)
No need to create a list of tuples
Use: pd.MultiIndex.from_product(iterables)
import pandas as pd
import numpy as np
df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
df.columns = pd.MultiIndex.from_product([["new_label"], df.columns])
Resultant DataFrame:
new_label
a b c
0 0.25999 0.337535 0.333568
A lot of these solutions seem just a bit more complex than they need to be.
I prefer to make things look as simple and intuitive as possible when speed isn’t absolutely necessary. I think this solution accomplishes that.
Tested in versions of pandas as early as 0.22.0
.
Simply create a DataFrame (ignore columns in the first step) and then set colums equal to your n-dim list of column names.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])
In [3]: df
Out[3]:
0 1 2 3
0 1 1 1 1
1 2 2 2 2
In [4]: df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]
In [5]: df
Out[5]:
a c e g
b d f h
0 1 1 1 1
1 2 2 2 2
x = [('G1','a'),("G1",'b'),("G2",'a'),('G2','b')]
y = [('K1','l'),("K1",'m'),("K2",'l'),('K2','m'),("K3",'l'),('K3','m')]
row_list = pd.MultiIndex.from_tuples(x)
col_list = pd.MultiIndex.from_tuples(y)
A = pd.DataFrame(np.random.randint(2,5,(4,6)), row_list,col_list)
A
This is the most simple and easy way to create MultiLevel columns and rows.
pandas
has support for multi-level column names:
>>> x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b','c'],'bar':rand(3)})
>>> x = x.set_index(['instance','foo']).transpose()
>>> x.columns
MultiIndex
[(u'first', u'a'), (u'first', u'b'), (u'first', u'c')]
>>> x
instance first
foo a b c
bar 0.102885 0.937838 0.907467
This feature is very useful since it allows multiple versions of the same dataframe to be appended ‘horizontally’ with the 1st level of the column names (in my example instance
) distinguishing the instances.
Imagine I already have a dataframe like this:
a b c
bar 0.102885 0.937838 0.907467
Is there a nice way to add another level to the column names, similar to this for row index:
x['instance'] = 'first'
x.set_level('instance',append=True)
Try this:
df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns=[('c','a'),('c','b')]
df.columns=pd.MultiIndex.from_tuples(columns)
You can use concat
. Give it a dictionary of dataframes where the key is the new column level you want to add.
In [46]: d = {}
In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[10, 0.89, 0.98, 0.31],
[20, 0.34, 0.78, 0.34]]).set_index('idx')
In [48]: pd.concat(d, axis=1)
Out[48]:
first_level
a b c
idx
10 0.89 0.98 0.31
20 0.34 0.78 0.34
You can use the same technique to create multiple levels.
In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
data=[[10, 0.29, 0.63, 0.99],
[20, 0.23, 0.26, 0.98]]).set_index('idx')
In [50]: pd.concat(d, axis=1)
Out[50]:
first_level second_level
a b c a b c
idx
10 0.89 0.98 0.31 0.29 0.63 0.99
20 0.34 0.78 0.34 0.23 0.26 0.98
Here is a function that can help you create the tuple, that can be used by pd.MultiIndex.from_tuples(), a bit more generically. Got the idea from @user3377361.
def create_tuple_for_for_columns(df_a, multi_level_col):
"""
Create a columns tuple that can be pandas MultiIndex to create multi level column
:param df_a: pandas dataframe containing the columns that must form the first level of the multi index
:param multi_level_col: name of second level column
:return: tuple containing (second_level_col, firs_level_cols)
"""
temp_columns = []
for item in df_a.columns:
temp_columns.append((multi_level_col, item))
return temp_columns
It can be used like this:
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns = create_tuple_for_for_columns(df, 'c')
df.columns = pd.MultiIndex.from_tuples(columns)
No need to create a list of tuples
Use: pd.MultiIndex.from_product(iterables)
import pandas as pd
import numpy as np
df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
df.columns = pd.MultiIndex.from_product([["new_label"], df.columns])
Resultant DataFrame:
new_label
a b c
0 0.25999 0.337535 0.333568
A lot of these solutions seem just a bit more complex than they need to be.
I prefer to make things look as simple and intuitive as possible when speed isn’t absolutely necessary. I think this solution accomplishes that.
Tested in versions of pandas as early as 0.22.0
.
Simply create a DataFrame (ignore columns in the first step) and then set colums equal to your n-dim list of column names.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])
In [3]: df
Out[3]:
0 1 2 3
0 1 1 1 1
1 2 2 2 2
In [4]: df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]
In [5]: df
Out[5]:
a c e g
b d f h
0 1 1 1 1
1 2 2 2 2
x = [('G1','a'),("G1",'b'),("G2",'a'),('G2','b')]
y = [('K1','l'),("K1",'m'),("K2",'l'),('K2','m'),("K3",'l'),('K3','m')]
row_list = pd.MultiIndex.from_tuples(x)
col_list = pd.MultiIndex.from_tuples(y)
A = pd.DataFrame(np.random.randint(2,5,(4,6)), row_list,col_list)
A
This is the most simple and easy way to create MultiLevel columns and rows.