How to add empty sub-columns to a multi-index dataframe?

Question:

I’m working with a multiindex – multicolumns DataFrame in Pandas. I have 16 main columns (level 0), such as Climate Change, Ozone Depletion,etc; and each of them contains 2 subcolumns (level 1): Agriculture and Transformation.

enter image description here

I want to create a new subcolumn (Transport) to each main column, that I will fill later with new data. The thing is that I don’t want to make a loop (not sure how I would do that) but I can’t find any direct way to perform it.

Already tried the loop option:

for i in list(df.columns.levels[0]):
    df[i, 'Transport'] = ''

However, this duplicates the main columns and create them with the new empty field.
Is there anyway to do it without a loop? If there isn’t, how do I group these new created columns?

Already check this question but didn’t for me:
Add a subcolumn to each column on multilevel column index

Thanks in advance.

EDIT

import pandas as pd
import numpy as np

df = pd.DataFrame([[1,2,3,4], [5,6,7,8],[9,10,11,12]], columns = pd.MultiIndex.from_product([('Acidification', 'Climate Change'), ('Agriculture', 'Transformation')]), index=['potato', 'tomato', 'onion'])

With @Laurent answer I got the following dataframe:

enter image description here

In order to get what I wanted, I needed to change one line:

dfs = []
for col in df.columns.levels[0]:
    tmp = df.copy()[col]
    tmp.loc[:, ("Transport")] = "" #This is the modified line
    dfs.append(tmp)
new_df = pd.concat(dfs, axis=1, keys=df.columns.levels[0])

This output:

enter image description here

However, I want a would like to have a more general method (maybe this one is, but I don’t know how to do it).

Question 1:
If now I would like to add new two subcolumns (level 2) to "Transport" subcolumn (level 1) to all main columns (level 0), how can I do it?

I have tried the following code, but raise an error of unhashable type: 'slice'

# Try to apply logic above
dfn = []
for col in new_df.columns.levels[0]:
    for scol in new_df.columns.levels[1]:
        tmp = new_df.copy()[col][scol]
        tmp.loc[:, ('National')] = ""
        tmp.loc[:, ('Import')] = ""
        dfn.append(tmp)
new_dfn = pd.concat(dfn, axis=1, keys=new_df.columns.levels[0])

Question 2:
Is there a general way to add new subsubcolumns to a specific subcolumn (in all columns)?

Asked By: M. Merida-Floriano

||

Answers:

With the following toy dataframe as an example:

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    columns=pd.MultiIndex.from_product([("col1", "col2"), ("sub_col1", "sub_col2")]),
    index=[0, 1, 2],
)

print(df)
# Output
      col1              col2
  sub_col1 sub_col2 sub_col1 sub_col2
0        1        2        3        4
1        5        6        7        8
2        9       10       11       12

Here is one way to do it using Pandas advanced indexing with hierarchical index and concat:

dfs = []
for col in df.columns.levels[0]:
    tmp = df.copy()[col]
    tmp.loc[:, ("sub_col3")] = ""
    dfs.append(tmp)

new_df = pd.concat(dfs, axis=1, keys=df.columns.levels[0])

Then:

print(new_df)
# Output
      col1                            col2
  sub_col1 sub_col2 col1 sub_col3 sub_col1 sub_col2 col2 sub_col3
0        1        2                      3        4
1        5        6                      7        8
2        9       10                     11       12

And here is how to add two new subcolumns (level 2) to a [target] subcolumn (level 1) to all main columns (level 0):

dfs = []
sub_col = "sub_col3"
sub_sub_cols = ["sub_sub_col1", "sub_sub_col2"]
for col in df.columns.levels[0]:
    tmp = df.copy()[col]
    tmp = pd.concat([tmp, pd.DataFrame([[""]*tmp.shape[1] for _ in range(tmp.shape[0])], columns=pd.MultiIndex.from_product([[sub_col], sub_sub_cols]))], axis=1)
    dfs.append(tmp)

new_df = pd.concat(dfs, axis=1, keys=df.columns.levels[0])

Then:

print(new_df)
# Output
    col1                                                             
  sub_col1 sub_col2 (sub_col3, sub_sub_col1) (sub_col3, sub_sub_col2)   
0        1        2                                                     
1        5        6                                                     
2        9       10                                                     

      col2                                                             
  sub_col1 sub_col2 (sub_col3, sub_sub_col1) (sub_col3, sub_sub_col2)  
0        3        4                                                    
1        7        8                                                    
2       11       12                                                    

Since all columns of a given level do not have the same number of subcolumns, columns header are represented by tuples.

Answered By: Laurent