How can I reorder multi-indexed dataframe columns at a specific level

Question:

I have a multi-indexed DataFrame with names attached to the column levels. I’d like to be able to easily shuffle the columns around so that they match the order specified by the user. Since this is down the pipeline, I’m not able to use this recommended solution and order them properly at creation time.

I have a data table that looks (something) like

Experiment           BASE           IWWGCW         IWWGDW
Lead Time                24     48      24     48      24     48
2010-11-27 12:00:00   0.997  0.991   0.998  0.990   0.998  0.990
2010-11-28 12:00:00   0.998  0.987   0.997  0.990   0.997  0.990
2010-11-29 12:00:00   0.997  0.992   0.997  0.992   0.997  0.992
2010-11-30 12:00:00   0.997  0.987   0.997  0.987   0.997  0.987
2010-12-01 12:00:00   0.996  0.986   0.996  0.986   0.996  0.986

I want to take in a list like ['IWWGCW', 'IWWGDW', 'BASE'] and reorder this to be:

Experiment           IWWGCW         IWWGDW         BASE           
Lead Time                24     48      24     48      24     48  
2010-11-27 12:00:00   0.998  0.990   0.998  0.990   0.997  0.991  
2010-11-28 12:00:00   0.997  0.990   0.997  0.990   0.998  0.987  
2010-11-29 12:00:00   0.997  0.992   0.997  0.992   0.997  0.992  
2010-11-30 12:00:00   0.997  0.987   0.997  0.987   0.997  0.987  
2010-12-01 12:00:00   0.996  0.986   0.996  0.986   0.996  0.986  

with the caveat that I don’t always know at what level “Experiment” will be. I tried (where df is the multi-indexed frame shown above)

df2 = df.reindex_axis(['IWWGCW', 'IWWGDW', 'BASE'], axis=1, level='Experiment')

but that didn’t seem to work – it completed successfully, but the DataFrame that was returned had its column order unchanged.

My workaround is to have a function like:

def reorder_columns(frame, column_name, new_order):
    """Shuffle the specified columns of the frame to match new_order."""

    index_level  = frame.columns.names.index(column_name)
    new_position = lambda t: new_order.index(t[index_level])
    new_index    = sorted(frame.columns, key=new_position)
    new_frame    = frame.reindex_axis(new_index, axis=1)
    return new_frame

where reorder_columns(df, 'Experiment', ['IWWGCW', 'IWWGDW', 'BASE']) does what I expect but it feels like I’m doing extra work. Is there an easier way to do this?

Asked By: Tim Whitcomb

||

Answers:

I don’t know of anything off-hand. Created an enhancement ticket about it:

http://github.com/pydata/pandas/issues/1864

Answered By: Wes McKinney

There is a very simple way: just create a new dataframe based on the original, with the correct order of multiindex columns:

multi_tuples = [('IWWGCW',24), ('IWWGCW',48), ('IWWGDW',24), ('IWWGDW',48)
    , ('BASE',24), ('BASE',48)]

multi_cols = pd.MultiIndex.from_tuples(multi_tuples, names=['Experiment', 'Lead Time'])

df_ordered_multi_cols = pd.DataFrame(df_ori, columns=multi_cols)
Answered By: ragesz

The comment by andrew_reece should be the accepted answer. Simply use reindex().

Copy & pasting from the github issue:

>>> df
                     vals
first second third       
mid   3rd    992     1.96
             562    12.06
      1st    73     -6.46
             818   -15.75
             658     5.90
btm   2nd    915     9.75
             474    -1.47
             905    -6.03
      1st    717     8.01
             909   -21.12
      3rd    616    11.91
             675     1.06
             579    -4.01
top   1st    241     1.79
             363     1.71
      3rd    677    13.38
             238   -16.77
             407    17.19
      2nd    728   -21.55
             36      8.09
>>> df.reindex(['top', 'mid', 'btm'], level='first')
                     vals
first second third       
top   1st    241     1.79
             363     1.71
      3rd    677    13.38
             238   -16.77
             407    17.19
      2nd    728   -21.55
             36      8.09
mid   3rd    992     1.96
             562    12.06
      1st    73     -6.46
             818   -15.75
             658     5.90
btm   2nd    915     9.75
             474    -1.47
             905    -6.03
      1st    717     8.01
             909   -21.12
      3rd    616    11.91
             675     1.06
             579    -4.01
>>> df.reindex(['1st', '2nd', '3rd'], level='second')
                     vals
first second third       
mid   1st    73     -6.46
             818   -15.75
             658     5.90
      3rd    992     1.96
             562    12.06
btm   1st    717     8.01
             909   -21.12
      2nd    915     9.75
             474    -1.47
             905    -6.03
      3rd    616    11.91
             675     1.06
             579    -4.01
top   1st    241     1.79
             363     1.71
      2nd    728   -21.55
             36      8.09
      3rd    677    13.38
             238   -16.77
             407    17.19
>>> df.reindex(['top', 'btm'], level='first').reindex(['1st', '2nd'], level='second')
                     vals
first second third       
top   1st    241     1.79
             363     1.71
      2nd    728   -21.55
             36      8.09
btm   1st    717     8.01
             909   -21.12
      2nd    915     9.75
             474    -1.47
             905    -6.03
Answered By: n1000

This is the simplest one that worked for me:

  1. for your selected level, create a list with columns in desired order;

  2. reindex your columns and create a MultiIndex object from that list, keep in mind this returns a tuple;

  3. use the MultiIndex object to reorder your DataFrame.

cols = ['IWWGCW', 'IWWGDW', 'BASE']
new_cols = df.columns.reindex(cols, level=0)
df.reindex(columns=new_cols[0]) #new_cols is a single item tuple

In one line:

df.reindex(columns=df.columns.reindex(['IWWGCW', 'IWWGDW', 'BASE'], level=0)[0])

voilá

A solution from my comment above, using pandas 1.3.2:

df.reindex(columns=['IWWGCW', 'IWWGDW', 'BASE'], level='Experiment')
Answered By: Irv

I’ve refined the answers here and written a function that should work out of the box on a pandas dataframe with a two layer multi-index. This should extendable to higher order column multi-indexes by changing the "by" argument in the 3rd line of the function.

def reorder_multindex_columns(df):
    level_names = list(df.columns.names)
    multi_tuples_df = pd.DataFrame.from_records(df.columns.values)
    multi_tuples_df = multi_tuples_df.sort_values(by = [0,1])
    multi_tuples = list(multi_tuples_df.to_records(index=False))
    multi_cols = pd.MultiIndex.from_tuples(multi_tuples, names = level_names)

    return pd.DataFrame(df, columns=multi_cols)
Answered By: Joseph Bloom
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.