Reshape pandas DataFrame from wide to long by splitting

Question:

I am trying to reshape the following data from wide to long format

df = pd.DataFrame(
    {
        "size_Ent": {
            pd.Timestamp("2021-01-01 00:00:00"): 600,
            pd.Timestamp("2021-01-02 00:00:00"): 930,
        },
        "size_Baci": {
            pd.Timestamp("2021-01-01 00:00:00"): 700,
            pd.Timestamp("2021-01-02 00:00:00"): 460,
        },
        "min_area_Ent": {
            pd.Timestamp("2021-01-01 00:00:00"): 1240,
            pd.Timestamp("2021-01-02 00:00:00"): 1503,
        },
        "min_area_Baci": {
            pd.Timestamp("2021-01-01 00:00:00"): 1285,
            pd.Timestamp("2021-01-02 00:00:00"): 953,
        },
    }
)
            size_Ent  size_Baci  min_area_Ent  min_area_Baci
2021-01-01       600        700          1240           1285
2021-01-02       930        460          1503            953

The problem is that the column names contain two different pieces of information separated by an underscore:

  1. The property/variable that was measured (e.g. size or min_area). I’d like these to remain as column names (without duplicates).
  2. A label for the item that was measured (e.g., Ent or Baci). I’d like these labels to become the values of a new column called ‘bacterium’.

Additionally, I’d like the row indexes to remain as timestamps.

It should look like this:

           bacterium  min_area  size
2021-01-01      Baci      1285   700
2021-01-01       Ent      1240   600
2021-01-02      Baci       953   460
2021-01-02       Ent      1503   930

I tried transposing the data frame with df.T but this did not give the result I want.

Asked By: Data_interested

||

Answers:

This can be solved in three simple steps:

First, notice that your column names are actually encoding a 2×2 MultiIndex, so let’s start by creating a MultiIndex from tuples. To do this, we need to first transform the existing column names into tuples. This is easy because we know they should be split at the last underscore.

# Convert column names into MultiIndex, giving an informative name to the level with label data
column_tuples = df.columns.str.rsplit("_", n=1)
column_tuples = [tuple(c) for c in column_tuples]
df.columns = pd.MultiIndex.from_tuples(column_tuples,names=[None,'bacterium'])

Next, use df.stack() to take the ‘bacterium’ level from the column MultiIndex and move it into a row MultiIndex. This is not quite the same as the transpose operation that you tried.

df = df.stack('bacterium')

Finally, use df.reset_index() with the level argument to take the bacterium level from the row MultiIndex and make it a proper column.

df = df.reset_index('bacterium')

Result:

           bacterium  min_area  size
2021-01-01      Baci      1285   700
2021-01-01       Ent      1240   600
2021-01-02      Baci       953   460
2021-01-02       Ent      1503   930
Answered By: evces

One option is with pd.wide_to_long:

(pd
.wide_to_long(
    df.reset_index(), 
    stubnames=['min_area', 'size'], 
    i = 'index', 
    j = 'bacterium', 
    sep='_', 
    suffix='.+')
.reset_index('bacterium')
)
           bacterium  min_area  size
index                               
2021-01-01       Ent      1240   600
2021-01-02       Ent      1503   930
2021-01-01      Baci      1285   700
2021-01-02      Baci       953   460

Another option with pivot_longer from pyjanitor – for this case we use a regex in names_pattern:

# pip install pyjanitor
import janitor
(df
.pivot_longer(
    index=None, 
    names_to = ('.value', 'bacterium'), 
    names_pattern=r'(size|min_area)_(.+)', 
    ignore_index=False)
)
           bacterium  size  min_area
2021-01-01       Ent   600      1240
2021-01-02       Ent   930      1503
2021-01-01      Baci   700      1285
2021-01-02      Baci   460       953

Another option is with stack:

temp = df.copy()
temp.columns = temp.columns.str.rsplit("_", n = 1, expand=True)
temp.columns.names = [None, 'bacterium']
temp.stack('bacterium').reset_index('bacterium')

           bacterium  min_area  size
2021-01-01      Baci      1285   700
2021-01-01       Ent      1240   600
2021-01-02      Baci       953   460
2021-01-02       Ent      1503   930
Answered By: sammywemmy
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.