Transforming pandas data frame. Sort of melting

Question

I have this data frame:

import pandas as pd

df = pd.DataFrame({'day': [1, 1, 2, 2], 'category': ['a', 'b', 'a', 'b'],
                   'min_feature1': [1, 2, 3, 4], 'max_feature1': [8, 9, 10, 11],
                   'min_feature2': [2, 3, 4, 5], 'max_feature2': [6, 9, 12, 13]})

The result looks like this:

day	category	min_feature1	max_feature1	min_feature2	max_feature2
1	a	1	8	2	6
1	b	2	9	3	9
2	a	3	10	4	12
2	b	4	11	5	13

I want to transform this data so that it looks like this:

pd.DataFrame([[1, 'a', 'feature1', 1, 8],
              [1, 'a', 'feature2', 2, 6],
              [1, 'b', 'feature1', 2, 9],
              [1, 'b', 'feature2', 3, 9],
              [2, 'a', 'feature1', 3, 10],
              [2, 'a', 'feature2', 4, 12],
              [2, 'b', 'feature1', 4, 11],
              [2, 'b', 'feature2', 5, 13]],
             columns=['day', 'category', 'feature', 'min', 'max'])

day	category	feature	min	max
1	a	feature1	1	8
1	a	feature2	2	6
1	b	feature1	2	9
1	b	feature2	3	9
2	a	feature1	3	10
2	a	feature2	4	12
2	b	feature1	4	11
2	b	feature2	5	13

How can I do this?

Asked By: Andrey Lukyanenko

Answer 1

One option is a custom reshape: split the column names into a MultiIndex with str.split, then stack:

(df.set_index(['day', 'category'])
   .pipe(lambda d: d.set_axis(d.columns.str.split('_', n=1, expand=True), axis=1))
   .rename_axis(columns=(None, 'features'))
   .stack().reset_index()
)

Or with pyjanitor's pivot_longer:

# pip install pyjanitor
import janitor

out = df.pivot_longer(['day', 'category'], sort_by_appearance=True,
                      names_sep='_', names_to=('.value', 'feature'))

Output:

   day category  features  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5
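
In the output above the statistic columns come out as max, min, while the question asks for min, max. The following is a minimal, self-contained sketch of the stack approach that selects the columns explicitly at the end; it assumes the sample frame from the question and is only one way to fix the order.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'day': [1, 1, 2, 2], 'category': ['a', 'b', 'a', 'b'],
                   'min_feature1': [1, 2, 3, 4], 'max_feature1': [8, 9, 10, 11],
                   'min_feature2': [2, 3, 4, 5], 'max_feature2': [6, 9, 12, 13]})

wide = df.set_index(['day', 'category'])
# Split 'min_feature1' into ('min', 'feature1') to get two column levels.
wide.columns = wide.columns.str.split('_', n=1, expand=True)
wide = wide.rename_axis(columns=(None, 'feature'))

out = (wide.stack()            # move the 'feature' level into the rows
           .reset_index()
           .sort_values(['day', 'category', 'feature'], ignore_index=True)
           # Pick the column order explicitly instead of relying on stack.
           [['day', 'category', 'feature', 'min', 'max']])
print(out)
```

Selecting the columns by name keeps the result stable even if the order produced by stack differs between pandas versions.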

Answered By: mozway

Answer 2

Use str.split to build a MultiIndex from the column names, then reshape with DataFrame.stack:

df1 = df.set_index(['day','category'])
df1.columns= df1.columns.str.split('_', expand=True)
df1 = df1.rename_axis(columns=(None,'feature')).stack().reset_index()
print (df1)
   day category   feature  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5
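
Splitting on every underscore works here because each column name contains exactly one. As a small sketch of why limiting the split with n=1 can matter, the example below uses hypothetical column names (not from the question) where the feature part itself contains an underscore.

```python
import pandas as pd

# Hypothetical column names whose feature part contains an underscore.
cols = pd.Index(['min_feature_1', 'max_feature_1'])

split_all = cols.str.split('_', expand=True)        # split on every '_'
split_once = cols.str.split('_', n=1, expand=True)  # split on the first '_' only

print(split_all.nlevels)   # 3 levels: ('min', 'feature', '1')
print(split_once.nlevels)  # 2 levels: ('min', 'feature_1')
print(list(split_once))
```

With two levels, the rename_axis and stack steps behave the same as for the question's data.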

Another idea with wide_to_long:

df.columns = df.columns.str.replace(r'(\w+)_\s*(\w+)', r'\2_\1', regex=True)
df = (pd.wide_to_long(df,
                     stubnames=['feature1','feature2'],
                     i=['day','category'],
                     j='tmp',
                     sep='_',
                     suffix=r'\w+').rename_axis(columns='feature')
       .stack()
       .unstack(2)
       .reset_index()
       .rename_axis(columns=None))
print (df)
   day category   feature  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5
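
A possible shortcut, not part of the answer above: because the column names already start with min_ and max_, wide_to_long can be given those prefixes as stubnames directly, which avoids renaming the columns and the stack/unstack round trip. This is a sketch that assumes the original, unmodified frame from the question.

```python
import pandas as pd

# Sample data from the question (original column names).
df = pd.DataFrame({'day': [1, 1, 2, 2], 'category': ['a', 'b', 'a', 'b'],
                   'min_feature1': [1, 2, 3, 4], 'max_feature1': [8, 9, 10, 11],
                   'min_feature2': [2, 3, 4, 5], 'max_feature2': [6, 9, 12, 13]})

out = (pd.wide_to_long(df,
                       stubnames=['min', 'max'],   # prefixes become value columns
                       i=['day', 'category'],      # identifier columns
                       j='feature',                # name for the suffix column
                       sep='_',
                       suffix=r'\w+')              # suffixes are not purely numeric
         .sort_index()                             # order rows by day, category, feature
         .reset_index())
print(out)
```

The suffix pattern is needed because the default only matches digits, while the suffixes here are feature1 and feature2.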

Answered By: jezrael

Answer 3

You can also use melt as an alternative:

out = (df.rename(columns=lambda x: tuple(m) if len(m := x.split('_')) > 1 else x)
         .melt(['day', 'category'])
         .assign(var1=lambda x: x['variable'].str[1], var2=lambda x: x['variable'].str[0])
         .pivot(index=['day', 'category', 'var1'], columns='var2', values='value')
         .rename_axis(columns=None).reset_index())

Output:

>>> out
   day category      var1  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5

Step by step, to make the transformation easier to follow:

# Step 1: rename your columns
>>> out = df.rename(columns=lambda x: tuple(m) if len(m := x.split('_')) > 1 else x)
   day category  (min, feature1)  (max, feature1)  (min, feature2)  (max, feature2)
0    1        a                1                8                2                6
1    1        b                2                9                3                9
2    2        a                3               10                4               12
3    2        b                4               11                5               13

# Step 2: flatten your dataframe
>>> out = out.melt(['day', 'category'])
    day category         variable  value
0     1        a  (min, feature1)      1
1     1        b  (min, feature1)      2
2     2        a  (min, feature1)      3
3     2        b  (min, feature1)      4
4     1        a  (max, feature1)      8
5     1        b  (max, feature1)      9
...

# Step 3: expand variable column in two new variables
>>> out = out.assign(var1=lambda x: x['variable'].str[1], var2=lambda x: x['variable'].str[0])
    day category         variable  value      var1 var2
0     1        a  (min, feature1)      1  feature1  min
1     1        b  (min, feature1)      2  feature1  min
2     2        a  (min, feature1)      3  feature1  min
3     2        b  (min, feature1)      4  feature1  min
4     1        a  (max, feature1)      8  feature1  max
5     1        b  (max, feature1)      9  feature1  max
...

# Step 4: reshape your dataframe
>>> out = out.pivot(index=['day', 'category', 'var1'], columns='var2', values='value')
var2                   max  min
day category var1              
1   a        feature1    8    1
             feature2    6    2
    b        feature1    9    2
             feature2    9    3
2   a        feature1   10    3
             feature2   12    4
    b        feature1   11    4
             feature2   13    5

# Step 5: final output
>>> out = out.rename_axis(columns=None).reset_index()
   day category      var1  max  min
0    1        a  feature1    8    1
1    1        a  feature2    6    2
2    1        b  feature1    9    2
3    1        b  feature2    9    3
4    2        a  feature1   10    3
5    2        a  feature2   12    4
6    2        b  feature1   11    4
7    2        b  feature2   13    5
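
The same melt-then-pivot idea can also be sketched without renaming columns to tuples, by splitting the melted variable column with str.split. The column names stat and feature below are chosen for illustration, and the final selection puts the columns in the order the question asks for.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'day': [1, 1, 2, 2], 'category': ['a', 'b', 'a', 'b'],
                   'min_feature1': [1, 2, 3, 4], 'max_feature1': [8, 9, 10, 11],
                   'min_feature2': [2, 3, 4, 5], 'max_feature2': [6, 9, 12, 13]})

# Long format: one row per (day, category, original column name).
long = df.melt(['day', 'category'])
# 'min_feature1' -> stat='min', feature='feature1'
long[['stat', 'feature']] = long['variable'].str.split('_', n=1, expand=True)

out = (long.pivot(index=['day', 'category', 'feature'],
                  columns='stat', values='value')
           .rename_axis(columns=None)
           .reset_index()
           [['day', 'category', 'feature', 'min', 'max']])
print(out)
```

Passing a list as the pivot index requires a reasonably recent pandas version, as in the answer above.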

Answered By: Corralien
