How to expand the number of dataframe rows based on column titles?

Question

I’ve got a dataframe of the form:

  name  0  1  2
0    A  4  2  1
1    B  2  3  4
2    C  1  3  2

This is the result of grouping and summarizing data earlier in my real-world data process.
What I would like to do now is to expand (explode?) each row so that, for every column except name, the column title is repeated in as many rows as the value in that cell, with the shorter columns padded with NaN. The dataframe should end up like this:

name 0    1    2
------------------
A    0    1    2
A    0    1    nan
A    0    nan  nan
A    0    nan  nan
B    0    1    2
B    0    1    2
B    nan  1    2
B    nan  nan  2
C    0    1    2
C    nan  1    2
C    nan  1    nan

I’ve tried a bunch of things with df.iterrows(), assigning new columns to an empty dataframe whose length is the maximum of each row and filling up with NaNs, but it ended up buggy and really messy. It would be great if any of you pandas experts could take a look. Thank you for any suggestions!
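For comparison, the row-by-row idea described above can be written without manual NaN padding by letting pandas align Series of different lengths. This is a minimal sketch, not the accepted approach. It re-creates the `df` from the reproducible example so it runs on its own, and `expand_rows` is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C'], 0: [4, 2, 1], 1: [2, 3, 3], 2: [1, 4, 2]})


def expand_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build one block of rows per original row."""
    value_cols = [c for c in frame.columns if c != 'name']
    blocks = []
    for _, row in frame.iterrows():
        # Each column label repeated `count` times; pd.DataFrame aligns the
        # Series on their integer index and pads the shorter ones with NaN.
        block = pd.DataFrame({c: pd.Series([c] * int(row[c]), dtype=float)
                              for c in value_cols})
        block.insert(0, 'name', row['name'])
        blocks.append(block)
    return pd.concat(blocks, ignore_index=True)


result = expand_rows(df)
print(result)
```

This stays readable but loops in Python, so for large frames a vectorized approach is likely to be faster.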

Reproducible dataframe

import pandas as pd
df = pd.DataFrame({'name': ['A','B','C'], 0:[4,2,1], 1:[2,3,3], 2:[1,4,2]})

Asked By: vestland

Answer 1

You can repeat each row as many times as the largest count in that row, then use NumPy broadcasting to mask the values that are not needed:

import numpy as np
import pandas as pd

# get list of columns other than name
cols = list(df.columns.difference(['name'], sort=False))

# repeat each row as many times as its largest count
out = df.loc[df.index.repeat(df[cols].max(axis=1))].copy()

# keep the column label where the position within the group (cumcount)
# is smaller than that column's count; NaN everywhere else
out[cols] = np.where(out[cols].gt(out.groupby('name').cumcount(), axis=0),
                     np.repeat([cols], len(out), axis=0), np.nan
                    )

Output:

  name    0    1    2
0    A  0.0  1.0  2.0
0    A  0.0  1.0  NaN
0    A  0.0  NaN  NaN
0    A  0.0  NaN  NaN
1    B  0.0  1.0  2.0
1    B  0.0  1.0  2.0
1    B  NaN  1.0  2.0
1    B  NaN  NaN  2.0
2    C  0.0  1.0  2.0
2    C  NaN  1.0  2.0
2    C  NaN  1.0  NaN
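The values above are floats (0.0, 1.0, ...) because NaN forces a float64 column. If integer labels are preferred, one option is pandas' nullable `Int64` extension dtype, which stores missing entries as `<NA>`. A minimal sketch that repeats the answer's steps on the question's `df` and then converts; the `reset_index` call is an optional extra, not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C'], 0: [4, 2, 1], 1: [2, 3, 3], 2: [1, 4, 2]})
cols = list(df.columns.difference(['name'], sort=False))

# Same steps as the answer above
out = df.loc[df.index.repeat(df[cols].max(axis=1))].copy()
out[cols] = np.where(out[cols].gt(out.groupby('name').cumcount(), axis=0),
                     np.repeat([cols], len(out), axis=0), np.nan)

# Optional: replace the repeated 0, 0, 0, ... index with a fresh RangeIndex
out = out.reset_index(drop=True)
# NaN forced float64; the nullable Int64 dtype keeps integers plus <NA>
out = out.astype({c: 'Int64' for c in cols})
print(out)
```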

Intermediates:

df.loc[df.index.repeat(df[cols].max(axis=1))]
  name  0  1  2
0    A  4  2  1
0    A  4  2  1
0    A  4  2  1
0    A  4  2  1
1    B  2  3  4
1    B  2  3  4
1    B  2  3  4
1    B  2  3  4
2    C  1  3  2
2    C  1  3  2
2    C  1  3  2

out[cols].gt(out.groupby('name').cumcount(), axis=0)
       0      1      2
0   True   True   True
0   True   True  False
0   True  False  False
0   True  False  False
1   True   True   True
1   True   True   True
1  False   True   True
1  False  False   True
2   True   True   True
2  False   True   True
2  False   True  False
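The `groupby('name').cumcount()` step assumes each name appears only once. If two rows shared a name, their counters would run on across both blocks. One way around that is to do the same repeat-then-mask idea directly on NumPy arrays and compute each row's position within its block arithmetically. This is an alternative sketch, not the answer's code. It assumes the value columns hold non-negative integer counts, and the names `counts`, `reps`, `row_idx` and `within` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C'], 0: [4, 2, 1], 1: [2, 3, 3], 2: [1, 4, 2]})
cols = [c for c in df.columns if c != 'name']

counts = df[cols].to_numpy()                    # shape (n_rows, n_cols)
reps = counts.max(axis=1)                       # output rows per original row
row_idx = np.repeat(np.arange(len(df)), reps)   # source row of each output row

# Position of each output row inside its block: 0, 1, ..., reps[i] - 1
starts = np.repeat(np.cumsum(reps) - reps, reps)
within = np.arange(reps.sum()) - starts

# Broadcast (n_out, 1) against (n_out, n_cols): keep a label while within < count
mask = counts[row_idx] > within[:, None]
labels = np.array(cols, dtype=float)
values = np.where(mask, labels, np.nan)

out = pd.DataFrame(values, columns=cols)
out.insert(0, 'name', df['name'].to_numpy()[row_idx])
print(out)
```

Because no grouping by name is involved, duplicate names would not interfere. The trade-off is a fresh RangeIndex instead of the repeated original index.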

Answered By: mozway
