groupby and stack column into a single column

Question:

I have a dataframe as follows:

df = pd.DataFrame({'Product' : ['A'], 
              'Size' : ["['XL','L','S','M']"],
               'Color' : ["['Blue','Red','Green']"]})

print(df)

Product                Size                   Color
0       A  ['XL','L','S','M']  ['Blue','Red','Green']

I need to transform the frame for an ingestion system which only accepts the following format:

target_df = pd.DataFrame({'Description' : ['Product','Color','Color','Color','Size','Size','Size','Size'],
             'Agg' : ['A','Blue','Green','Red','XL','L','S','M']})


  Description    Agg
0     Product      A
1       Color   Blue
2       Color  Green
3       Color    Red
4        Size     XL
5        Size      L
6        Size      S
7        Size      M

I’ve attempted all forms of explode, groupby and even itterrows, but I can’t get it to line up. I have thousands of Products. with a few groupby and explodes I can stack the column but then I have duplicate Product Names which I need to avoid, the order is important too.

Asked By: Umar.H

||

Answers:

Try:

df['Size']=df['Size'].map(eval)
df['Color']=df['Color'].map(eval)
df=df.stack().explode()

Outputs:

0  Product        A
   Size          XL
   Size           L
   Size           S
   Size           M
   Color       Blue
   Color        Red
   Color      Green
dtype: object
Answered By: Grzegorz Skibinski

Here’s a solution without eval:

(df.T[0].str.strip('[]')
   .str.split(',', expand=True)
   .stack().str.strip("''")
   .reset_index(level=1, drop=True)
   .rename_axis(index='Description')
   .reset_index(name='Agg')
)

Output:

  Description    Agg
0     Product      A
1        Size     XL
2        Size      L
3        Size      S
4        Size      M
5       Color   Blue
6       Color    Red
7       Color  Green
Answered By: Quang Hoang

Although both of the answers are already sufficient, thought this was one was nice to work out. Heres a method using explode and melt:

from ast import literal_eval
# needed, because somehow apply(literal_eval) wanst working
for col in df[['Size', 'Color']]:
    df[col] = df[col].apply(literal_eval)

dfn = df.explode('Size').reset_index(drop=True)
dfn['Color'] = df['Color'].explode().reset_index(drop=True).reindex(dfn.index)
dfn = dfn.melt(var_name='Description', value_name='Agg').ffill().drop_duplicates().reset_index(drop=True)

  Description    Agg
0     Product      A
1        Size     XL
2        Size      L
3        Size      S
4        Size      M
5       Color   Blue
6       Color    Red
7       Color  Green
Answered By: Erfan
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.