Join the columns and list them one by one in row

Question:

I have Pandas dataset with orders and different types of packaging inside this order separated by comma inside the cell.

order container box
a c1,c2,c3 b1,b2,b3
b c4,c5,c6 b4,b5,b6

Need to get table with two columns: "order" and "content" with all values from both container and box.

I could only merge the container and box – but do not know how to list them row by row.

Needed table is:

order content
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6
Asked By: bluekit46

||

Answers:

I would do it following way

import pandas as pd
df = pd.DataFrame({"order":["a","b"],"container":["c1,c2,c3","c4,c5,c6"],"box":["b1,b2,b3","b4,b5,b6"]})
df2 = pd.concat([df.order,df.container.str.split(",",expand=True),df.box.str.split(",",expand=True)],axis=1)
df2 = df2.melt("order")[["order","value"]]
print(df2)

output

   order value
0      a    c1
1      b    c4
2      a    c2
3      b    c5
4      a    c3
5      b    c6
6      a    b1
7      b    b4
8      a    b2
9      b    b5
10     a    b3
11     b    b6

Explanation: use .str.split with expand=True to get ,-sheared values into separate columns, then use .concat to create DataFrame from them, then use .melt at order to get desired result and then select required columns.

Answered By: Daweo

Because you have two columns, you need to join columns with ',' then split and explode like the below:

(df['container'] + "," + df['box']).str.split(',').explode().to_frame().reset_index().rename(columns={'index':'order', 0:'content'})

    order content
0       0      c1
1       0      c2
2       0      c3
3       0      b1
4       0      b2
5       0      b3
6       1      c4
7       1      c5
8       1      c6
9       1      b4
10      1      b5
11      1      b6
Answered By: I'mahdi

You can stack, split+explode and convert to DataFrame:

out = (df.set_index('order').stack() # set other columns aside and stack
         .str.split(',').explode() # expand values to multiple rows
          # cleanup
         .reset_index('order', name='content').reset_index(drop=True)
      )
print(out)

Output:

   order content
0      a      c1
1      a      c2
2      a      c3
3      a      b1
4      a      b2
5      a      b3
6      b      c4
7      b      c5
8      b      c6
9      b      b4
10     b      b5
11     b      b6

Alternative with melt:

(df.melt('order', value_name='content')
   .assign(content=lambda d: d['content'].str.split(','))
   .explode('content').drop(columns='variable')
)
Answered By: mozway

You could also use pd.DataFrame.melt, pd.DataFrame.set_index, pd.DataFrame.pipe, pd.DataFrame.sort_index:

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')
 .pipe(lambda x: x['value'].str.split(',').explode())
 .sort_index()
)

order
a    c1
a    c2
a    c3
a    b1
a    b2
a    b3
b    c4
b    c5
b    c6
b    b4
b    b5
b    b6

or even a more concise approach::

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')['value'].str.split(',').explode()
 .sort_index())
Answered By: Anoushiravan R
df1.set_index("order").agg(','.join,axis=1).map(lambda x:x.split(",")).explode().rename("content")

out

order
a    c1
a    c2
a    c3
a    b1
a    b2
a    b3
b    c4
b    c5
b    c6
b    b4
b    b5
b    b6
Answered By: G.G
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.