Join the columns and list them one by one in row
Question:
I have Pandas dataset with orders and different types of packaging inside this order separated by comma inside the cell.
order
container
box
a
c1,c2,c3
b1,b2,b3
b
c4,c5,c6
b4,b5,b6
Need to get table with two columns: "order" and "content" with all values from both container and box.
I could only merge the container and box – but do not know how to list them row by row.
Needed table is:
order
content
a
c1
a
c2
a
c3
a
b1
a
b2
a
b3
b
c4
b
c5
b
c6
b
b4
b
b5
b
b6
Answers:
I would do it following way
import pandas as pd
df = pd.DataFrame({"order":["a","b"],"container":["c1,c2,c3","c4,c5,c6"],"box":["b1,b2,b3","b4,b5,b6"]})
df2 = pd.concat([df.order,df.container.str.split(",",expand=True),df.box.str.split(",",expand=True)],axis=1)
df2 = df2.melt("order")[["order","value"]]
print(df2)
output
order value
0 a c1
1 b c4
2 a c2
3 b c5
4 a c3
5 b c6
6 a b1
7 b b4
8 a b2
9 b b5
10 a b3
11 b b6
Explanation: use .str.split
with expand=True
to get ,
-sheared values into separate columns, then use .concat
to create DataFrame
from them, then use .melt
at order
to get desired result and then select required columns.
Because you have two columns, you need to join columns with ','
then split and explode like the below:
(df['container'] + "," + df['box']).str.split(',').explode().to_frame().reset_index().rename(columns={'index':'order', 0:'content'})
order content
0 0 c1
1 0 c2
2 0 c3
3 0 b1
4 0 b2
5 0 b3
6 1 c4
7 1 c5
8 1 c6
9 1 b4
10 1 b5
11 1 b6
You can stack
, split
+explode
and convert to DataFrame:
out = (df.set_index('order').stack() # set other columns aside and stack
.str.split(',').explode() # expand values to multiple rows
# cleanup
.reset_index('order', name='content').reset_index(drop=True)
)
print(out)
Output:
order content
0 a c1
1 a c2
2 a c3
3 a b1
4 a b2
5 a b3
6 b c4
7 b c5
8 b c6
9 b b4
10 b b5
11 b b6
Alternative with melt
:
(df.melt('order', value_name='content')
.assign(content=lambda d: d['content'].str.split(','))
.explode('content').drop(columns='variable')
)
You could also use pd.DataFrame.melt
, pd.DataFrame.set_index
, pd.DataFrame.pipe
, pd.DataFrame.sort_index
:
(df
.melt(id_vars='order', value_vars=['container', 'box'])
.set_index('order')
.pipe(lambda x: x['value'].str.split(',').explode())
.sort_index()
)
order
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6
or even a more concise approach::
(df
.melt(id_vars='order', value_vars=['container', 'box'])
.set_index('order')['value'].str.split(',').explode()
.sort_index())
df1.set_index("order").agg(','.join,axis=1).map(lambda x:x.split(",")).explode().rename("content")
out
order
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6
I have Pandas dataset with orders and different types of packaging inside this order separated by comma inside the cell.
order | container | box |
---|---|---|
a | c1,c2,c3 | b1,b2,b3 |
b | c4,c5,c6 | b4,b5,b6 |
Need to get table with two columns: "order" and "content" with all values from both container and box.
I could only merge the container and box – but do not know how to list them row by row.
Needed table is:
order | content |
---|---|
a | c1 |
a | c2 |
a | c3 |
a | b1 |
a | b2 |
a | b3 |
b | c4 |
b | c5 |
b | c6 |
b | b4 |
b | b5 |
b | b6 |
I would do it following way
import pandas as pd
df = pd.DataFrame({"order":["a","b"],"container":["c1,c2,c3","c4,c5,c6"],"box":["b1,b2,b3","b4,b5,b6"]})
df2 = pd.concat([df.order,df.container.str.split(",",expand=True),df.box.str.split(",",expand=True)],axis=1)
df2 = df2.melt("order")[["order","value"]]
print(df2)
output
order value
0 a c1
1 b c4
2 a c2
3 b c5
4 a c3
5 b c6
6 a b1
7 b b4
8 a b2
9 b b5
10 a b3
11 b b6
Explanation: use .str.split
with expand=True
to get ,
-sheared values into separate columns, then use .concat
to create DataFrame
from them, then use .melt
at order
to get desired result and then select required columns.
Because you have two columns, you need to join columns with ','
then split and explode like the below:
(df['container'] + "," + df['box']).str.split(',').explode().to_frame().reset_index().rename(columns={'index':'order', 0:'content'})
order content
0 0 c1
1 0 c2
2 0 c3
3 0 b1
4 0 b2
5 0 b3
6 1 c4
7 1 c5
8 1 c6
9 1 b4
10 1 b5
11 1 b6
You can stack
, split
+explode
and convert to DataFrame:
out = (df.set_index('order').stack() # set other columns aside and stack
.str.split(',').explode() # expand values to multiple rows
# cleanup
.reset_index('order', name='content').reset_index(drop=True)
)
print(out)
Output:
order content
0 a c1
1 a c2
2 a c3
3 a b1
4 a b2
5 a b3
6 b c4
7 b c5
8 b c6
9 b b4
10 b b5
11 b b6
Alternative with melt
:
(df.melt('order', value_name='content')
.assign(content=lambda d: d['content'].str.split(','))
.explode('content').drop(columns='variable')
)
You could also use pd.DataFrame.melt
, pd.DataFrame.set_index
, pd.DataFrame.pipe
, pd.DataFrame.sort_index
:
(df
.melt(id_vars='order', value_vars=['container', 'box'])
.set_index('order')
.pipe(lambda x: x['value'].str.split(',').explode())
.sort_index()
)
order
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6
or even a more concise approach::
(df
.melt(id_vars='order', value_vars=['container', 'box'])
.set_index('order')['value'].str.split(',').explode()
.sort_index())
df1.set_index("order").agg(','.join,axis=1).map(lambda x:x.split(",")).explode().rename("content")
out
order
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6