Join two columns in Pandas, even if both of them are null

Question:

I have a dataset with a routing list:

| order | point | city | boxes | pallets |
|--| -- | -- | -- | -- |
| o12345 | 1 | X |b0|p0,p1|
|o12345|2|Y|-|p2,p3,p4|
|o12345|3|Z|b1|-|
|o34567|1|Q|-|-|
|o34567|2|W|b2,b3|p5,p6|
|o34567|3|E|-|p7|
|o34567|4|R|b4,b5|p8,p9,p10|


How can I join the columns "boxes" and "pallets" into a "cargo" column that lists both the boxes and the pallets, and then explode this column so that each value is in a separate row?

import pandas as pd
df=pd.read_excel('example.xlsx')
df['cargo'] = df['pallets']+','+ df['boxes']
print(df)

But this does not work with null values :(

First I expect to get a combined "cargo" value for each row, and then to explode only the "cargo" column so that each value gets its own row.

Asked By: bluekit46


Answers:

Here is an approach using df.explode()

# NaN is truthy, so test pd.notna() explicitly before the emptiness check
df['cargo'] = (df[['boxes', 'pallets']]
                .apply(lambda x: ','.join([i for i in x if pd.notna(i) and i]), axis=1))
df = df.drop(['boxes', 'pallets'], axis=1)
print(df)

    order  point city            cargo
0  o12345      1    X         b0,p0,p1
1  o12345      2    Y         p2,p3,p4
2  o12345      3    Z               b1
3  o34567      1    Q                 
4  o34567      2    W      b2,b3,p5,p6
5  o34567      3    E               p7
6  o34567      4    R  b4,b5,p8,p9,p10
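
Since example.xlsx is not available, here is a self-contained sketch of the same join step that can be run as-is. The sample frame is a hypothetical stand-in for the spreadsheet, and it assumes the "-" cells in the question are truly empty cells, which pd.read_excel loads as NaN by default. The helper name join_non_null is made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for example.xlsx; NaN marks the empty cells.
df = pd.DataFrame({
    "order": ["o12345"] * 3 + ["o34567"] * 4,
    "point": [1, 2, 3, 1, 2, 3, 4],
    "city": list("XYZQWER"),
    "boxes": ["b0", np.nan, "b1", np.nan, "b2,b3", np.nan, "b4,b5"],
    "pallets": ["p0,p1", "p2,p3,p4", np.nan, np.nan,
                "p5,p6", "p7", "p8,p9,p10"],
})

def join_non_null(row):
    # Keep only non-null, non-empty strings. NaN is truthy in Python,
    # so a plain "if v" filter would let it through and join() would fail.
    return ",".join(v for v in row if pd.notna(v) and v != "")

df["cargo"] = df[["boxes", "pallets"]].apply(join_non_null, axis=1)
print(df[["order", "point", "city", "cargo"]])
```

With this filter, a row where both columns are null ends up with an empty string in cargo rather than NaN, which matches the blank cell in the output above.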


df['cargo'] = df['cargo'].str.split(',')
df = (df.explode('cargo').sort_values(by=['order', 'point']))
print(df)

    order  point city cargo
0  o12345      1    X    b0
0  o12345      1    X    p0
0  o12345      1    X    p1
1  o12345      2    Y    p2
1  o12345      2    Y    p3
1  o12345      2    Y    p4
2  o12345      3    Z    b1
3  o34567      1    Q      
4  o34567      2    W    b2
4  o34567      2    W    b3
4  o34567      2    W    p5
4  o34567      2    W    p6
5  o34567      3    E    p7
6  o34567      4    R    b4
6  o34567      4    R    b5
6  o34567      4    R    p8
6  o34567      4    R    p9
6  o34567      4    R   p10
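
Note that the row for city Q survives the explode with an empty string as its cargo. If you would rather have a real missing value there, or drop such rows altogether, one possible variation is sketched below. It assumes the combined column holds empty strings where there was no cargo, and the small frame is a hypothetical excerpt of the intermediate result, not the full data.

```python
import pandas as pd

# Hypothetical excerpt of the combined frame (cargo already joined).
df = pd.DataFrame({
    "order": ["o12345", "o34567", "o34567"],
    "point": [3, 1, 2],
    "city": ["Z", "Q", "W"],
    "cargo": ["b1", "", "b2,b3,p5,p6"],
})

# Turn empty strings into NaN first; str.split leaves NaN untouched,
# and explode keeps a NaN cell as a single row with NaN.
cargo_lists = df["cargo"].mask(df["cargo"].eq("")).str.split(",")

exploded = (
    df.assign(cargo=cargo_lists)
      .explode("cargo")
      .reset_index(drop=True)  # explode repeats the index; make it unique again
)

# Optional: keep only the rows that actually carry cargo.
with_cargo = exploded.dropna(subset=["cargo"])
print(exploded)
print(with_cargo)
```

Whether to keep the empty stops depends on what the routing list is used for; dropping them loses the fact that the route passes through that city.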
Answered By: Jamiu S.