Join two columns in Pandas, even when both of them are null
Question:
I have a dataset with a routing list:
| order | point | city | boxes | pallets |
|--| -- | -- | -- | -- |
| o12345 | 1 | X |b0|p0,p1|
|o12345|2|Y|-|p2,p3,p4|
|o12345|3|Z|b1|-|
|o34567|1|Q|-|-|
|o34567|2|W|b2,b3|p5,p6|
|o34567|3|E|-|p7|
|o34567|4|R|b4,b5|p8,p9,p10|
How can I join the columns "boxes" and "pallets" into a single "cargo" column that lists both the boxes and the pallets, and then explode this column so that each value ends up in a separate row?
import pandas as pd
df=pd.read_excel('example.xlsx')
df['cargo'] = df['pallets']+','+ df['boxes']
print(df)
But this does not work with null values :(
Answers:
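Here is an approach using df.explode(). The filter skips both real nulls (NaN) and empty strings, so only actual values are joined:
df['cargo'] = (df[['boxes', 'pallets']]
               .apply(lambda x: ','.join([i for i in x if pd.notna(i) and i]), axis=1))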
df = df.drop(['boxes', 'pallets'], axis=1)
print(df)
order point city cargo
0 o12345 1 X b0,p0,p1
1 o12345 2 Y p2,p3,p4
2 o12345 3 Z b1
3 o34567 1 Q
4 o34567 2 W b2,b3,p5,p6
5 o34567 3 E p7
6 o34567 4 R b4,b5,p8,p9,p10
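The joining step can also be done without a row-wise apply. The sketch below is one possible alternative, not part of the answer itself: it uses a small hand-built frame in place of example.xlsx and assumes the missing cells are NaN. The variable name `cargo` and the sample rows are illustrative only.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for example.xlsx (an assumption); missing cells are NaN.
df = pd.DataFrame({
    "order": ["o12345", "o12345", "o12345", "o34567"],
    "point": [1, 2, 3, 1],
    "city": ["X", "Y", "Z", "Q"],
    "boxes": ["b0", np.nan, "b1", np.nan],
    "pallets": ["p0,p1", "p2,p3,p4", np.nan, np.nan],
})

# Replace NaN with "" so the concatenation never propagates a missing value,
# then strip the separator left behind when either side was empty.
cargo = (
    df["boxes"].fillna("")
    .str.cat(df["pallets"].fillna(""), sep=",")
    .str.strip(",")
)
print(cargo.tolist())
```

Rows where both columns are missing end up as an empty string, which matches the blank cargo cell in the output above.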
df['cargo'] = df['cargo'].str.split(',')
df = (df.explode('cargo').sort_values(by=['order', 'point']))
print(df)
order point city cargo
0 o12345 1 X b0
0 o12345 1 X p0
0 o12345 1 X p1
1 o12345 2 Y p2
1 o12345 2 Y p3
1 o12345 2 Y p4
2 o12345 3 Z b1
3 o34567 1 Q
4 o34567 2 W b2
4 o34567 2 W b3
4 o34567 2 W p5
4 o34567 2 W p6
5 o34567 3 E p7
6 o34567 4 R b4
6 o34567 4 R b5
6 o34567 4 R p8
6 o34567 4 R p9
6 o34567 4 R p10
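To try the whole pipeline without example.xlsx, the steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the frame is typed in by hand from the question's table, the "-" cells are assumed to load as NaN, and `combine_cargo` is a hypothetical helper name. Unlike the output above, it optionally drops points that carry no cargo.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's table (stands in for example.xlsx);
# the "-" cells are assumed to be NaN after loading.
df = pd.DataFrame({
    "order": ["o12345"] * 3 + ["o34567"] * 4,
    "point": [1, 2, 3, 1, 2, 3, 4],
    "city": list("XYZQWER"),
    "boxes": ["b0", np.nan, "b1", np.nan, "b2,b3", np.nan, "b4,b5"],
    "pallets": ["p0,p1", "p2,p3,p4", np.nan, np.nan, "p5,p6", "p7", "p8,p9,p10"],
})


def combine_cargo(*cells):
    """Return the non-null, non-empty cells as one flat list of items."""
    items = []
    for cell in cells:
        if pd.notna(cell) and cell != "":
            items.extend(cell.split(","))
    return items


# Build one list per row; rows with no boxes and no pallets get an empty list.
df["cargo"] = [combine_cargo(b, p) for b, p in zip(df["boxes"], df["pallets"])]

out = (
    df.drop(columns=["boxes", "pallets"])
    .explode("cargo")            # one row per item; empty lists become NaN
    .dropna(subset=["cargo"])    # optional: drop points that carry no cargo
    .reset_index(drop=True)
)
print(out)
```

Leaving out the `dropna` call keeps the point in city Q as a single row with a missing cargo value.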