How to combine 2 lists of pandas columns?
Question:
Say I have a df like this:
col1 col2
0. [1, 2, 3] [4, 5, 6]
1 [4, 6] [43]
2. [7, 11] NaN
3. None [9, 9]
4. [] [13, 17]
I want to combine the two columns into one combined list. For NaN
and None
values I would like to ignore them.
Expected results:
col1 col2 col3
0. [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [4, 6] 43 [4, 6, 43]
2. [7, 11] NaN [7, 11]
3. None [9, 9] [9, 9]
4. [] [13, 17] [13, 17]
5. NaN NaN NaN
I’ve looked at numpy and pandas functions and am unable to do this. Help.
Answers:
Solution with remove missing values if not exist in both columns with DataFrame.stack
and aggregation:
df['col3'] = df[['col1','col2']].stack().groupby(level=0).sum()
Or:
df = pd.DataFrame({'col1':[[1, 2, 3], [4, 6], [7, 11], None, [], np.nan],
'col2':[[4, 5, 6], [43], np.nan, [9, 9],[13, 17], np.nan]})
df['col3'] = (df.stack()
.groupby(level=0)
.agg(lambda x: [z for y in x for z in y]))
print (df)
col1 col2 col3
0 [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [4, 6] [43] [4, 6, 43]
2 [7, 11] NaN [7, 11]
3 None [9, 9] [9, 9]
4 [] [13, 17] [13, 17]
5 NaN NaN NaN
Using numpy and keeping the NaN:
import numpy as np
df['col3'] = [np.r_[x].tolist() for x in zip(df['col1'], df['col2'])]
Output:
col1 col2 col3
0 [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [4, 6] 43 [4, 6, 43]
2 [7, 11] NaN [7.0, 11.0, nan]
3 None [9, 9] [None, 9, 9]
4 [] [13, 17] [13.0, 17.0]
Say I have a df like this:
col1 col2
0. [1, 2, 3] [4, 5, 6]
1 [4, 6] [43]
2. [7, 11] NaN
3. None [9, 9]
4. [] [13, 17]
I want to combine the two columns into one combined list. For NaN
and None
values I would like to ignore them.
Expected results:
col1 col2 col3
0. [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [4, 6] 43 [4, 6, 43]
2. [7, 11] NaN [7, 11]
3. None [9, 9] [9, 9]
4. [] [13, 17] [13, 17]
5. NaN NaN NaN
I’ve looked at numpy and pandas functions and am unable to do this. Help.
Solution with remove missing values if not exist in both columns with DataFrame.stack
and aggregation:
df['col3'] = df[['col1','col2']].stack().groupby(level=0).sum()
Or:
df = pd.DataFrame({'col1':[[1, 2, 3], [4, 6], [7, 11], None, [], np.nan],
'col2':[[4, 5, 6], [43], np.nan, [9, 9],[13, 17], np.nan]})
df['col3'] = (df.stack()
.groupby(level=0)
.agg(lambda x: [z for y in x for z in y]))
print (df)
col1 col2 col3
0 [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [4, 6] [43] [4, 6, 43]
2 [7, 11] NaN [7, 11]
3 None [9, 9] [9, 9]
4 [] [13, 17] [13, 17]
5 NaN NaN NaN
Using numpy and keeping the NaN:
import numpy as np
df['col3'] = [np.r_[x].tolist() for x in zip(df['col1'], df['col2'])]
Output:
col1 col2 col3
0 [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [4, 6] 43 [4, 6, 43]
2 [7, 11] NaN [7.0, 11.0, nan]
3 None [9, 9] [None, 9, 9]
4 [] [13, 17] [13.0, 17.0]