How to combine non-null entries of columns of a DataFrame into a new column?
Question:
I am trying to create a new column that has a list of all entries of past columns that are non-null.
I would like to be able to produce the desired column without having to iterate through each of the rows.
col1 col2 col3 output
a NaN b [a,b]
c d e [c,d,e]
f g NaN [f,g]
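To try the answers below, here is a minimal setup that rebuilds the example frame. It assumes the NaN cells are real missing values (`np.nan`), not the literal string "NaN".

```python
import numpy as np
import pandas as pd

# Example frame from the question; missing cells are assumed to be np.nan
df = pd.DataFrame({
    "col1": ["a", "c", "f"],
    "col2": [np.nan, "d", "g"],
    "col3": ["b", "e", np.nan],
})
print(df)
```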
Any help would be greatly appreciated.
Answers:
Use DataFrame.agg to call dropna and tolist:
df.agg(lambda x: x.dropna().tolist(), axis=1)
0 [a, b]
1 [c, d, e]
2 [f, g]
dtype: object
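A small sketch of what the lambda receives, assuming the example frame above: with `axis=1` it is called once per row, and each row arrives as a Series indexed by the column names, so `dropna()` removes the missing cells before `tolist()`.

```python
import numpy as np
import pandas as pd

# Example frame from the question (missing cells assumed to be np.nan)
df = pd.DataFrame({
    "col1": ["a", "c", "f"],
    "col2": [np.nan, "d", "g"],
    "col3": ["b", "e", np.nan],
})

# One row, as the lambda would see it
row = df.iloc[0]
print(row.dropna().index.tolist())  # ['col1', 'col3']
print(row.dropna().tolist())        # ['a', 'b']

# The same function applied to every row
result = df.agg(lambda x: x.dropna().tolist(), axis=1)
print(result.tolist())  # [['a', 'b'], ['c', 'd', 'e'], ['f', 'g']]
```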
If you need a comma-separated string instead, use str.cat or str.join:
df.agg(lambda x: x.dropna().str.cat(sep=','), axis=1)
# df.agg(lambda x: ','.join(x.dropna()), axis=1)
0 a,b
1 c,d,e
2 f,g
dtype: object
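A quick check that the two string variants agree on the example frame. Note that `dropna()` matters most for `str.join`: joining a float NaN raises a `TypeError`, whereas `Series.str.cat` already skips missing values by default.

```python
import numpy as np
import pandas as pd

# Example frame from the question (missing cells assumed to be np.nan)
df = pd.DataFrame({
    "col1": ["a", "c", "f"],
    "col2": [np.nan, "d", "g"],
    "col3": ["b", "e", np.nan],
})

# Series.str.cat with no `others` concatenates the values of the row Series
via_cat = df.agg(lambda x: x.dropna().str.cat(sep=","), axis=1)

# Plain str.join on the filtered row values
via_join = df.agg(lambda x: ",".join(x.dropna()), axis=1)

print(via_cat.tolist())  # ['a,b', 'c,d,e', 'f,g']
```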
If performance is important, I recommend a list comprehension:
df['output'] = [x[pd.notna(x)].tolist() for x in df.values]
df
col1 col2 col3 output
0 a NaN b [a, b]
1 c d e [c, d, e]
2 f g NaN [f, g]
This works because your DataFrame consists of strings. For more information on when loops are appropriate to use with pandas, see this discussion: For loops with pandas – When should I care?
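To see what each iteration of that comprehension works with: `df.values` is a 2-D NumPy array (object dtype for this frame), so each `x` is one row as a 1-D array, and `pd.notna(x)` returns a boolean mask used to index it. A sketch on the example frame:

```python
import numpy as np
import pandas as pd

# Example frame from the question (missing cells assumed to be np.nan)
df = pd.DataFrame({
    "col1": ["a", "c", "f"],
    "col2": [np.nan, "d", "g"],
    "col3": ["b", "e", np.nan],
})

first = df.values[0]             # first row as a 1-D object array
mask = pd.notna(first)           # True where the cell is present
print(mask.tolist())             # [True, False, True]
print(first[mask].tolist())      # ['a', 'b']

# The full comprehension from the answer
df["output"] = [x[pd.notna(x)].tolist() for x in df.values]
print(df["output"].tolist())     # [['a', 'b'], ['c', 'd', 'e'], ['f', 'g']]
```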
Using a for loop (nested list comprehension):
# NaN != NaN, so `y == y` keeps only the non-missing values
df['New'] = [[y for y in x if y == y] for x in df.values.tolist()]
df
Out[654]:
col1 col2 col3 New
0 a NaN b [a, b]
1 c d e [c, d, e]
2 f g NaN [f, g]
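The `y == y` filter relies on NaN being unequal to itself. A minimal demonstration, which also shows a caveat: `None` compares equal to itself, so a `None` cell would pass through this filter.

```python
import numpy as np

print(np.nan == np.nan)  # False
print("a" == "a")        # True

# One row as produced by df.values.tolist()
row = ["a", np.nan, "b"]
kept = [y for y in row if y == y]
print(kept)              # ['a', 'b']

# Caveat: None is equal to itself, so it is NOT filtered out
with_none = [y for y in ["a", None] if y == y]
print(with_none)         # ['a', None]
```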
Or using stack with groupby:
# explicit dropna(): the newer stack() implementation keeps missing values
df['New'] = df.stack().dropna().groupby(level=0).agg(list)
df
Out[659]:
col1 col2 col3 New
0 a NaN b [a, b]
1 c d e [c, d, e]
2 f g NaN [f, g]
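A sketch of the intermediate step, on the example frame: `stack()` turns the columns into an inner index level, giving one long Series indexed by (row label, column name), and grouping on level 0 collects each row's values back into a list. The explicit `dropna()` is an assumption-driven safeguard: as far as I know, the newer stack implementation (`future_stack=True`, pandas 2.1+) no longer drops missing values on its own.

```python
import numpy as np
import pandas as pd

# Example frame from the question (missing cells assumed to be np.nan)
df = pd.DataFrame({
    "col1": ["a", "c", "f"],
    "col2": [np.nan, "d", "g"],
    "col3": ["b", "e", np.nan],
})

# Long format: one entry per non-missing cell, indexed by (row, column)
stacked = df.stack().dropna()
print(stacked)

# Group on the outer level (the original row label) and gather values
new = stacked.groupby(level=0).agg(list)
print(new.tolist())  # [['a', 'b'], ['c', 'd', 'e'], ['f', 'g']]
```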
Try this:
df['output'] = df.apply(lambda x: x.dropna().to_list(), axis=1)
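One difference between the row-wise and the stack-based answers shows up if a row is entirely missing. The sketch below adds a hypothetical fourth row of all NaN (not in the original question): `apply` returns an empty list for it, while the stack/groupby result has no entry for that row, so assigning it back as a column leaves NaN there.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the question's rows plus an all-missing row at index 3
df = pd.DataFrame({
    "col1": ["a", "c", "f", np.nan],
    "col2": [np.nan, "d", "g", np.nan],
    "col3": ["b", "e", np.nan, np.nan],
})

via_apply = df.apply(lambda x: x.dropna().to_list(), axis=1)
via_stack = df.stack().dropna().groupby(level=0).agg(list)

print(via_apply.loc[3])       # []
print(3 in via_stack.index)   # False

# Index alignment on assignment fills the missing row with NaN
df["New"] = via_stack
```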