Pandas – combine column values into a list in a new column

Question:

I have a Python Pandas dataframe df:

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A','B','C','D']) 

which looks like this:

print(t)
         A         B     C        D
0    hello         1  GOOD  long.kw
1      1.2  chipotle   NaN    bingo
2  various       NaN  3000  123.456

I am trying to create a new column which is a list of the values in A, B, C, and D. So it would look like this:

t['combined']                                             

Out[125]: 
0        [hello, 1, GOOD, long.kw]
1        [1.2, chipotle, nan, bingo]
2        [various, nan, 3000, 123.456]
Name: combined, dtype: object

I am trying this code:

t['combined'] = t.apply(lambda x: list([x['A'],
                                        x['B'],
                                        x['C'],
                                        x['D']]),axis=1)    

Which returns this error:

ValueError: Wrong number of items passed 4, placement implies 1 

What is puzzling to me is if I remove one of the columns that I want to put in the list (or add another column to the dataframe that I DON’T add to the list), my code works.

For instance, run this code:

t['combined'] = t.apply(lambda x: list([x['A'],
                                        x['B'],
                                        x['D']]),axis=1)      

Returns this which is perfect if I only wanted the 3 columns:

print(t)
         A         B     C        D                 combined
0    hello         1  GOOD  long.kw      [hello, 1, long.kw]
1      1.2  chipotle   NaN    bingo   [1.2, chipotle, bingo]
2  various       NaN  3000  123.456  [various, nan, 123.456]
                                    

I am at a complete loss as to why requesting the ‘combined’ list be made of all columns in the dataframe would create an error, but selecting all but 1 column to create the ‘combined’ list works as expected.

Asked By: clg4

||

Answers:

try this :

t['combined']= t.values.tolist()

t
Out[50]: 
         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1     1.20  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000   123.46  [various, nan, 3000, 123.456]
Answered By: Steven G

If you want to combine column non-NaN values, you’ll need to loop over rows while checking for NaN values. As NaN!=NaN, the fastest check is to check if a value equals itself.

t['combined'] = [[e for e in row if e==e] for row in t.values.tolist()]


         A     B     C        D                     combined
0    hello   1.0  GOOD  long.kw  [hello, 1.0, GOOD, long.kw]
1      1.2  10.0   NaN    bingo           [1.2, 10.0, bingo]  <-- no NaN
2  various   NaN  3000  123.456     [various, 3000, 123.456]  <-- no NaN

A more complete check is to use isnan from the built-in math module.

from math import isnan
t['combined'] = [[e for e in row if not (isinstance(e, float) and isnan(e))] for row in t.values.tolist()]

To combine specific column non-NaN values, select the columns first:

cols = ['A', 'B']
t['combined'] = [[e for e in row if e==e] for row in t[cols].values.tolist()]
Answered By: not a robot

Here is a way with the NaN

t.assign(combined = pd.Series(d))

Output:

         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1      1.2  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000  123.456  [various, nan, 3000, 123.456]

Here is a way without the NaN

t.assign(combined = t.stack().groupby(level=0).agg(list))

Output:

         A         B     C        D                   combined
0    hello         1  GOOD  long.kw  [hello, 1, GOOD, long.kw]
1      1.2  chipotle   NaN    bingo     [1.2, chipotle, bingo]
2  various       NaN  3000  123.456   [various, 3000, 123.456]
Answered By: rhug123
Categories: questions Tags: , , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.