How do I concatenate column values (all but one) into a list and add it as a column with polars?

Question:

I have the input in this format:

import polars as pl

data = {"Name": ['Name_A', 'Name_B','Name_C'], "val_1": ['a',None, 'a'],"val_2": [None,None, 'b'],"val_3": [None,'c', None],"val_4": ['c',None, 'g'],"val_5": [None,None, 'i']}
df = pl.DataFrame(data)
print(df)

shape: (3, 6)
┌────────┬───────┬───────┬───────┬───────┬───────┐
│ Name   ┆ val_1 ┆ val_2 ┆ val_3 ┆ val_4 ┆ val_5 │
│ ---    ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
│ str    ┆ str   ┆ str   ┆ str   ┆ str   ┆ str   │
╞════════╪═══════╪═══════╪═══════╪═══════╪═══════╡
│ Name_A ┆ a     ┆ null  ┆ null  ┆ c     ┆ null  │
│ Name_B ┆ null  ┆ null  ┆ c     ┆ null  ┆ null  │
│ Name_C ┆ a     ┆ b     ┆ null  ┆ g     ┆ i     │
└────────┴───────┴───────┴───────┴───────┴───────┘

I want the output as:

shape: (3, 7)
┌────────┬───────┬───────┬───────┬───────┬───────┬──────────────────────┐
│ Name   ┆ val_1 ┆ val_2 ┆ val_3 ┆ val_4 ┆ val_5 ┆ combined             │
│ ---    ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---                  │
│ str    ┆ str   ┆ str   ┆ str   ┆ str   ┆ str   ┆ list[str]            │
╞════════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════════════════╡
│ Name_A ┆ a     ┆ null  ┆ null  ┆ c     ┆ null  ┆ ["a", "c"]           │
│ Name_B ┆ null  ┆ null  ┆ c     ┆ null  ┆ null  ┆ ["c"]                │
│ Name_C ┆ a     ┆ b     ┆ null  ┆ g     ┆ i     ┆ ["a", "b", "g", "i"] │
└────────┴───────┴───────┴───────┴───────┴───────┴──────────────────────┘

I want to combine all the columns except the Name column into a list. I have simplified the data for this question; in reality there are many columns in the val_N format, so a generic solution that does not require listing each column name would be great.

Asked By: Shankze


Answers:

For the main part of the question, you can do:

df.with_columns(combined = pl.concat_list(pl.exclude('Name')))

pl.exclude selects all columns except the ones given.

To get rid of the nulls in the final list, use list.drop_nulls, which was introduced in version 0.19.4:

df.with_columns(combined = pl.concat_list(pl.exclude('Name')).list.drop_nulls())
Answered By: Wayoshi