How do I concatenate column values (all but one) into a list and add it as a column with polars?
Question:
I have the input in this format:
import polars as pl
data = {"Name": ['Name_A', 'Name_B','Name_C'], "val_1": ['a',None, 'a'],"val_2": [None,None, 'b'],"val_3": [None,'c', None],"val_4": ['c',None, 'g'],"val_5": [None,None, 'i']}
df = pl.DataFrame(data)
print(df)
shape: (3, 6)
┌────────┬───────┬───────┬───────┬───────┬───────┐
│ Name ┆ val_1 ┆ val_2 ┆ val_3 ┆ val_4 ┆ val_5 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞════════╪═══════╪═══════╪═══════╪═══════╪═══════╡
│ Name_A ┆ a ┆ null ┆ null ┆ c ┆ null │
│ Name_B ┆ null ┆ null ┆ c ┆ null ┆ null │
│ Name_C ┆ a ┆ b ┆ null ┆ g ┆ i │
└────────┴───────┴───────┴───────┴───────┴───────┘
I want the output as:
shape: (3, 7)
┌────────┬───────┬───────┬───────┬───────┬───────┬───────────────────┐
│ Name ┆ val_1 ┆ val_2 ┆ val_3 ┆ val_4 ┆ val_5 ┆ combined │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ list[str] │
╞════════╪═══════╪═══════╪═══════╪═══════╪═══════╪═══════════════════╡
│ Name_A ┆ a ┆ null ┆ null ┆ c ┆ null ┆ ["a", "c"] │
│ Name_B ┆ null ┆ null ┆ c ┆ null ┆ null ┆ ["c"] │
│ Name_C ┆ a ┆ b ┆ null ┆ g ┆ i ┆ ["a", "b", "g", "i"] │
└────────┴───────┴───────┴───────┴───────┴───────┴───────────────────┘
I want to combine all the columns except the Name column into a list. I have simplified the data for this question; in reality we have many columns of the val_N format, so generic code where I do not have to list each column name would be great.
Answers:
For the main part of the question, you can do:
df.with_columns(combined = pl.concat_list(pl.exclude('Name')))
pl.exclude is how to get all columns BUT the ones given.
To get rid of the nulls in the final list, use list.drop_nulls, which was introduced in version 0.19.4:
df.with_columns(combined = pl.concat_list(pl.exclude('Name')).list.drop_nulls())