Conditional replacement in a polars DataFrame in Python

Question:

I am experimenting with a polars dataframe. The first column stores strings or null values, the second stores numbers or null values. The remaining columns contain non-null data.

I am trying to replace the null values with a fixed value:

dataframe = dataframe.with_column(pl.when(pl.col("Column1").is_null()).then("String"))
dataframe = dataframe.with_column(pl.when(pl.col("Column2").is_null()).then(0))

I get the error TypeError: with_column expects a single Expr or Series. Consider using with_columns if you need multiple columns., but choosing with_columns() raises ValueError: Expected an expression, got <polars.internals.whenthen.WhenThen object at.

My original idea comes from the related post Conditional assignment in polars dataframe, but I do not see my mistake. What am I missing?

Asked By: Jeremy S.


Answers:

I think you’re just missing otherwise?

Adapting the example from the linked question:

In [8]: import pandas as pd
   ...: import polars as pl
   ...: df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
   ...:                 'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
   ...:                 'points': [11, 8, 10, 6, 6, 5],
   ...:                 'rebounds': [7, 7, 6, 9, 12, 8]})
   ...: df = pl.from_pandas(df); df
Out[8]:
shape: (6, 4)
┌──────┬────────────┬────────┬──────────┐
│ team ┆ conference ┆ points ┆ rebounds │
│ ---  ┆ ---        ┆ ---    ┆ ---      │
│ str  ┆ str        ┆ i64    ┆ i64      │
╞══════╪════════════╪════════╪══════════╡
│ A    ┆ East       ┆ 11     ┆ 7        │
│ A    ┆ East       ┆ 8      ┆ 7        │
│ A    ┆ East       ┆ 10     ┆ 6        │
│ B    ┆ West       ┆ 6      ┆ 9        │
│ B    ┆ West       ┆ 6      ┆ 12       │
│ C    ┆ East       ┆ 5      ┆ 8        │
└──────┴────────────┴────────┴──────────┘

In [9]: df.with_column(pl.when(pl.col("team").is_null()).then("String").otherwise(pl.col('team')).alias('new_column'))
Out[9]:
shape: (6, 5)
┌──────┬────────────┬────────┬──────────┬────────────┐
│ team ┆ conference ┆ points ┆ rebounds ┆ new_column │
│ ---  ┆ ---        ┆ ---    ┆ ---      ┆ ---        │
│ str  ┆ str        ┆ i64    ┆ i64      ┆ str        │
╞══════╪════════════╪════════╪══════════╪════════════╡
│ A    ┆ East       ┆ 11     ┆ 7        ┆ A          │
│ A    ┆ East       ┆ 8      ┆ 7        ┆ A          │
│ A    ┆ East       ┆ 10     ┆ 6        ┆ A          │
│ B    ┆ West       ┆ 6      ┆ 9        ┆ B          │
│ B    ┆ West       ┆ 6      ┆ 12       ┆ B          │
│ C    ┆ East       ┆ 5      ┆ 8        ┆ C          │
└──────┴────────────┴────────┴──────────┴────────────┘
Answered By: ignoring_gravity

To replace null values you can use .fill_null():

df.with_columns([
   pl.col("Column1").fill_null("String"),
   pl.col("Column2").fill_null(0)
])
shape: (3, 3)
┌─────────┬─────────┬─────────┐
│ Column1 | Column2 | Column3 │
│ ---     | ---     | ---     │
│ str     | i64     | str     │
╞═════════╪═════════╪═════════╡
│ foo     | 0       | a       │
├─────────┼─────────┼─────────┤
│ String  | 0       | b       │
├─────────┼─────────┼─────────┤
│ bar     | 1       | c       │
└─────────┴─────────┴─────────┘

when/then

.when().then() produces a WhenThen object:

>>> pl.when(pl.col("Column1").is_null()).then("String")
<polars.internals.whenthen.WhenThen at 0x1270e8d90>

The error says .with_column() expects a single Expr or Series.

One way to get an Expr is to set a name using .alias()

>>> pl.when(pl.col("Column1").is_null()).then("String").alias("Column1")
<polars.internals.expr.expr.Expr at 0x12b77c1c0>

There is also .keep_name()

>>> pl.when(pl.col("Column1").is_null()).then("String").keep_name()
<polars.internals.expr.expr.Expr at 0x12ba7e530>

Column1 is the name of the "root expression" in this case.

>>> df.with_column(pl.when(pl.col("Column1").is_null()).then("String").keep_name())
shape: (3, 3)
┌─────────┬─────────┬─────────┐
│ Column1 | Column2 | Column3 │
│ ---     | ---     | ---     │
│ str     | i64     | str     │
╞═════════╪═════════╪═════════╡
│ null    | null    | a       │
├─────────┼─────────┼─────────┤
│ String  | null    | b       │
├─────────┼─────────┼─────────┤
│ null    | 1       | c       │
└─────────┴─────────┴─────────┘

otherwise

If you do not supply .otherwise(), the default is None, which is why you see null values for the False cases.

Supplying .otherwise() also gives you an Expr – you want the original column value in this case:

>>> pl.when(pl.col("Column1").is_null()).then("String").otherwise(pl.col("Column1"))
<polars.internals.expr.expr.Expr at 0x12bf77d90>
>>> df.with_column(pl.when(pl.col("Column1").is_null()).then("String").otherwise(pl.col("Column1")))
shape: (3, 4)
┌─────────┬─────────┬─────────┬─────────┐
│ Column1 | Column2 | Column3 | literal │
│ ---     | ---     | ---     | ---     │
│ str     | i64     | str     | str     │
╞═════════╪═════════╪═════════╪═════════╡
│ foo     | null    | a       | foo     │
├─────────┼─────────┼─────────┼─────────┤
│ null    | null    | b       | String  │
├─────────┼─────────┼─────────┼─────────┤
│ bar     | 1       | c       | bar     │
└─────────┴─────────┴─────────┴─────────┘

This results in a new column named literal.

You can add .alias("Column1") or .keep_name() to "replace" the original column instead.

Answered By: jqurious