Conditional replacement in a polars dataframe in Python
Question:
I am experimenting with a polars dataframe. The first column stores strings or null values, the second stores numbers or null values. The remaining columns contain non-null data.
I am trying to replace the null values with a fixed value:
dataframe = dataframe.with_column(pl.when(pl.col("Column1").is_null()).then("String"))
dataframe = dataframe.with_column(pl.when(pl.col("Column2").is_null()).then(0))
I get the error
TypeError: with_column expects a single Expr or Series. Consider using with_columns if you need multiple columns.
but choosing with_columns() instead raises
ValueError: Expected an expression, got <polars.internals.whenthen.WhenThen object at ...
My original idea comes from the related post Conditional assignment in polars dataframe, but I do not see my mistake. What am I missing?
Answers:
I think you’re just missing .otherwise()?
Adapting the example from the linked question:
In [8]: import pandas as pd
...: import polars as pl
...: df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
...: 'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
...: 'points': [11, 8, 10, 6, 6, 5],
...: 'rebounds': [7, 7, 6, 9, 12, 8]})
...: df = pl.from_pandas(df); df
Out[8]:
shape: (6, 4)
┌──────┬────────────┬────────┬──────────┐
│ team ┆ conference ┆ points ┆ rebounds │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞══════╪════════════╪════════╪══════════╡
│ A ┆ East ┆ 11 ┆ 7 │
│ A ┆ East ┆ 8 ┆ 7 │
│ A ┆ East ┆ 10 ┆ 6 │
│ B ┆ West ┆ 6 ┆ 9 │
│ B ┆ West ┆ 6 ┆ 12 │
│ C ┆ East ┆ 5 ┆ 8 │
└──────┴────────────┴────────┴──────────┘
In [9]: df.with_column(pl.when(pl.col("team").is_null()).then("String").otherwise(pl.col('team')).alias('new_column'))
Out[9]:
shape: (6, 5)
┌──────┬────────────┬────────┬──────────┬────────────┐
│ team ┆ conference ┆ points ┆ rebounds ┆ new_column │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 ┆ str │
╞══════╪════════════╪════════╪══════════╪════════════╡
│ A ┆ East ┆ 11 ┆ 7 ┆ A │
│ A ┆ East ┆ 8 ┆ 7 ┆ A │
│ A ┆ East ┆ 10 ┆ 6 ┆ A │
│ B ┆ West ┆ 6 ┆ 9 ┆ B │
│ B ┆ West ┆ 6 ┆ 12 ┆ B │
│ C ┆ East ┆ 5 ┆ 8 ┆ C │
└──────┴────────────┴────────┴──────────┴────────────┘
To replace null values you can use .fill_null():
df.with_columns([
pl.col("Column1").fill_null("String"),
pl.col("Column2").fill_null(0)
])
shape: (3, 3)
┌─────────┬─────────┬─────────┐
│ Column1 | Column2 | Column3 │
│ --- | --- | --- │
│ str | i64 | str │
╞═════════╪═════════╪═════════╡
│ foo | 0 | a │
├─────────┼─────────┼─────────┤
│ String | 0 | b │
├─────────┼─────────┼─────────┤
│ bar | 1 | c │
└─────────┴─────────┴─────────┘
when/then
.when().then() produces a WhenThen object:
>>> pl.when(pl.col("Column1").is_null()).then("String")
<polars.internals.whenthen.WhenThen at 0x1270e8d90>
The error says .with_column() expects a single Expr or Series.
One way to get an Expr is to set a name using .alias():
>>> pl.when(pl.col("Column1").is_null()).then("String").alias("Column1")
<polars.internals.expr.expr.Expr at 0x12b77c1c0>
There is also .keep_name():
>>> pl.when(pl.col("Column1").is_null()).then("String").keep_name()
<polars.internals.expr.expr.Expr at 0x12ba7e530>
Column1 is the name of the "root expression" in this case.
>>> df.with_column(pl.when(pl.col("Column1").is_null()).then("String").keep_name())
shape: (3, 3)
┌─────────┬─────────┬─────────┐
│ Column1 | Column2 | Column3 │
│ --- | --- | --- │
│ str | i64 | str │
╞═════════╪═════════╪═════════╡
│ null | null | a │
├─────────┼─────────┼─────────┤
│ String | null | b │
├─────────┼─────────┼─────────┤
│ null | 1 | c │
└─────────┴─────────┴─────────┘
otherwise
If you do not supply an .otherwise(), the default is None, which is why you see null values for the False cases.
Supplying .otherwise() also gives you an Expr. You want the original column value in this case:
>>> pl.when(pl.col("Column1").is_null()).then("String").otherwise(pl.col("Column1"))
<polars.internals.expr.expr.Expr at 0x12bf77d90>
>>> df.with_column(pl.when(pl.col("Column1").is_null()).then("String").otherwise(pl.col("Column1")))
shape: (3, 4)
┌─────────┬─────────┬─────────┬─────────┐
│ Column1 | Column2 | Column3 | literal │
│ --- | --- | --- | --- │
│ str | i64 | str | str │
╞═════════╪═════════╪═════════╪═════════╡
│ foo | null | a | foo │
├─────────┼─────────┼─────────┼─────────┤
│ null | null | b | String │
├─────────┼─────────┼─────────┼─────────┤
│ bar | 1 | c | bar │
└─────────┴─────────┴─────────┴─────────┘
This results in a new column named literal.
You can add .alias("Column1") or .keep_name() to "replace" the original column instead.