Filling `null` values of a column with another column

Question:

I want to fill the null values of a column with the content of another column of the same row in a lazy data frame in Polars.

Is this possible with reasonable performance?

Asked By: zareami10


Answers:

I just found a possible solution:

df.with_columns(
    # Take "b" where "a" is null, otherwise keep "a"
    pl.when(pl.col("a").is_null())
    .then(pl.col("b"))
    .otherwise(pl.col("a")).alias("a")
)
Answered By: zareami10

There’s a function for this: fill_null.

Let’s say we have this data:

import polars as pl

df = pl.DataFrame({'a': [1, None, 3, 4],
                   'b': [10, 20, 30, 40]
                   }).lazy()
print(df.collect())
shape: (4, 2)
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ i64  ┆ i64 │
╞══════╪═════╡
│ 1    ┆ 10  │
├╌╌╌╌╌╌┼╌╌╌╌╌┤
│ null ┆ 20  │
├╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 3    ┆ 30  │
├╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 4    ┆ 40  │
└──────┴─────┘

We can fill the null values in column a with values in column b:

df.with_columns(pl.col('a').fill_null(pl.col('b'))).collect()
shape: (4, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 10  │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 20  ┆ 20  │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ 30  │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 4   ┆ 40  │
└─────┴─────┘

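Because fill_null is a native Polars expression, it works on a lazy frame and is evaluated by the query engine without any Python-level row iteration, so performance should be good.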

Answered By: user18559875

In pandas (not Polars), the equivalent is:

df['columnwithnans'] = df['columnwithnans'].fillna(df['replacingcolumn'])
