Why is Polars running my "then" function even if the "when" condition is false?

Question:

Given this DataFrame:

import polars as pl

d1 = pl.DataFrame({
    'x': ['a', None, 'b'],
    'y': [1.1, 2.2, 3.3]
})
print(d1)

shape: (3, 2)
┌──────┬─────┐
│ x    ┆ y   │
│ ---  ┆ --- │
│ str  ┆ f64 │
╞══════╪═════╡
│ a    ┆ 1.1 │
│ null ┆ 2.2 │
│ b    ┆ 3.3 │
└──────┴─────┘

I want to map each row with a non-null x to some f(x, y) (here simply the constant 999), and every other row to null.

# Using [print(d), 999][-1] hack just to trace calls
d2 = d1.select(
    pl.when(pl.col('x').is_not_null())
    .then(pl.struct(['x','y']).apply(lambda d: [print(d), 999][-1])) 
    .otherwise(None)
    .alias('s'))

print(d2)

d2 is correct, but why does print(d) produce the output below? I expected the then branch to be evaluated only where x is not null.

{'x': 'a', 'y': 1.1}
{'x': None, 'y': 2.2}  <<<< why?
{'x': 'b', 'y': 3.3}

shape: (3, 1)
┌──────┐
│ s    │
│ ---  │
│ i64  │
╞══════╡
│ 999  │
│ null │
│ 999  │
└──────┘

Why is the print executed even for the (null,2.2) row?

Asked By: Des1303

||

Answers:

This happens because the .when() and .then() branches are evaluated in parallel, each over every row; the "masking" is applied afterwards.
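The evaluate-then-mask order can be modeled in plain Python. This is only a conceptual sketch, not Polars internals: the function f and the list-based "columns" are stand-ins invented for illustration.

```python
# Plain-Python model of "compute the then-branch for all rows, mask afterwards".
# Not Polars code: f, rows, and the lists below are illustrative stand-ins.
calls = []

def f(row):
    calls.append(row)  # trace each invocation, like the print() hack in the question
    return 999

rows = [
    {"x": "a", "y": 1.1},
    {"x": None, "y": 2.2},
    {"x": "b", "y": 3.3},
]

# The "when" condition, evaluated for every row.
mask = [row["x"] is not None for row in rows]

# The "then" branch, also evaluated for every row, independently of the mask.
then_values = [f(row) for row in rows]

# Masking happens last: rows where the condition is false become None.
result = [value if keep else None for value, keep in zip(then_values, mask)]

print(len(calls))
print(result)
```

In this model f runs three times even though only two of its results survive the mask, which matches the traced output in the question.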

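Instead, you can call .apply() on the result of .when().then(), so that the function runs on the already-masked column.

Note that .otherwise(None) is the default, so it can be omitted.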

d1.with_columns(apply=
   pl.when(pl.col('x').is_not_null())
     .then(pl.col("x"))
     .apply(lambda self: [print(f"{self=}"), self][1])
)
self='a'
self='b'
shape: (3, 3)
┌──────┬─────┬───────┐
│ x    ┆ y   ┆ apply │
│ ---  ┆ --- ┆ ---   │
│ str  ┆ f64 ┆ str   │
╞══════╪═════╪═══════╡
│ a    ┆ 1.1 ┆ a     │
│ null ┆ 2.2 ┆ null  │
│ b    ┆ 3.3 ┆ b     │
└──────┴─────┴───────┘

This only prints twice because .apply() skips nulls by default.
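Why the order matters can again be modeled in plain Python. The helper apply_skip_nulls below is hypothetical; it only imitates the skip-nulls default described above and is not a Polars function.

```python
# Plain-Python model of "mask first, then apply a function that skips nulls".
# apply_skip_nulls is a hypothetical helper, not part of Polars.
def apply_skip_nulls(values, func):
    # None passes through untouched; func is never called for it.
    return [None if v is None else func(v) for v in values]

xs = ["a", None, "b"]
mask = [x is not None for x in xs]

# Step 1: when/then produces the masked column (None where the condition is false).
masked = [x if keep else None for x, keep in zip(xs, mask)]

# Step 2: the function is applied to the masked column and skips the None entry.
seen = []

def trace(value):
    seen.append(value)  # record each call, like the print() in the answer
    return value

result = apply_skip_nulls(masked, trace)

print(seen)
print(result)
```

Here the traced function is called only for 'a' and 'b', mirroring the two printed lines in the output above.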

The same approach does not appear to work with a struct, though; the function is still called for the masked row:

d1.with_columns(apply=
   pl.when(pl.col('x').is_not_null())
     .then(pl.struct("x", "y"))
     .apply(lambda self: [print(f"{self=}"), self][1])
)
self={'x': 'a', 'y': 1.1}
self={'x': None, 'y': None}
self={'x': 'b', 'y': 3.3}
shape: (3, 3)
┌──────┬─────┬─────────────┐
│ x    ┆ y   ┆ apply       │
│ ---  ┆ --- ┆ ---         │
│ str  ┆ f64 ┆ struct[2]   │
╞══════╪═════╪═════════════╡
│ a    ┆ 1.1 ┆ {"a",1.1}   │
│ null ┆ 2.2 ┆ {null,null} │
│ b    ┆ 3.3 ┆ {"b",3.3}   │
└──────┴─────┴─────────────┘

Yet Polars does treat a struct whose fields are all null as null:

d1.with_columns(apply=
   pl.when(pl.col('x').is_not_null())
     .then(pl.struct("x", "y"))
     .is_null()
)

shape: (3, 3)
┌──────┬─────┬───────┐
│ x    ┆ y   ┆ apply │
│ ---  ┆ --- ┆ ---   │
│ str  ┆ f64 ┆ bool  │
╞══════╪═════╪═══════╡
│ a    ┆ 1.1 ┆ false │
│ null ┆ 2.2 ┆ true  │
│ b    ┆ 3.3 ┆ false │
└──────┴─────┴───────┘

So I think .apply() not skipping the all-null struct could possibly be a "bug".

Answered By: jqurious