How to mask a polars dataframe using another dataframe?

Question:

I have a polars dataframe like so:

pl.DataFrame({
    'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)],
    '1': [2.9048, 48224.0],
    '2': [2.8849, 48068.0]
})

and a masking dataframe with similar columns and time value like so:

pl.DataFrame({
    'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)],
    '1': [False, True],
    '2': [False, True]
})

I am looking for this result:

pl.DataFrame({
    'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)],
    '1': [None, 2.8849],
    '2': [None, 48068.0],
})

Here I only show with 2 columns ‘1’ and ‘2’ but there could any number of them.

Any help would be appreciated!

Asked By: clem

||

Answers:

Having columns in a single DataFrame has guarantees you don’t have when you have data in separate tables.

Masking out values by columns in another DataFrame is a potential for errors caused by different lengths. For this reason polars does not encourage such operations and therefore you must first create a single DataFrame from the two, and then select the columns/computations you need.

So let’s do that.

The first thing you can do is join the two tables. This guaranteed to work on DataFrames of different sizes.

df_a.join(df_mask, on="time", suffix="_mask")

This comes however with a price, as joining is not free.

If you are 100% certain your dataframes have the same height, you can use a horizontal concat.

(
    pl.concat(
        [df_a, df_mask.select(pl.all().suffix("_mask"))], how="horizontal"
    ).select(
        [pl.col("time")]
        + [
            pl.when(pl.col(f"{name}_mask")).then(pl.col(name)).otherwise(None)
            for name in ["1", "2"]
        ]
    )
)

In the final select query we take the columns we want. And compute the masked values with a when -> then -> otherwise branch.

This outputs:

shape: (2, 3)
┌─────────────────────┬─────────┬─────────┐
│ time                ┆ 1       ┆ 2       │
│ ---                 ┆ ---     ┆ ---     │
│ datetime[μs]        ┆ f64     ┆ f64     │
╞═════════════════════╪═════════╪═════════╡
│ 2021-10-02 00:05:00 ┆ null    ┆ null    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2021-10-02 00:10:00 ┆ 48224.0 ┆ 48068.0 │
└─────────────────────┴─────────┴─────────┘

Answered By: ritchie46
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.