How to mask a polars dataframe using another dataframe?
Question:
I have a polars dataframe like so:
pl.DataFrame({
'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)],
'1': [2.9048, 48224.0],
'2': [2.8849, 48068.0]
})
and a masking dataframe with similar columns and time value like so:
pl.DataFrame({
'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)],
'1': [False, True],
'2': [False, True]
})
I am looking for this result:
pl.DataFrame({
'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)],
'1': [None, 2.8849],
'2': [None, 48068.0],
})
Here I only show with 2 columns ‘1’ and ‘2’ but there could any number of them.
Any help would be appreciated!
Answers:
Having columns in a single DataFrame
has guarantees you don’t have when you have data in separate tables.
Masking out values by columns in another DataFrame
is a potential for errors caused by different lengths. For this reason polars does not encourage such operations and therefore you must first create a single DataFrame
from the two, and then select
the columns/computations you need.
So let’s do that.
The first thing you can do is join
the two tables. This guaranteed to work on DataFrame
s of different sizes.
df_a.join(df_mask, on="time", suffix="_mask")
This comes however with a price, as joining is not free.
If you are 100% certain your dataframes have the same height, you can use a horizontal concat
.
(
pl.concat(
[df_a, df_mask.select(pl.all().suffix("_mask"))], how="horizontal"
).select(
[pl.col("time")]
+ [
pl.when(pl.col(f"{name}_mask")).then(pl.col(name)).otherwise(None)
for name in ["1", "2"]
]
)
)
In the final select
query we take the columns we want. And compute the masked values with a when -> then -> otherwise
branch.
This outputs:
shape: (2, 3)
┌─────────────────────┬─────────┬─────────┐
│ time ┆ 1 ┆ 2 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ f64 │
╞═════════════════════╪═════════╪═════════╡
│ 2021-10-02 00:05:00 ┆ null ┆ null │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2021-10-02 00:10:00 ┆ 48224.0 ┆ 48068.0 │
└─────────────────────┴─────────┴─────────┘
I have a polars dataframe like so:
pl.DataFrame({
'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)],
'1': [2.9048, 48224.0],
'2': [2.8849, 48068.0]
})
and a masking dataframe with similar columns and time value like so:
pl.DataFrame({
'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)],
'1': [False, True],
'2': [False, True]
})
I am looking for this result:
pl.DataFrame({
'time': [datetime(2021, 10, 2, 0, 5), datetime(2021, 10, 2, 0, 10)],
'1': [None, 2.8849],
'2': [None, 48068.0],
})
Here I only show with 2 columns ‘1’ and ‘2’ but there could any number of them.
Any help would be appreciated!
Having columns in a single DataFrame
has guarantees you don’t have when you have data in separate tables.
Masking out values by columns in another DataFrame
is a potential for errors caused by different lengths. For this reason polars does not encourage such operations and therefore you must first create a single DataFrame
from the two, and then select
the columns/computations you need.
So let’s do that.
The first thing you can do is join
the two tables. This guaranteed to work on DataFrame
s of different sizes.
df_a.join(df_mask, on="time", suffix="_mask")
This comes however with a price, as joining is not free.
If you are 100% certain your dataframes have the same height, you can use a horizontal concat
.
(
pl.concat(
[df_a, df_mask.select(pl.all().suffix("_mask"))], how="horizontal"
).select(
[pl.col("time")]
+ [
pl.when(pl.col(f"{name}_mask")).then(pl.col(name)).otherwise(None)
for name in ["1", "2"]
]
)
)
In the final select
query we take the columns we want. And compute the masked values with a when -> then -> otherwise
branch.
This outputs:
shape: (2, 3)
┌─────────────────────┬─────────┬─────────┐
│ time ┆ 1 ┆ 2 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ f64 │
╞═════════════════════╪═════════╪═════════╡
│ 2021-10-02 00:05:00 ┆ null ┆ null │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2021-10-02 00:10:00 ┆ 48224.0 ┆ 48068.0 │
└─────────────────────┴─────────┴─────────┘