Polars Conditional Replacement From Another DataFrame

Question:

I have two DataFrames like this.

import numpy as np
import polars as pl

df1 = pl.DataFrame({
    "col_1": np.random.rand(),
    "col_2": np.random.rand(),
    "col_3": np.random.rand()
})
┌──────────┬─────────┬──────────┐
│ col_1    ┆ col_2   ┆ col_3    │
│ ---      ┆ ---     ┆ ---      │
│ f64      ┆ f64     ┆ f64      │
╞══════════╪═════════╪══════════╡
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
└──────────┴─────────┴──────────┘
df2 = pl.DataFrame({
    "col_1": np.random.randint(0, 2, 5),
    "col_2": np.random.randint(0, 2, 5),
    "col_3": np.random.randint(0, 2, 5)
})
┌───────┬───────┬───────┐
│ col_1 ┆ col_2 ┆ col_3 │
│ ---   ┆ ---   ┆ ---   │
│ i64   ┆ i64   ┆ i64   │
╞═══════╪═══════╪═══════╡
│ 0     ┆ 0     ┆ 0     │
│ 0     ┆ 1     ┆ 0     │
│ 1     ┆ 1     ┆ 1     │
│ 1     ┆ 1     ┆ 0     │
│ 1     ┆ 1     ┆ 1     │
└───────┴───────┴───────┘

I want to replace the 1s in the second DataFrame with the corresponding value from the first DataFrame, and replace the zeros with 1s, resulting in this:

┌──────────┬─────────┬──────────┐
│ col_1    ┆ col_2   ┆ col_3    │
│ ---      ┆ ---     ┆ ---      │
│ f64      ┆ f64     ┆ f64      │
╞══════════╪═════════╪══════════╡
│ 1.0      ┆ 1.0     ┆ 1.0      │
│ 1.0      ┆ 0.84115 ┆ 1.0      │
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
│ 0.534349 ┆ 0.84115 ┆ 1.0      │
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
└──────────┴─────────┴──────────┘

I tried reshaping df1 to have the same height as df2, like this:

df1 = df1.select(pl.all().repeat_by(df2.height).explode())

And if I rename the columns so they’re not the same, I could horizontally concatenate the 2 DataFrames using pl.concat. But I’m unsure where to go from there. How could I achieve this? Or is there a better approach?

Asked By: bkw1491


Answers:

Perhaps a use-case for .map_dict:

df2.select(
    pl.col(col).map_dict({0: 1, 1: df1.get_column(col).item()})
    for col in df2.columns
)
shape: (5, 3)
┌──────────┬─────────┬──────────┐
│ col_1    ┆ col_2   ┆ col_3    │
│ ---      ┆ ---     ┆ ---      │
│ f64      ┆ f64     ┆ f64      │
╞══════════╪═════════╪══════════╡
│ 1.0      ┆ 1.0     ┆ 1.0      │
│ 1.0      ┆ 0.84115 ┆ 1.0      │
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
│ 0.534349 ┆ 0.84115 ┆ 1.0      │
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
└──────────┴─────────┴──────────┘

An earlier approach was to add a suffix and use .with_context:

(df2
 .lazy()
 .with_context(
    df1.lazy().with_columns(pl.all().suffix("_right")))
 .select(
    pl.when(pl.col(col) == 1)
      .then(pl.col(f"{col}_right"))
      .otherwise(1)
      .alias(col)
    for col in df2.columns)
).collect()

Answered By: jqurious