Polars: What is the fastest way to loop over Polars DataFrame columns to apply transformations?

Question:

Is there a preferred way to loop and apply functions to Polars columns?

Here is a pandas example of what I am trying to do:

    import numpy as np
    import polars as pl

    df1 = pl.DataFrame(
        {
            "A": np.random.rand(10),
            "B": np.random.rand(10),
            "C": np.random.rand(10)
        }
    )
    df2 = pl.DataFrame(
        {
            "X1": np.random.rand(10),
            "X2": np.random.rand(10),
            "X3": np.random.rand(10)
        }
    )


    # pandas code
    
    # this is just a weighted sum of df2, where the weights are from df1
    df1.to_pandas().apply(
        lambda weights: df2.to_pandas().mul(weights, axis=0).sum() / weights.sum(), axis=0,
        result_type='expand'
    )
Asked By: Sharma


Answers:

It looks like you're doing the equivalent of the following, where df1 and df2 are the pandas versions of the frames:

pd.concat(
   (df1.mul(df2[col], axis=0).sum() / df1.sum() for col in df2.columns),
   axis=1
)

Which can be written pretty much the same way in polars:

pl.concat((df1 * col).sum() / df1.sum() for col in df2)

shape: (3, 3)
┌──────────┬──────────┬──────────┐
│ A        ┆ B        ┆ C        │
│ ---      ┆ ---      ┆ ---      │
│ f64      ┆ f64      ┆ f64      │
╞══════════╪══════════╪══════════╡
│ 0.363931 ┆ 0.44298  ┆ 0.431432 │
│ 0.54025  ┆ 0.42028  ┆ 0.418826 │
│ 0.506882 ┆ 0.576332 ┆ 0.61857  │
└──────────┴──────────┴──────────┘

It may make sense to do weights = df1.sum() to avoid recalculating the sum each time.

You could also do it inside a .select():

df1.select(
   weighted_sum = 
      pl.concat_list(
         (pl.all() * col).sum() / pl.all().sum()
         for col in df2)
      .reshape((len(df2.columns), -1))
      .arr.to_struct()
      .struct.rename_fields(df1.columns)
).unnest("weighted_sum")

shape: (3, 3)
┌──────────┬──────────┬──────────┐
│ A        ┆ B        ┆ C        │
│ ---      ┆ ---      ┆ ---      │
│ f64      ┆ f64      ┆ f64      │
╞══════════╪══════════╪══════════╡
│ 0.363931 ┆ 0.44298  ┆ 0.431432 │
│ 0.54025  ┆ 0.42028  ┆ 0.418826 │
│ 0.506882 ┆ 0.576332 ┆ 0.61857  │
└──────────┴──────────┴──────────┘
Answered By: jqurious