Weighted sum of a column in a Polars dataframe

Question:

I have a Polars dataframe and I want to calculate a weighted sum of a particular column, where the weights are just the positive integer sequence, e.g., 1, 2, 3, …

For example, assume I have the following dataframe.

import polars as pl

df = pl.DataFrame({"a": [2, 4, 2, 1, 2, 1, 3, 6, 7, 5]})

The result I want is

218 (= 2*1 + 4*2 + 2*3 + 1*4 + ... + 7*9 + 5*10)

How can I achieve this using only general Polars expressions? (I want to stick to Polars expressions for speed.)

Note: This is a simple example with only 10 numbers, but in general the dataframe height can be any positive number.

Thanks for your help.

Asked By: Keptain


Answers:

Such a weighted sum can be calculated using the dot product (the .dot() method). To generate the range of weights from 1 to n, you can use pl.arange(1, n+1).
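As a quick sanity check of the idea, here is a minimal plain-Python sketch (no Polars involved) showing that the dot product of the values with the weights 1..n gives the expected number. The variable names are only illustrative.

```python
# Values from the question's column "a"
values = [2, 4, 2, 1, 2, 1, 3, 6, 7, 5]

# Weights are the positive integers 1..n, where n is the number of values
weights = range(1, len(values) + 1)

# Dot product: multiply element-wise, then sum
weighted_sum = sum(v * w for v, w in zip(values, weights))
print(weighted_sum)  # 218
```

This is only a reference for checking results; the Polars expressions are what you would actually use for speed.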

If you just need the value of the weighted sum:

df.select(
    pl.col("a").dot(pl.arange(1, pl.count()+1))
)  # append .item() to get the scalar value (218)

To keep the dataframe and add the result as a column:

df.with_columns(
    pl.col("a").dot(pl.arange(1, pl.count()+1)).alias("weighted_sum")
)
┌─────┬──────────────┐
│ a   ┆ weighted_sum │
│ --- ┆ ---          │
│ i64 ┆ i64          │
╞═════╪══════════════╡
│ 2   ┆ 218          │
│ 4   ┆ 218          │
│ ... ┆ ...          │
│ 3   ┆ 218          │
│ 5   ┆ 218          │
└─────┴──────────────┘

In a groupby context (the weights restart at 1 within each group):

df.groupby("some_cat_col", maintain_order=True).agg([
    pl.col("a").dot(pl.arange(1, pl.count()+1))
])
Answered By: glebcom

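Polars dataframes have no index, so build the weights 1..n explicitly and dot the series a with them:

import polars as pl

df = pl.DataFrame({"a": [2, 4, 2, 1, 2, 1, 3, 6, 7, 5]})

# eager=True returns a Series instead of an expression
weights = pl.arange(1, df.height + 1, eager=True)

print(df["a"].dot(weights))

Alternatively, you can use the __matmul__ operator @

print(df["a"] @ weights)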
Answered By: georgwalker45