Polars: calculating the mean of the sum of multiple columns

Question:

I want to calculate the mean of the sum of multiple columns within each group. It is simple in pandas:

df.groupby("id").apply(
    lambda s: pd.Series(
        {
            "m_1 sum": s["m_1"].sum(),
            "m_7_8 mean": (s["m_7"] + s["m_8"]).mean(),
        }
    )
)

I can't find a good way to do this in Polars, as pl.mean needs a string column name …

I expect to be able to calculate the mean of the sum of 2 columns in one line …

Asked By: andy8203


Answers:

You can do:

(
    df.groupby("id").agg(
        [
            pl.col("m_1").sum().alias("m_1 sum"),
            (pl.col("m_7") + pl.col("m_8")).mean().alias("m_7_8 mean"),
        ]
    )
)

E.g.:

In [33]: df = pl.DataFrame({'id': [1, 1, 2], 'm_1': [1, 3.5, 2], 'm_7': [5, 1., 2], 'm_8': [6., 5., 4.]})

In [34]: (
    ...:     df.groupby("id").agg(
    ...:         [
    ...:             pl.col("m_1").sum().alias("m_1 sum"),
    ...:             (pl.col("m_7") + pl.col("m_8")).mean().alias("m_7_8 mean"),
    ...:         ]
    ...:     )
    ...: )
Out[34]:
shape: (2, 3)
┌─────┬─────────┬────────────┐
│ id  ┆ m_1 sum ┆ m_7_8 mean │
│ --- ┆ ---     ┆ ---        │
│ i64 ┆ f64     ┆ f64        │
╞═════╪═════════╪════════════╡
│ 2   ┆ 2.0     ┆ 6.0        │
│ 1   ┆ 4.5     ┆ 8.5        │
└─────┴─────────┴────────────┘
Answered By: ignoring_gravity