Find value of column based on another column condition (max) in polars for many columns

Question:

If I have this dataframe:

import polars as pl

df = pl.DataFrame(dict(x=[0, 1, 2, 3], y=[5, 2, 3, 3], z=[4, 7, 8, 2]))
print(df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ x   ┆ y   ┆ z   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 5   ┆ 4   │
│ 1   ┆ 2   ┆ 7   │
│ 2   ┆ 3   ┆ 8   │
│ 3   ┆ 3   ┆ 2   │
└─────┴─────┴─────┘

and I want to find the value in x where y is max, then again find the value in x where z is max, and repeat for hundreds more columns so that I end up with something like:

shape: (2, 2)
┌────────┬─────────┐
│ column ┆ x_value │
│ ---    ┆ ---     │
│ str    ┆ i64     │
╞════════╪═════════╡
│ y      ┆ 0       │
│ z      ┆ 2       │
└────────┴─────────┘

or

shape: (1, 2)
┌─────┬─────┐
│ y   ┆ z   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0   ┆ 2   │
└─────┴─────┘

What is the best polars way to do that?

Asked By: pwb2103


Answers:

There is a PR to add a by parameter to Expr.top_k(), which should allow:

y = pl.col("x").top_k(1, by="y")
z = pl.col("x").top_k(1, by="z")

Until then, you could perform a "wide to long" reshape with .melt() (newer polars releases rename this method to .unpivot()):

>>> df.melt("x")
shape: (8, 3)
┌─────┬──────────┬───────┐
│ x   ┆ variable ┆ value │
│ --- ┆ ---      ┆ ---   │
│ i64 ┆ str      ┆ i64   │
╞═════╪══════════╪═══════╡
│ 0   ┆ y        ┆ 5     │
│ 1   ┆ y        ┆ 2     │
│ 2   ┆ y        ┆ 3     │
│ 3   ┆ y        ┆ 3     │
│ 0   ┆ z        ┆ 4     │
│ 1   ┆ z        ┆ 7     │
│ 2   ┆ z        ┆ 8     │
│ 3   ┆ z        ┆ 2     │
└─────┴──────────┴───────┘

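Then use .filter() to keep the rows flagged by .peak_max() within each group. Note that .peak_max() marks local peaks: it coincides with the maximum for this data, but a column with several local peaks would return more than one row.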

(df.melt("x")
   .filter(
      pl.col("value").peak_max().over("variable")
   )
)
shape: (2, 3)
┌─────┬──────────┬───────┐
│ x   ┆ variable ┆ value │
│ --- ┆ ---      ┆ ---   │
│ i64 ┆ str      ┆ i64   │
╞═════╪══════════╪═══════╡
│ 0   ┆ y        ┆ 5     │
│ 2   ┆ z        ┆ 8     │
└─────┴──────────┴───────┘
Answered By: jqurious