Polars equivalent of pandas expression df.groupby(['col1','col2'])['col3'].sum().unstack()

Question

import pandas as pd

pandasdf = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)
pandasdf.groupby(["fruits","cars"])['B'].sum().unstack()

How can I create an equivalent truth table in Polars?

Something like turning the table below into a truth table:

import polars as pl

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)
# Note: groupby is spelled group_by in Polars >= 0.19.
df.groupby(["fruits", "cars"]).agg(pl.col("B").sum())  # -> truth table

Efficiency matters because the dataset is very large (it will be used with the apriori algorithm).

The unstack function in Polars works differently; a Polars alternative to pd.crosstab would also work.

Asked By: gkbaby

Answer 1

It seems like you want to do a pivot.

import polars as pl

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)

df.pivot(values="B", index="cars", columns="fruits", aggregate_fn=pl.element().sum())

shape: (2, 3)
┌────────┬────────┬───────┐
│ cars   ┆ banana ┆ apple │
│ ---    ┆ ---    ┆ ---   │
│ str    ┆ i64    ┆ i64   │
╞════════╪════════╪═══════╡
│ beetle ┆ 6      ┆ 5     │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ audi   ┆ 4      ┆ null  │
└────────┴────────┴───────┘

Answered By: ritchie46
