polars equivalent of pandas set_index() to_dict

Question:

Say I have a Polars DataFrame similar to this:

import polars as pl
df = pl.DataFrame({'index': [1,2,3,2,1],
                   'object': [1, 1, 1, 2, 2],
                   'period': [1, 2, 4, 4, 23],
                   'value': [24, 67, 89, 5, 23]})


How do I do the following in Polars, which is easy enough in pandas?
In [2]: df.to_pandas().groupby("index").last().transpose().to_dict()
Out[2]: 
{1: {'object': 2, 'period': 23, 'value': 23},
 2: {'object': 2, 'period': 4, 'value': 5},
 3: {'object': 1, 'period': 4, 'value': 89}}
Asked By: Michael WS


Answers:

The Algorithm

Polars has no concept of an index, but we can reach the same result with partition_by.

{
    index: frame.select(pl.exclude('index')).to_dicts()[0]
    for index, frame in
        (
            df
            .unique(subset=['index'], keep='last')
            .partition_by(groups=["index"],
                          as_dict=True,
                          maintain_order=True)
        ).items()
}

{1: {'object': 2, 'period': 23, 'value': 23},
2: {'object': 2, 'period': 4, 'value': 5},
3: {'object': 1, 'period': 4, 'value': 89}}

In steps

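The heart of the algorithm is partition_by with as_dict=True. Before partitioning, unique(subset=['index'], keep='last') keeps only the last row for each index value, which mirrors groupby("index").last() in pandas.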

(
    df
    .unique(subset=['index'], keep='last')
    .partition_by(groups=["index"],
                  as_dict=True,
                  maintain_order=True)
)
{1: shape: (1, 4)
┌───────┬────────┬────────┬───────┐
│ index ┆ object ┆ period ┆ value │
│ ---   ┆ ---    ┆ ---    ┆ ---   │
│ i64   ┆ i64    ┆ i64    ┆ i64   │
╞═══════╪════════╪════════╪═══════╡
│ 1     ┆ 2      ┆ 23     ┆ 23    │
└───────┴────────┴────────┴───────┘,
2: shape: (1, 4)
┌───────┬────────┬────────┬───────┐
│ index ┆ object ┆ period ┆ value │
│ ---   ┆ ---    ┆ ---    ┆ ---   │
│ i64   ┆ i64    ┆ i64    ┆ i64   │
╞═══════╪════════╪════════╪═══════╡
│ 2     ┆ 2      ┆ 4      ┆ 5     │
└───────┴────────┴────────┴───────┘,
3: shape: (1, 4)
┌───────┬────────┬────────┬───────┐
│ index ┆ object ┆ period ┆ value │
│ ---   ┆ ---    ┆ ---    ┆ ---   │
│ i64   ┆ i64    ┆ i64    ┆ i64   │
╞═══════╪════════╪════════╪═══════╡
│ 3     ┆ 1      ┆ 4      ┆ 89    │
└───────┴────────┴────────┴───────┘}

This creates a dictionary where the keys are the index values, and the values are the one-row sub-dataframes associated with each index.

From this dictionary, we can build the nested dictionaries with a Python dictionary comprehension:

{
    index: frame.to_dicts()
    for index, frame in
        (
            df
            .unique(subset=['index'], keep='last')
            .partition_by(groups=["index"],
                          as_dict=True,
                          maintain_order=True)
        ).items()
}
{1: [{'index': 1, 'object': 2, 'period': 23, 'value': 23}],
2: [{'index': 2, 'object': 2, 'period': 4, 'value': 5}],
3: [{'index': 3, 'object': 1, 'period': 4, 'value': 89}]}

All that is left is to tidy up the output: select(pl.exclude('index')) drops index from the nested dictionaries, and [0] takes the single dictionary out of the one-element list.

{
    index: frame.select(pl.exclude('index')).to_dicts()[0]
    for index, frame in
        (
            df
            .unique(subset=['index'], keep='last')
            .partition_by(groups=["index"],
                          as_dict=True,
                          maintain_order=True)
        ).items()
}
{1: {'object': 2, 'period': 23, 'value': 23},
2: {'object': 2, 'period': 4, 'value': 5},
3: {'object': 1, 'period': 4, 'value': 89}}
Answered By: cbilot

So if we have this dict:

out = df.to_dict()

def create_dict_from_pls(data_in, idx_key):
    out = {}
    for item in range(len(data_in[idx_key])):
        out[data_in[idx_key][item]] = {}
        for key in data_in:
            out[data_in[idx_key][item]][key] = data_in[key][item]
    return out



In [1]: create_dict_from_pls(out, "index")
Out[1]: 
{1: {'index': 1, 'object': 2, 'period': 23, 'value': 23},
 2: {'index': 2, 'object': 2, 'period': 4, 'value': 5},
 3: {'index': 3, 'object': 1, 'period': 4, 'value': 89}}
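
Note that the output above still carries 'index' inside each nested dictionary, unlike the pandas result in the question. A minimal sketch of a variant that leaves the key column out, assuming the input is a plain dict of lists (for example from df.to_dict(as_series=False)); the function name create_dict_without_key is hypothetical:

```python
def create_dict_without_key(data_in, idx_key):
    """Map each key value to a dict of the other columns, last row wins."""
    out = {}
    for pos, key_value in enumerate(data_in[idx_key]):
        # Later rows overwrite earlier ones that share the same key value.
        out[key_value] = {
            col: values[pos]
            for col, values in data_in.items()
            if col != idx_key
        }
    return out


# Same data as in the question, written as a dict of lists.
data = {'index': [1, 2, 3, 2, 1],
        'object': [1, 1, 1, 2, 2],
        'period': [1, 2, 4, 4, 23],
        'value': [24, 67, 89, 5, 23]}

nested = create_dict_without_key(data, 'index')
```

Because later rows overwrite earlier ones, this reproduces the "last row per index" behaviour without a separate deduplication step.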
Answered By: Michael WS