How to create a nested dict with a parent-children hierarchy for streamlit_tree_select from a dataframe?

Question:

To use streamlit_tree_select I need to convert a dataframe to its expected structure.

I guess I could use DataFrame.groupby('parkey') to group the children, but I’m not sure how to attach each group to the appropriate parent while iterating over the groups.

The dataframe holding categories:

import pandas as pd

data = [
  {"idnr": 1,"parkey": 0,"descr": "A","info":"string"},
  {"idnr": 2,"parkey": 0,"descr": "B","info":"string"},
  {"idnr": 3,"parkey": 2,"descr": "B B 1","info":"string"},
  {"idnr": 4,"parkey": 3,"descr":"B B B 1","info":"string"},
  {"idnr": 5,"parkey": 3,"descr":"B B B 2","info":"string"}
]

df = pd.DataFrame(data)

The expected output:

output = [
  {"idnr": 1,"parkey": 0,"descr": "A","info":"string"},
  {"idnr": 2,"parkey": 0,"descr": "B","info":"string","children":[
    {"idnr": 3,"parkey": 2,"descr": "B B 1","info":"string","children":[
      {"idnr": 4,"parkey": 3,"descr":"B B B 1","info":"string"},
      {"idnr": 5,"parkey": 3,"descr":"B B B 2","info":"string"}
    ]}
  ]}
]
Asked By: HedgeHog

Answers:

One way to do this is to pre-process the data into a dict that maps each parent key to the list of its children. You can then start from the entry for key 0 (the top-level rows) and recursively attach children from the dict to each node's children list:

def add_child(tree, child):
    # file the row under its parent's key, creating the list on first use
    key = child['parkey']
    tree.setdefault(key, []).append(child)

parents = dict()
for child in data:
    add_child(parents, child)

Output:

{
 0: [
  {'idnr': 1, 'parkey': 0, 'descr': 'A', 'info': 'string'},
  {'idnr': 2, 'parkey': 0, 'descr': 'B', 'info': 'string'}
 ],
 2: [
  {'idnr': 3, 'parkey': 2, 'descr': 'B B 1', 'info': 'string'}
 ],
 3: [
  {'idnr': 4, 'parkey': 3, 'descr': 'B B B 1', 'info': 'string'},
  {'idnr': 5, 'parkey': 3, 'descr': 'B B B 2', 'info': 'string'}
 ]
}

Now you can iterate over the entries in parents[0], recursively adding children as you go:

def add_children(tree, parents):
    for child in tree:
        # does this node have any children?
        idnr = child['idnr']
        if idnr in parents:
            # add the children
            child['children'] = parents[idnr]
            add_children(child['children'], parents)

output = parents[0]
add_children(output, parents)

Output:

[
  {'idnr': 1, 'parkey': 0, 'descr': 'A', 'info': 'string'},
  {'idnr': 2, 'parkey': 0, 'descr': 'B', 'info': 'string', 'children': [
      {'idnr': 3, 'parkey': 2, 'descr': 'B B 1', 'info': 'string', 'children': [
          {'idnr': 4, 'parkey': 3, 'descr': 'B B B 1', 'info': 'string'},
          {'idnr': 5, 'parkey': 3, 'descr': 'B B B 2', 'info': 'string'}
        ]
      }
    ]
  }
]
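
The nested list above still uses the dataframe's column names. If the tree component expects label and value keys on each node (an assumption here; check the streamlit_tree_select documentation), a small recursive mapping could rename them. This is only a sketch: to_nodes, label_key and value_key are hypothetical names, and converting the values to strings is a precaution rather than a documented requirement.

```python
# Small nested sample in the same shape as the output above.
nested = [
    {"idnr": 1, "parkey": 0, "descr": "A", "info": "string"},
    {"idnr": 2, "parkey": 0, "descr": "B", "info": "string", "children": [
        {"idnr": 3, "parkey": 2, "descr": "B B 1", "info": "string", "children": [
            {"idnr": 4, "parkey": 3, "descr": "B B B 1", "info": "string"},
            {"idnr": 5, "parkey": 3, "descr": "B B B 2", "info": "string"},
        ]},
    ]},
]


def to_nodes(tree, label_key="descr", value_key="idnr"):
    """Return new dicts with 'label', 'value' and, where present, 'children'."""
    nodes = []
    for row in tree:
        # Assumption: the component wants 'label' and 'value' keys.
        node = {"label": row[label_key], "value": str(row[value_key])}
        if "children" in row:
            node["children"] = to_nodes(row["children"], label_key, value_key)
        nodes.append(node)
    return nodes


nodes = to_nodes(nested)
```

Because to_nodes builds new dicts, the nested input is left unchanged.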

Notes:

  1. The add_children routine modifies the dicts in data in place, because it works through references to them. If you don’t want that, make a deep copy of data first (for example with copy.deepcopy), or change the add_child code to store copies of the child values.
  2. You could combine add_child and add_children; however, splitting the task into two passes means that data does not have to be sorted by parkey.
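
Both notes can be addressed together. The following is a minimal sketch, not the method used above: build_tree and root_key are hypothetical names, and it assumes every non-root parkey refers to an existing idnr. It copies every row first, so data is left untouched, and it creates all nodes before linking them, so the rows may come in any order.

```python
data = [
    {"idnr": 1, "parkey": 0, "descr": "A", "info": "string"},
    {"idnr": 2, "parkey": 0, "descr": "B", "info": "string"},
    {"idnr": 3, "parkey": 2, "descr": "B B 1", "info": "string"},
    {"idnr": 4, "parkey": 3, "descr": "B B B 1", "info": "string"},
    {"idnr": 5, "parkey": 3, "descr": "B B B 2", "info": "string"},
]


def build_tree(records, root_key=0):
    """Return a nested list built from shallow copies of the rows."""
    # Copy each row so that adding 'children' never touches the input.
    nodes = {row["idnr"]: dict(row) for row in records}
    roots = []
    for node in nodes.values():
        if node["parkey"] == root_key:
            roots.append(node)
        else:
            # A KeyError here means the row points at a missing parent.
            nodes[node["parkey"]].setdefault("children", []).append(node)
    return roots


tree = build_tree(data)
```

Starting from a dataframe, the rows could be passed in as build_tree(df.to_dict("records")).
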
Answered By: Nick