Building Tree Structure from a list of string paths

Question:

I have a list of paths as in

paths = ["x1/x2", "x1/x2/x3", "x1/x4", "x1/x5/x6", ...]

where the actual length of the list is roughly 20,000. I want to construct a tree structure that can be printed. The tree would look something like this:

x1
├── x2
│   └── x3
├── x4
└── x5
    └── x6

I also want each node object to carry some data. That data is currently stored in a dictionary keyed by node name, e.g.

d = {"x1": [[1,2], [3,4]], "x2": [[5,6], [7,8]], ...}

Every tree node should inherit the data from its parent, so that the data at the "x2" node would be [[1,2], [3,4], [5,6], [7,8]].

I have tried the module anytree but it requires that you define each node of the tree as a variable. Any ideas? Thanks in advance!

Asked By: CosmicPeanutButter


Answers:

If I understand your question correctly, one possible solution might be like this.

Each tree node stores a reference to its parent, so that the ├── and └── prefixes
printed before each name can be built by walking up through the node's ancestors.

Output:

x1
├── x2
│   └── x3
├── x4
└── x5
    └── x6

Code:

class TreeNode:
    def __init__(self, name, parent):
        self.parent = parent
        self.name = name
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node

    def print(self, is_root):
        pre_0 = "    "
        pre_1 = "│   "
        pre_2 = "├── "
        pre_3 = "└── "

        tree = self
        prefix = pre_2 if tree.parent and tree is not tree.parent.children[-1] else pre_3

        while tree.parent and tree.parent.parent:
            if tree.parent is not tree.parent.parent.children[-1]:
                prefix = pre_1 + prefix
            else:
                prefix = pre_0 + prefix

            tree = tree.parent

        if is_root:
            print(self.name)
        else:
            print(prefix + self.name)

        for child in self.children:
            child.print(False)

def find_and_insert(parent, edges):
    # Terminate if there is no edge
    if not edges:
        return
    
    # Find a child with the name edges[0] in the current node
    match = [tree for tree in parent.children if tree.name == edges[0]]
    
    # If there is already a node with the name edges[0] in the children, set "pointer" tree to this node. If there is no such node, add a node in the current tree node then set "pointer" tree to it
    tree = match[0] if match else parent.add_child(TreeNode(edges[0], parent))
    
    # Recursively process the following edges[1:]
    find_and_insert(tree, edges[1:])

paths = ["x1/x2", "x1/x2/x3", "x1/x4", "x1/x5/x6"]

# All paths are assumed to share the same first component, which becomes the root
root = TreeNode("x1", None)

for path in paths:
    find_and_insert(root, path.split("/")[1:])

root.print(True)
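
The code above builds and prints the tree but does not attach the per-node data the question asks for. As a rough sketch of one way to add that, the variation below keeps children in a dict keyed by name (so that lookups stay cheap with around 20,000 paths) and exposes the inherited data as a property. The names PathNode, build_tree, render and own_data are hypothetical and are not part of the code above. It assumes that every path starts with the same root component and that the data dictionary is keyed by node name, as in the question.

```python
class PathNode:
    """Hypothetical node type: children keyed by name, plus per-node data."""

    def __init__(self, name, parent=None, own_data=None):
        self.name = name
        self.parent = parent
        self.children = {}  # name -> PathNode; dicts preserve insertion order
        self.own_data = own_data if own_data is not None else []

    @property
    def data(self):
        # Inherited data: everything from the ancestors, then this node's own entries.
        if self.parent is None:
            return list(self.own_data)
        return self.parent.data + self.own_data


def build_tree(paths, data):
    """Build a tree from "a/b/c" style paths; `data` maps node name -> list."""
    root = None
    for path in paths:
        parts = path.split("/")
        if root is None:
            root = PathNode(parts[0], own_data=data.get(parts[0]))
        elif parts[0] != root.name:
            raise ValueError(f"path {path!r} does not start at {root.name!r}")
        node = root
        for part in parts[1:]:
            child = node.children.get(part)
            if child is None:
                child = PathNode(part, node, data.get(part))
                node.children[part] = child
            node = child
    return root


def render(root):
    """Return the tree as a list of lines, passing the indent down the recursion."""
    lines = [root.name]

    def walk(node, indent):
        kids = list(node.children.values())
        for i, child in enumerate(kids):
            last = i == len(kids) - 1
            lines.append(indent + ("└── " if last else "├── ") + child.name)
            walk(child, indent + ("    " if last else "│   "))

    walk(root, "")
    return lines


paths = ["x1/x2", "x1/x2/x3", "x1/x4", "x1/x5/x6"]
d = {"x1": [[1, 2], [3, 4]], "x2": [[5, 6], [7, 8]]}

root = build_tree(paths, d)
print("\n".join(render(root)))
print(root.children["x2"].data)
```

Because the data is looked up by node name alone, two nodes with the same name in different branches would receive the same entries. If that matters, keying the dictionary by full path would be one option.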

Answered By: Shihao Xu

bigtree is a Python tree implementation that integrates with Python lists, dictionaries, and pandas DataFrame.

For this scenario, there are three parts:

  1. Define a new Node class that does the inheritance of data from parent nodes
  2. Construct tree using the path list and Node class we defined earlier (1 line of code!)
  3. Add in the data dictionary that maps the node name to data (1 line of code!)

from bigtree import Node, list_to_tree, print_tree, add_dict_to_tree_by_name

# Define new Node class
class NodeInherit(Node):
    @property
    def data(self):
        if self.is_root:
            return self._data
        return self.parent.data + self._data

# Construct tree using the path list
paths = ["x1/x2", "x1/x2/x3", "x1/x4", "x1/x5/x6"]
root = list_to_tree(paths, node_type=NodeInherit)

# Add in the data dictionary
d = {"x1": [[1,2], [3,4]], "x2": [[5,6], [7,8]], "x3": [[9]], "x4": [[5,6]], "x5": [[5]], "x6": [[6]]}
d2 = {k: {"_data": v} for k, v in d.items()}  # minor input format change
root = add_dict_to_tree_by_name(root, d2)

# Check tree structure with data
print_tree(root, attr_list=["data"])

This produces the following output:

x1 [data=[[1, 2], [3, 4]]]
├── x2 [data=[[1, 2], [3, 4], [5, 6], [7, 8]]]
│   └── x3 [data=[[1, 2], [3, 4], [5, 6], [7, 8], [9]]]
├── x4 [data=[[1, 2], [3, 4], [5, 6]]]
└── x5 [data=[[1, 2], [3, 4], [5]]]
    └── x6 [data=[[1, 2], [3, 4], [5], [6]]]

Besides printing the tree to the console, you can also export it to a dictionary or a pandas DataFrame.

Source/Disclaimer: I’m the creator of bigtree 😉

Answered By: Kay Jan