Build org hierarchy

Question:

having a dataset in a DataFrame similar to this one:

name    manager    ...    sale
john    N/A               100
anna    john              120
ben     john              110
cecilia ben               130
david   clark             150
...
zain    cecilia           450

I’d like to create a flat table with all the hierarchies, by specifying a root manager

e.g. buildHierarchy('john', 3) would give me a table 3 levels deep like the one below

root      manager1    manager2    manager3    sale
john                                          110
john      anna                                120
john      ben                                 110 
john      ben         cecilia                 130
john      ben         cecilia     zain        450
Asked By: Bishonen_PL

||

Answers:

This is a graph problem.

Here is your graph:

hierarchy

You can use networkx:

import networkx as nx

def buildHierarchy(root, n):
    G = nx.from_pandas_edgelist(df, source='manager', target='name',
                                create_using=nx.DiGraph)
    G.remove_node(np.nan)

    paths = [nx.shortest_path(G, root, name)
             for i in range(n+1)
             for name in nx.descendants_at_distance(G, source=root, distance=i)
             ]

    df2 = pd.DataFrame(paths).iloc[:, 1:].fillna('').add_prefix('manager')
    df2.insert(0, 'root', root)
    
    df2['sales'] = (df.set_index('name')
                      .loc[[x[-1] for x in paths], 'sale']
                      .to_numpy()
                   )
    
    return df2

out = buildHierarchy('john', 3)

output:

   root manager1 manager2 manager3  sales
0  john                               100
1  john      ben                      110
2  john     anna                      120
3  john      ben  cecilia             130
4  john      ben  cecilia     zain    450

other example:

buildHierarchy('ben', 1)

  root manager1  sales
0  ben             110
1  ben  cecilia    130
Answered By: mozway
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.