Build org hierarchy
Question:
having a dataset in a DataFrame similar to this one:
name manager ... sale
john N/A 100
anna john 120
ben john 110
cecilia ben 130
david clark 150
...
zain cecilia 450
I’d like to create a flat table with all the hierarchies, by specifying a root manager
e.g. buildHierarchy('john', 3)
would give me a table 3 levels deep like the one below
root manager1 manager2 manager3 sale
john 110
john anna 120
john ben 110
john ben cecilia 130
john ben cecilia zain 450
Answers:
This is a graph problem.
Here is your graph:
You can use networkx
:
import networkx as nx
def buildHierarchy(root, n):
G = nx.from_pandas_edgelist(df, source='manager', target='name',
create_using=nx.DiGraph)
G.remove_node(np.nan)
paths = [nx.shortest_path(G, root, name)
for i in range(n+1)
for name in nx.descendants_at_distance(G, source=root, distance=i)
]
df2 = pd.DataFrame(paths).iloc[:, 1:].fillna('').add_prefix('manager')
df2.insert(0, 'root', root)
df2['sales'] = (df.set_index('name')
.loc[[x[-1] for x in paths], 'sale']
.to_numpy()
)
return df2
out = buildHierarchy('john', 3)
output:
root manager1 manager2 manager3 sales
0 john 100
1 john ben 110
2 john anna 120
3 john ben cecilia 130
4 john ben cecilia zain 450
other example:
buildHierarchy('ben', 1)
root manager1 sales
0 ben 110
1 ben cecilia 130
having a dataset in a DataFrame similar to this one:
name manager ... sale
john N/A 100
anna john 120
ben john 110
cecilia ben 130
david clark 150
...
zain cecilia 450
I’d like to create a flat table with all the hierarchies, by specifying a root manager
e.g. buildHierarchy('john', 3)
would give me a table 3 levels deep like the one below
root manager1 manager2 manager3 sale
john 110
john anna 120
john ben 110
john ben cecilia 130
john ben cecilia zain 450
This is a graph problem.
Here is your graph:
You can use networkx
:
import networkx as nx
def buildHierarchy(root, n):
G = nx.from_pandas_edgelist(df, source='manager', target='name',
create_using=nx.DiGraph)
G.remove_node(np.nan)
paths = [nx.shortest_path(G, root, name)
for i in range(n+1)
for name in nx.descendants_at_distance(G, source=root, distance=i)
]
df2 = pd.DataFrame(paths).iloc[:, 1:].fillna('').add_prefix('manager')
df2.insert(0, 'root', root)
df2['sales'] = (df.set_index('name')
.loc[[x[-1] for x in paths], 'sale']
.to_numpy()
)
return df2
out = buildHierarchy('john', 3)
output:
root manager1 manager2 manager3 sales
0 john 100
1 john ben 110
2 john anna 120
3 john ben cecilia 130
4 john ben cecilia zain 450
other example:
buildHierarchy('ben', 1)
root manager1 sales
0 ben 110
1 ben cecilia 130