How to count people who are below a position

Question

I want to count how many people are below a given employee in a data frame.

Employee	Manager
A	–
B	A
C	A
D	A
E	A
F	B
G	B
H	C
I	C

I would like to get in the output:
I, H, G, F, E and D have no employees below
C has two employees (H and I) below it
B has two employees (F and G)
A has eight employees below him (B, C, D and E plus the employees of B and C)

Would anyone have any suggestions?
In my DF I have more hierarchy layers and a very large amount of data.

I thought about storing it in a dictionary and doing a loop to update it, but I believe that this solution is not efficient at all. I would like to know if there is any more efficient technique to solve this type of problem.

Asked By: Bruno Ramos Martins

Answer 1

I would use a directed graph with networkx. This is a super fun python package.

import networkx as nx
import numpy as np
import pandas as pd

#set up data
employee = ['A', 'B', 'C','D','E','F','G','H','I']
manager = ['', 'A', 'A','A','A','B','B','C','C']
relations = pd.DataFrame(list(zip(employee,manager)), columns = ['Employee', 'Manager'])

# If there is no manager, make it the employee
relations.Manager = np.where(relations.Manager == '', relations.Employee, relations.Manager)
# or, if missing managers are stored as NaN rather than empty strings:
relations.Manager = np.where(relations.Manager.isna(), relations.Employee, relations.Manager)

# Create tuples for 'edges'
relations['edge'] = list(zip(relations.Manager, relations.Employee))

# Create graph
G = nx.DiGraph()
G.add_nodes_from(list(relations.Employee))
G.add_edges_from(list(set(relations.edge)))

#Find all the descendants of nodes/employees
relations['employees_below'] = relations.apply(lambda row: nx.descendants(G,row.Employee), axis = 1)

returns:

  Employee Manager    edge           employees_below
0        A       A  (A, A)  {C, G, I, D, H, F, E, B}
1        B       A  (A, B)                    {F, G}
2        C       A  (A, C)                    {H, I}
3        D       A  (A, D)                        {}
4        E       A  (A, E)                        {}
5        F       B  (B, F)                        {}
6        G       B  (B, G)                        {}
7        H       C  (C, H)                        {}
8        I       C  (C, I)                        {}

The way it works: graphs are nodes and edges. In this case, your nodes are employees and your edges are a relationship between a manager and an employee. Do a quick google for ‘networkx directed graph’ images and you’ll get the idea of what this looks like in an image representation.

Make sure your data is cleaned up so that everyone has a manager (make it themselves if there is none, for example).
First, create your edges in the form of a tuple of (manager, employee) and save it somewhere (I chose to make it a column in the df called edge).
Next, make a directed graph in networkx. A directed graph is needed because the relationship is hierarchical: each edge points in one direction, from manager to employee.
Add every employee to your graph as a ‘node’.
Add every employee-manager relationship to your graph as an edge, using the pre-defined tuples of (manager, employee) discussed previously.
Lastly, you can get an employee’s subordinates by finding all of that node’s descendants. Descendants are all nodes (i.e., employees) that can be reached from a node (i.e., employee), not counting the node itself. I chose to assign this to a column by applying the descendants function to the employee in each row with apply.
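
Since the question asks for counts rather than names, the steps above can be extended with one more column. This is only a minimal sketch, not part of the original answer: it assumes missing managers are stored as None, skips the self-manager workaround by simply not adding an edge for those rows, and uses a hypothetical column name n_below.

```python
import networkx as nx
import pandas as pd

# Sample data from the question; the top-level employee has no manager (assumed None).
relations = pd.DataFrame({
    "Employee": list("ABCDEFGHI"),
    "Manager": [None, "A", "A", "A", "A", "B", "B", "C", "C"],
})

# Build the directed graph with edges pointing manager -> employee.
# Rows without a manager contribute a node but no edge.
G = nx.DiGraph()
G.add_nodes_from(relations["Employee"])
G.add_edges_from(
    relations.dropna(subset=["Manager"])[["Manager", "Employee"]]
    .itertuples(index=False, name=None)
)

# Count everyone reachable below each employee (n_below is a made-up column name).
relations["n_below"] = relations["Employee"].map(lambda e: len(nx.descendants(G, e)))
print(relations)
```

Taking len of the descendants set gives the count directly, so the set itself only needs to be stored if the names are wanted too.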

Answered By: 34jbonz

Answer 2

As originally mentioned by @34jbonz, networkx is the best tool for the task. There is, however, no need to preprocess the data, as networkx provides a pandas interface (here temp is the data frame, with columns named manager and employee):

G = nx.from_pandas_edgelist(temp, source='manager',target='employee',create_using=nx.DiGraph)

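Also, calling descendants once per row with apply should be avoided, as it walks the same subtrees again for every ancestor. A post-order depth-first search is more efficient here: it visits children before their parents, so each node's totals are built once from its children's totals. The search starts from '-', the placeholder manager of the top-level employee: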

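for node in nx.dfs_postorder_nodes(G, '-'):
    successors = list(G.successors(node))
    # size = direct reports plus everyone already counted below each of them
    G.nodes[node]['size'] = sum(G.nodes[p]['size'] for p in successors) + len(successors)
    # descendants = each child's descendants, followed by the children themselves
    G.nodes[node]['descendants'] = (
        [s for sn in successors for s in G.nodes[sn]['descendants']] + successors
    )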

Finally, the node attributes can be extracted in bulk from the networkx graph as a dict, which in turn can be turned into a data frame (note that the result also has a row for the '-' placeholder):

pd.DataFrame.from_dict(dict(G.nodes(data=True)),orient='index')
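
For reference, here is how the pieces might fit together on the sample data from the question. It is a sketch under assumptions: the frame is named temp, the columns are lowercase, and the top-level employee's manager is the string '-'. Only the size attribute is computed, and the placeholder row is dropped at the end.

```python
import networkx as nx
import pandas as pd

# Assumed input: lowercase column names and '-' marking "no manager".
temp = pd.DataFrame({
    "employee": list("ABCDEFGHI"),
    "manager": ["-", "A", "A", "A", "A", "B", "B", "C", "C"],
})

G = nx.from_pandas_edgelist(
    temp, source="manager", target="employee", create_using=nx.DiGraph
)

# Post-order visits children before parents, so each node's total
# is built from child totals that have already been computed.
for node in nx.dfs_postorder_nodes(G, "-"):
    children = list(G.successors(node))
    G.nodes[node]["size"] = sum(G.nodes[c]["size"] for c in children) + len(children)

# Collect the counts and drop the '-' placeholder, which is not a real employee.
sizes = pd.Series(nx.get_node_attributes(G, "size")).drop("-").sort_index()
print(sizes)
```

If the data has several top-level employees, they all hang off the same '-' node, so a single traversal from '-' still reaches everyone.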

Answered By: Arnau
