Pandas Convert Dataframe to Employee/Supervisor Hierarchy

Question:

I have a dataframe very similar to the one in this question, with two caveats:

  1. An employee's level is not known

  2. The order of employees is random

Because of (1) and (2), there may be instances where an employee is parsed before their supervisor.

I was using this answer as my basis, but because of these caveats, many employees end up directly under the top level: when supervisor = cache.get(supervisor_key(row), {}) runs, the supervisor has not been added to the cache yet, so .get() defaults to {}.

How do I dynamically shift a nested dict to insert a supervisor?

Edit: Sample data is the same as in the linked question, but the order has been changed and the level is unknown:

  Employee_FN Employee_LN Supervisor_FN Supervisor_LN
4         Pam     Beasley           Jim       Halpert
0     Michael       Scott          None          None
7    Meredith      Palmer          Ryan        Howard
1         Jim     Halpert       Michael         Scott
2      Dwight     Schrute       Michael         Scott
3     Stanley      Hudson           Jim       Halpert
5        Ryan      Howard           Pam       Beasley
6       Kelly      Kapoor          Ryan        Howard

Expected output:

[{'Employee_FN': 'Michael',
  'Employee_LN': 'Scott',
  'Reports': [{'Employee_FN': 'Jim',
    'Employee_LN': 'Halpert',
    'Reports': [{'Employee_FN': 'Stanley',
      'Employee_LN': 'Hudson'},
     {'Employee_FN': 'Pam',
      'Employee_LN': 'Beasley',
      'Reports': [{'Employee_FN': 'Ryan',
        'Employee_LN': 'Howard',
        'Reports': [{'Employee_FN': 'Kelly',
          'Employee_LN': 'Kapoor'},
         {'Employee_FN': 'Meredith',
          'Employee_LN': 'Palmer'}]}]}]},
   {'Employee_FN': 'Dwight', 'Employee_LN': 'Schrute', 'Level': 1}]}]
Asked By: Bijan

Answers:

You can model the hierarchy as a directed graph with networkx and build the nested dictionary with a recursive function:

# rework dataframe: turn 'None' strings into missing values
import pandas as pd

df = df.replace({'None':  pd.NA})
df['E'] = (df['Employee_FN']+' '+df['Employee_LN']).mask(df['Employee_FN'].isnull())
df['S'] = (df['Supervisor_FN']+' '+df['Supervisor_LN']).mask(df['Supervisor_FN'].isnull())

attr = df.set_index('E')[['Employee_FN', 'Employee_LN']].to_dict('index')

# generate graph
import networkx as nx

G = nx.from_pandas_edgelist(df.dropna(), source='S', target='E',
                            create_using=nx.DiGraph)

# generate nested dictionary
def make_dic(n, d=None):
    if d is None:
        d = {}
    d.update(attr[n])
    successors = list(G.successors(n))
    if successors:
        d['Reports'] = [make_dic(x) for x in successors]
    return d

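# the first node in topological order is the root (the employee with no supervisor);
# this assumes a single top-level employee
out = make_dic(next(nx.topological_sort(G)))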

output:

{'Employee_FN': 'Michael',
 'Employee_LN': 'Scott',
 'Reports': [{'Employee_FN': 'Jim',
   'Employee_LN': 'Halpert',
   'Reports': [{'Employee_FN': 'Pam',
     'Employee_LN': 'Beasley',
     'Reports': [{'Employee_FN': 'Ryan',
       'Employee_LN': 'Howard',
       'Reports': [{'Employee_FN': 'Meredith', 'Employee_LN': 'Palmer'},
        {'Employee_FN': 'Kelly', 'Employee_LN': 'Kapoor'}]}]},
    {'Employee_FN': 'Stanley', 'Employee_LN': 'Hudson'}]},
  {'Employee_FN': 'Dwight', 'Employee_LN': 'Schrute'}]}
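
The ordering problem from the question can also be avoided without networkx by building the tree in two passes. The following is a minimal sketch of that idea, not the code from either linked post. The function name build_hierarchy is hypothetical, and it assumes that first and last name together identify an employee uniquely and that the data contains no supervision cycles. Anyone whose supervisor is missing or not listed as an employee is treated as a top-level entry.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Name = Tuple[Optional[str], Optional[str]]


def build_hierarchy(rows: Iterable[Dict[str, Optional[str]]]) -> List[Dict[str, Any]]:
    """Build nested 'Reports' dicts from rows given in any order (hypothetical helper)."""
    rows = list(rows)

    # Pass 1: create one node per employee, so every supervisor
    # already exists before any linking happens.
    nodes: Dict[Name, Dict[str, Any]] = {}
    for row in rows:
        key = (row["Employee_FN"], row["Employee_LN"])
        nodes[key] = {"Employee_FN": key[0], "Employee_LN": key[1]}

    # Pass 2: attach each employee to their supervisor's 'Reports' list.
    roots: List[Dict[str, Any]] = []
    for row in rows:
        node = nodes[(row["Employee_FN"], row["Employee_LN"])]
        boss = nodes.get((row["Supervisor_FN"], row["Supervisor_LN"]))
        if boss is None:
            # No supervisor, or supervisor not present as an employee: top level.
            roots.append(node)
        else:
            boss.setdefault("Reports", []).append(node)
    return roots


# Sample data from the question, in the same shuffled order.
FIELDS = ("Employee_FN", "Employee_LN", "Supervisor_FN", "Supervisor_LN")
sample = [
    ("Pam", "Beasley", "Jim", "Halpert"),
    ("Michael", "Scott", None, None),
    ("Meredith", "Palmer", "Ryan", "Howard"),
    ("Jim", "Halpert", "Michael", "Scott"),
    ("Dwight", "Schrute", "Michael", "Scott"),
    ("Stanley", "Hudson", "Jim", "Halpert"),
    ("Ryan", "Howard", "Pam", "Beasley"),
    ("Kelly", "Kapoor", "Ryan", "Howard"),
]
rows = [dict(zip(FIELDS, r)) for r in sample]
tree = build_hierarchy(rows)
```

With a dataframe shaped like the one in the question, the rows could be passed as build_hierarchy(df.to_dict('records')). Because the nested dicts are shared by reference, appending to a supervisor's 'Reports' list in the second pass updates the tree wherever that supervisor already sits, so no nested dict has to be shifted after the fact. The result is a list of top-level employees, which matches the list shape of the expected output in the question.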
Answered By: mozway