Pandas Convert Dataframe to Employee/Supervisor Hierarchy
Question:
I have a dataframe that is very similar to the one in this question, with two caveats:
1. An employee's level is not known
2. The order of employees is random
Because of (1) and (2), there may be instances where an employee is parsed before their supervisor.
I was using this answer as my basis, but because of my caveats there are many instances where an employee ends up directly under the top level: when supervisor = cache.get(supervisor_key(row), {}) is run, the supervisor has not been added yet, so .get() defaults to {}.
How do I dynamically shift a nested dict to insert a supervisor?
Edit: The sample data is the same as in the linked question, but the order has been changed and the level is unknown.
  Employee_FN Employee_LN Supervisor_FN Supervisor_LN
4         Pam     Beasley           Jim       Halpert
0     Michael       Scott          None          None
7    Meredith      Palmer          Ryan        Howard
1         Jim     Halpert       Michael         Scott
2      Dwight     Schrute       Michael         Scott
3     Stanley      Hudson           Jim       Halpert
5        Ryan      Howard           Pam       Beasley
6       Kelly      Kapoor          Ryan        Howard
Output is:
[{'Employee_FN': 'Michael',
  'Employee_LN': 'Scott',
  'Reports': [{'Employee_FN': 'Jim',
               'Employee_LN': 'Halpert',
               'Reports': [{'Employee_FN': 'Stanley',
                            'Employee_LN': 'Hudson'},
                           {'Employee_FN': 'Pam',
                            'Employee_LN': 'Beasley',
                            'Reports': [{'Employee_FN': 'Ryan',
                                         'Employee_LN': 'Howard',
                                         'Reports': [{'Employee_FN': 'Kelly',
                                                      'Employee_LN': 'Kapoor'},
                                                     {'Employee_FN': 'Meredith',
                                                      'Employee_LN': 'Palmer'}]}]}]},
              {'Employee_FN': 'Dwight', 'Employee_LN': 'Schrute', 'Level': 1}]}]
Answers:
You can use a directed graph (networkx) and a recursive function:
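import networkx as nx
import pandas as pd

# rework dataframe: turn the 'None' strings into real missing values
df = df.replace({'None': pd.NA})
# full-name keys for employee ('E') and supervisor ('S'); missing names stay missing
df['E'] = (df['Employee_FN']+' '+df['Employee_LN']).mask(df['Employee_FN'].isnull())
df['S'] = (df['Supervisor_FN']+' '+df['Supervisor_LN']).mask(df['Supervisor_FN'].isnull())
# per-employee attributes, keyed by full name
attr = df.set_index('E')[['Employee_FN', 'Employee_LN']].to_dict('index')

# generate graph: one supervisor -> employee edge per row
# (dropna() skips the row of the top-level employee, who has no supervisor)
G = nx.from_pandas_edgelist(df.dropna(), source='S', target='E',
                            create_using=nx.DiGraph)

# generate nested dictionary
def make_dic(n, d=None):
    if d is None:
        d = {}
    d.update(attr[n])
    successors = list(G.successors(n))
    if successors:
        d['Reports'] = [make_dic(x) for x in successors]
    return d

# the first node in topological order has no supervisor
# (assumes a single top-level employee)
out = make_dic(next(nx.topological_sort(G)))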
output:
{'Employee_FN': 'Michael',
 'Employee_LN': 'Scott',
 'Reports': [{'Employee_FN': 'Jim',
              'Employee_LN': 'Halpert',
              'Reports': [{'Employee_FN': 'Pam',
                           'Employee_LN': 'Beasley',
                           'Reports': [{'Employee_FN': 'Ryan',
                                        'Employee_LN': 'Howard',
                                        'Reports': [{'Employee_FN': 'Meredith', 'Employee_LN': 'Palmer'},
                                                    {'Employee_FN': 'Kelly', 'Employee_LN': 'Kapoor'}]}]},
                          {'Employee_FN': 'Stanley', 'Employee_LN': 'Hudson'}]},
             {'Employee_FN': 'Dwight', 'Employee_LN': 'Schrute'}]}
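If adding networkx is not an option, the ordering problem from the question can probably also be avoided with two passes over the rows: first create a dict for every employee, then attach each one to its supervisor. This is only a minimal sketch, not part of the answer above. The function name build_hierarchy is hypothetical, and it assumes that first and last name together identify an employee uniquely and that a supervisor who never appears as an employee (including None) marks a top-level entry. It takes plain row dicts, which df.to_dict('records') should produce from the dataframe.

```python
def build_hierarchy(records):
    """Return a list of top-level employee dicts with nested 'Reports' lists.

    `records` is an iterable of row dicts, e.g. df.to_dict('records').
    Hypothetical helper: assumes (first name, last name) is a unique key.
    """
    records = list(records)

    # Pass 1: create one node per employee, so every supervisor already
    # exists before anyone is attached to them (row order no longer matters).
    nodes = {}
    for rec in records:
        key = (rec['Employee_FN'], rec['Employee_LN'])
        nodes[key] = {'Employee_FN': rec['Employee_FN'],
                      'Employee_LN': rec['Employee_LN']}

    # Pass 2: attach each node to its supervisor's 'Reports' list.
    roots = []
    for rec in records:
        node = nodes[(rec['Employee_FN'], rec['Employee_LN'])]
        boss = nodes.get((rec['Supervisor_FN'], rec['Supervisor_LN']))
        if boss is None:
            # No known supervisor: treat as a top-level employee.
            roots.append(node)
        else:
            boss.setdefault('Reports', []).append(node)
    return roots


# Sample rows in the shuffled order from the question.
records = [
    {'Employee_FN': 'Pam', 'Employee_LN': 'Beasley', 'Supervisor_FN': 'Jim', 'Supervisor_LN': 'Halpert'},
    {'Employee_FN': 'Michael', 'Employee_LN': 'Scott', 'Supervisor_FN': None, 'Supervisor_LN': None},
    {'Employee_FN': 'Meredith', 'Employee_LN': 'Palmer', 'Supervisor_FN': 'Ryan', 'Supervisor_LN': 'Howard'},
    {'Employee_FN': 'Jim', 'Employee_LN': 'Halpert', 'Supervisor_FN': 'Michael', 'Supervisor_LN': 'Scott'},
    {'Employee_FN': 'Dwight', 'Employee_LN': 'Schrute', 'Supervisor_FN': 'Michael', 'Supervisor_LN': 'Scott'},
    {'Employee_FN': 'Stanley', 'Employee_LN': 'Hudson', 'Supervisor_FN': 'Jim', 'Supervisor_LN': 'Halpert'},
    {'Employee_FN': 'Ryan', 'Employee_LN': 'Howard', 'Supervisor_FN': 'Pam', 'Supervisor_LN': 'Beasley'},
    {'Employee_FN': 'Kelly', 'Employee_LN': 'Kapoor', 'Supervisor_FN': 'Ryan', 'Supervisor_LN': 'Howard'},
]
result = build_hierarchy(records)
```

Because every node is created before any attachment happens, no nested dict has to be shifted afterwards. Unlike the networkx version, this returns a list, so several top-level employees would each get their own entry.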