Check circular reference within DataFrame (Python)

Question:

I’m working on a set of employee data where every employee reports to a manager.
In the DataFrame, each employee is identified by an ID, and each ID has a parent ID (the manager’s ID).
Is there a way to check whether any employee’s reporting line leads back to themselves?

Example data frame:

import pandas as pd
df = pd.DataFrame({"id": [111, 112, 113], "parentid": [112, 113, 111]})

In this example, employee 111 reports to 112, 112 reports to 113, and 113 reports to 111, so the reporting line forms a circular reference. Is there a way to check for this kind of circular reference?
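To make the question concrete, here is a minimal dependency-free sketch (not the networkx approach used in the answer) that follows each employee’s chain of managers and reports whether it comes back to that same employee. The helper name reports_to_self is hypothetical, and it assumes each ID has exactly one parent ID.

```python
def reports_to_self(parent_of: dict[int, int]) -> dict[int, bool]:
    """Hypothetical helper: for each employee ID, True if following the chain of
    managers (id -> parentid) eventually leads back to that same employee."""
    result = {}
    for start in parent_of:
        seen = set()
        current = parent_of[start]
        # Stop when the chain leaves the data, returns to start, or loops elsewhere
        while current in parent_of and current != start and current not in seen:
            seen.add(current)
            current = parent_of[current]
        result[start] = current == start
    return result


# Same data as the question; from the DataFrame: dict(zip(df["id"], df["parentid"]))
parent_of = {111: 112, 112: 113, 113: 111}
print(reports_to_self(parent_of))  # {111: True, 112: True, 113: True}
```

Each walk is at most as long as the number of employees, so this is quadratic in the worst case; that is fine for small tables, while the graph-based answer avoids re-walking the same chains.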

Thank you very much!

Asked By: KY_blubrain


Answers:

This is a perfect use case for networkx to approach your data as a graph.

This is your graph:

[Figure: graph of the circular references in the example DataFrame]

Create a directed graph (edges go from manager to employee) and use simple_cycles to identify the circular references:

import networkx as nx

# df is the DataFrame from the question; each edge points from parentid to id
G = nx.from_pandas_edgelist(df, source='parentid', target='id',
                            create_using=nx.DiGraph)

list(nx.simple_cycles(G))

output: [[112, 111, 113]]
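If you only need a yes/no answer, or a single example loop to report, networkx also provides is_directed_acyclic_graph and find_cycle. A minimal sketch, reusing the question’s data:

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({"id": [111, 112, 113], "parentid": [112, 113, 111]})
G = nx.from_pandas_edgelist(df, source="parentid", target="id",
                            create_using=nx.DiGraph)

# A directed graph is acyclic only if it contains no circular reporting line
has_cycle = not nx.is_directed_acyclic_graph(G)

# find_cycle returns the (u, v) edges of one cycle, or raises NetworkXNoCycle
try:
    one_cycle = nx.find_cycle(G)
except nx.NetworkXNoCycle:
    one_cycle = []
print(has_cycle, one_cycle)
```

find_cycle stops at the first cycle it finds, so it is a cheap check when you do not need every circular group.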

If you also want to flag which employees are part of a cycle, you can use:

# Set of every ID that appears in at least one cycle
circular = {n for cycle in nx.simple_cycles(G) for n in cycle}

df['circular'] = df['id'].isin(circular)

output (on a more complex example):

    id  parentid  circular
0  111       112      True
1  112       113      True
2  113       111      True
3  210       211     False
4  211       212     False
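The same flags can also be computed from strongly connected components instead of enumerating cycles: a node lies on a directed cycle exactly when its component has more than one node, or when it has an edge to itself (an employee listed as their own manager). A minimal sketch, with the data read off the table above; circular_ids is a hypothetical helper name.

```python
import networkx as nx
import pandas as pd


def circular_ids(G: nx.DiGraph) -> set:
    """Hypothetical helper: IDs of nodes that lie on at least one directed cycle."""
    return {
        n
        for component in nx.strongly_connected_components(G)
        for n in component
        # Multi-node component, or a self-loop (employee is their own manager)
        if len(component) > 1 or G.has_edge(n, n)
    }


# Data read off the output table; 212 is a manager with no row of their own
df = pd.DataFrame({"id": [111, 112, 113, 210, 211],
                   "parentid": [112, 113, 111, 211, 212]})
G = nx.from_pandas_edgelist(df, source="parentid", target="id",
                            create_using=nx.DiGraph)
df["circular"] = df["id"].isin(circular_ids(G))
print(df)
```

Both approaches give the same set of IDs. With one parentid per employee, cycles cannot overlap, so simple_cycles is not expensive here; the component-based version mainly helps if an ID can have several parents.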
Answered By: mozway