How to divide into two groups (Python Pandas DataFrame)?
Question:
I have a dataset like this about the installation (ON) and removal (OFF) of equipment.
A, B, C, D, etc. are the IDs of independent pieces of equipment. I want to divide these IDs into two groups according to some rules.
As you can see, when I remove B, I install A. After removing A, I install C. The same goes for T after removing C.
In the same way, when I remove D, I install F. The same goes for H after F.
My hypothesis is that there are two groups of equipment. For example, we can say that:
Group 1 : B-A-C-T
Group 2 : D-F-H
import pandas as pd

ON = ['A','C','F','T','H']
OFF = ['B','A','D','C','F']
df = pd.DataFrame({'ON': ON, 'OFF': OFF})
Maybe I can do something with a dictionary, but I have no idea how.
I want two lists as a result:
Group 1 = ['B','A','C','T']
Group 2 = ['D','F','H']
Answers:
Using a network library like networkx can simplify the problem. What you want is to find all paths from the root nodes (IDs that are never installed) to the leaf nodes (IDs that are never removed).
# pip install networkx
import networkx as nx
import itertools
# Create a directed graph from Pandas edges list
G = nx.from_pandas_edgelist(df, source='OFF', target='ON', create_using=nx.DiGraph)
# Find all roots and leaves
roots = [node for node, degree in G.in_degree if degree == 0]
leaves = [node for node, degree in G.out_degree if degree == 0]
# Get all possible paths between roots and leaves
paths = []
for root, leaf in itertools.product(roots, leaves):
    # all_simple_paths yields nothing when leaf is not reachable from root
    for path in nx.all_simple_paths(G, root, leaf):
        paths.append(path)
Output:
>>> paths
[['B', 'A', 'C', 'T'], ['D', 'F', 'H']]
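The dictionary idea from the question also works without networkx, as long as every group is a simple chain. This is a minimal sketch under that assumption: each removed ID is followed by exactly one installed ID (if an ID appeared twice in OFF, `dict` would keep only the last pairing). A chain starts at an ID that is removed but never installed, and we follow the OFF -> ON mapping until it runs out.

```python
# Assumes simple chains: each removed ID maps to exactly one installed ID.
OFF = ['B', 'A', 'D', 'C', 'F']
ON = ['A', 'C', 'F', 'T', 'H']

# Map each removed ID to the ID installed after it.
# With the DataFrame: next_id = dict(zip(df['OFF'], df['ON']))
next_id = dict(zip(OFF, ON))

# A chain starts at an ID that is removed but never installed.
installed = set(ON)
starts = [off for off in OFF if off not in installed]

groups = []
for start in starts:
    chain = [start]
    # Follow removal -> installation links; stop at the end of the chain
    # or if the next ID is already in the chain (guards against cycles).
    while chain[-1] in next_id and next_id[chain[-1]] not in chain:
        chain.append(next_id[chain[-1]])
    groups.append(chain)

print(groups)  # [['B', 'A', 'C', 'T'], ['D', 'F', 'H']]
```

Each ID is visited once per chain, so this stays cheap even for long histories, but it only returns ordered chains, not branching groups.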
Visualization:
import matplotlib.pyplot as plt
nx.draw_networkx(G)
plt.show()