Gather elements that share information with pandas
Question:
My task is to gather elements that share something in common.
Example:
import pandas as pd
df = pd.DataFrame({'C1': ['A', 'B', 'C', 'D', 'R', 'X'], 'C2': ['B', 'C', 'D', 'E', 'S', 'Y']})
C1 C2
0 A B
1 B C
2 C D
3 D E
4 R S
5 X Y
And this is what I'm looking for:
Groups
0 [A, B, C, D, E]
1 [R, S]
2 [X, Y]
Any ideas?
Answers:
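You should use networkx.connected_components here.
Treating each row as an edge between C1 and C2 turns your data into a graph, and the connected components of that graph are exactly the groups you want: two values end up together whenever a chain of rows links them.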
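import networkx as nx
# Build an undirected graph with one edge per row (C1 -- C2)
G = nx.from_pandas_edgelist(df, source='C1', target='C2')
# Each connected component is one group; sort its members for a stable display
out = pd.DataFrame({'Groups': map(sorted, nx.connected_components(G))})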
Output:
Groups
0 [A, B, C, D, E]
1 [R, S]
2 [X, Y]
Your graph (image not shown): A, B, C, D and E form one chain, while R-S and X-Y are two separate pairs.
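If you would rather not depend on networkx, the same grouping can be sketched with a small union-find (disjoint-set) structure. This is only a minimal illustration of the connected-components idea, not what networkx does internally; group_connected is a hypothetical helper name introduced here, and the DataFrame is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'C1': ['A', 'B', 'C', 'D', 'R', 'X'],
                   'C2': ['B', 'C', 'D', 'E', 'S', 'Y']})


def group_connected(pairs):
    """Group values linked by any chain of pairs (hypothetical helper, union-find)."""
    parent = {}

    def find(x):
        # Register unseen values as their own root, then walk up to the root
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Merge the two sets of every pair
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each set under its root
    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return [sorted(members) for members in groups.values()]


groups = group_connected(zip(df['C1'], df['C2']))
out = pd.DataFrame({'Groups': groups})
print(out)
```

For this small example it should produce the same three groups as the networkx version. For anything larger or more involved, networkx.connected_components remains the simpler choice.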