Gather elements that share information with pandas

Question:

My task is to gather elements that have something in common.

Example:

import pandas as pd

df = pd.DataFrame({'C1': ['A', 'B', 'C', 'D', 'R', 'X'], 'C2': ['B', 'C', 'D', 'E', 'S', 'Y']})
    C1  C2
0   A   B
1   B   C
2   C   D
3   D   E
4   R   S
5   X   Y

And this is what I'm looking for:

    Groups
0   [A, B, C, D, E]
1   [R, S]
2   [X, Y]

Any idea?

Asked By: elouassif


Answers:

You should use networkx.connected_components here.

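Treating your data as a graph (each row is an edge between the values in C1 and C2) is a reliable way to make sure that all values linked directly or through a chain of rows end up in the same group.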

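import networkx as nx
import pandas as pd

# Each row becomes an edge between its C1 and C2 values
G = nx.from_pandas_edgelist(df, source='C1', target='C2')

# Each connected component is a set of nodes; sort it for a stable display
out = pd.DataFrame({'Groups': [sorted(c) for c in nx.connected_components(G)]})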

Output:

            Groups
0  [A, B, C, D, E]
1           [R, S]
2           [X, Y]
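If you would rather not depend on networkx, the same connected-components grouping can be sketched in plain Python with a union-find (disjoint-set) structure. This is a swapped-in technique, not the method used in this answer; the function name `group_connected` and its `a`/`b` column parameters are hypothetical, and it assumes the values in the two columns are hashable.

```python
import pandas as pd


def group_connected(df: pd.DataFrame, a: str = "C1", b: str = "C2") -> pd.DataFrame:
    """Hypothetical helper: group values linked through rows into components.

    Uses union-find (disjoint-set) instead of networkx.
    """
    parent: dict = {}

    def find(x):
        # Register an unseen value as its own root.
        parent.setdefault(x, x)
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        # Merge the two sets by attaching one root under the other.
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Each row is an edge joining the two values it contains.
    for x, y in zip(df[a], df[b]):
        union(x, y)

    # Collect members by root; dicts keep insertion order, so groups
    # appear in the order their first member was seen.
    groups: dict = {}
    for value in parent:
        groups.setdefault(find(value), []).append(value)

    return pd.DataFrame({"Groups": [sorted(g) for g in groups.values()]})


df = pd.DataFrame({'C1': ['A', 'B', 'C', 'D', 'R', 'X'],
                   'C2': ['B', 'C', 'D', 'E', 'S', 'Y']})
print(group_connected(df))
```

On the example data this should give the same three groups as the networkx version. Union-find only needs one pass over the rows, which can be handy when you just want the groups and no other graph features.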

[Image: your data drawn as a graph]

Answered By: mozway