Creating an ID group from two ID columns with a many-to-many relationship in pandas (Python)

Question:

I have a table with two ID columns, and I want to create a new ID that groups rows wherever these IDs overlap.

The point of this is to find the level at which the unique values linked to each ID can be summed, so that one total can be divided by the other, every value is covered, and nothing is double counted.

For example, given a table like this:

ID 1  ID 2
   1     1
   1     2
   2     3
   3     4
   3     5
   4     5

I want to create a new ID column like this:

ID 1  ID 2  ID 3
   1     1     1
   1     2     1
   2     3     2
   3     4     3
   3     5     3
   4     5     3

Thanks for any help, and hopefully that is clear 🙂

I am very new to pandas and not sure where to begin.

Thanks

Asked By: Josh Tysseling


Answers:

This is inherently a graph problem (finding connected components), and you can solve it robustly with networkx:

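import networkx as nx
import pandas as pd

# example data from the question
df = pd.DataFrame({'ID 1': [1, 1, 2, 3, 3, 4],
                   'ID 2': [1, 2, 3, 4, 5, 5]})

# make node names unique per column (ID 1 = 1 is not the same node as ID 2 = 1)
id1 = df['ID 1'].astype(str).radd('ID1_')
id2 = df['ID 2'].astype(str).radd('ID2_')

# build an undirected graph with one edge per row
G = nx.from_edgelist(zip(id1, id2))

# number the connected components and map every node to its component number
new_ids = {node: i
           for i, component in enumerate(nx.connected_components(G), start=1)
           for node in component}

# a row's ID 1 node and ID 2 node share a component, so mapping either one works
df['ID 3'] = id1.map(new_ids)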

Output:

   ID 1  ID 2  ID 3
0     1     1     1
1     1     2     1
2     2     3     2
3     3     4     3
4     3     5     3
5     4     5     3
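The question's goal is to sum per group without double counting, which only works if every ID 1 value and every ID 2 value lands in exactly one ID 3 group. A minimal sketch of that sanity check in pandas, rebuilding the output table above by hand (the `n_id1`/`n_id2` column names are illustrative, not from the answer):

```python
import pandas as pd

# Output table from the answer, with ID 3 already computed
df = pd.DataFrame({'ID 1': [1, 1, 2, 3, 3, 4],
                   'ID 2': [1, 2, 3, 4, 5, 5],
                   'ID 3': [1, 1, 2, 3, 3, 3]})

# Each ID 1 and each ID 2 should map to exactly one ID 3 group;
# if not, its values could be counted in two groups
one_group_per_id1 = df.groupby('ID 1')['ID 3'].nunique().eq(1).all()
one_group_per_id2 = df.groupby('ID 2')['ID 3'].nunique().eq(1).all()
print(one_group_per_id1, one_group_per_id2)

# Number of distinct ID 1 and ID 2 values covered by each group
summary = df.groupby('ID 3').agg(n_id1=('ID 1', 'nunique'),
                                 n_id2=('ID 2', 'nunique'))
print(summary)
```

Because the groups are connected components, both checks hold by construction; the check is mainly useful after further filtering or merging of the table.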

Your graph: [image of the graph built from the example data, not reproduced here]
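If adding networkx as a dependency is not an option, the same grouping can be computed with a union-find (disjoint-set) structure in plain Python. This is a different technique from the networkx answer, and `find` and `group_ids` are hypothetical helper names; a minimal sketch, assuming the column names and example data from the question:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({'ID 1': [1, 1, 2, 3, 3, 4],
                   'ID 2': [1, 2, 3, 4, 5, 5]})


def find(parent, x):
    # Walk up to the root of x's set
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the path straight at the root
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


def group_ids(df, col1='ID 1', col2='ID 2'):
    # Tag each node with its column so ID 1 = 1 and ID 2 = 1 stay distinct
    edges = [(('ID1', a), ('ID2', b)) for a, b in zip(df[col1], df[col2])]
    parent = {}
    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[rv] = ru  # merge the two sets
    # Number the groups 1, 2, 3, ... in order of first appearance by row
    labels = {}
    out = []
    for u, _ in edges:
        root = find(parent, u)
        labels.setdefault(root, len(labels) + 1)
        out.append(labels[root])
    return pd.Series(out, index=df.index)


df['ID 3'] = group_ids(df)
print(df)
```

For the example data this labels the rows 1, 1, 2, 3, 3, 3, matching the ID 3 column in the question. Nodes are tagged with their source column for the same reason as the `ID1_`/`ID2_` prefixes: without the tags, ID 1 = 1 and ID 2 = 1 would become one node, and for this data every row would collapse into a single group.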

Answered By: mozway