Contracting a list of nodes in networkx

Question:

I have a list of node sets:

supernodes = list(nx.connected_components(G1))

The result of print(supernodes) is:

[{1, 2, 3, 5}, {8, 6}, {7, 9, 10, 12, 13}, {4}, {11}, {14}, {15}]

How can I merge each set into a single node? I found the function nx.contracted_nodes(G, 1, 3), but how can I pass it {1, 2, 3, 5}, {8, 6}, etc. and create the 7 contracted nodes?

Asked By: Lee Yaan

Answers:

You can try this:

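import networkx as nx
# Preamble: define G1 here

# Contract each connected component into its smallest node.
# Materialize the components first, because G1 is rebound inside the loop.
for supernode in list(nx.connected_components(G1)):
    nodes = sorted(supernode)
    for node in nodes[1:]:
        # Merge `node` into nodes[0]; a new graph is returned each time
        G1 = nx.contracted_nodes(G1, nodes[0], node)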

Each remaining node x in G1 represents the supernode whose smallest element is x. If you want to avoid creating self-loops, call nx.contracted_nodes(G1, nodes[0], node, self_loops=False) instead.
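
As a rough illustration of the loop above on a toy graph, here is a minimal sketch. The helper name contract_groups and the example graph are hypothetical and only serve the demonstration; self_loops=False is passed so that edges inside a group do not become self-loops.

```python
import networkx as nx


def contract_groups(G, groups, self_loops=False):
    """Merge each group of nodes into its smallest member (hypothetical helper)."""
    H = G
    for group in groups:
        first, *rest = sorted(group)
        for node in rest:
            # contracted_nodes returns a new graph by default (copy=True)
            H = nx.contracted_nodes(H, first, node, self_loops=self_loops)
    return H


# Toy path graph: 1-2-3-4-5
G = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 5)])

# Merge {1, 2} into node 1 and {4, 5} into node 4
H = contract_groups(G, [{1, 2}, {4, 5}])
print(sorted(H.nodes))
print(list(H.edges))
```

The original graph G is left untouched, since every call works on a copy. That copying is also why this approach gets slow when many nodes have to be merged.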

Answered By: rodgdor

I tried this answer, but it is too slow for large graphs. I found that converting the NetworkX graph to a pandas DataFrame and merging the nodes there is much faster than using the built-in NetworkX function.

import time
import networkx as nx

# Create the node replacement dictionary
def createRepDict(G1):
    node2supernode = {}
    for supernode in nx.connected_components(G1):
        nodes = sorted(supernode)
        for node in nodes:
            node2supernode[node] = nodes[0]

    # Map any remaining node to itself (only needed if your groups do not cover every node)
    for node in G1.nodes:
        if node not in node2supernode:
            node2supernode[node] = node
    return node2supernode

# G is assumed to be an existing NetworkX graph
start_time = time.time()
for _ in range(10):
    G1 = G.copy()
    df = nx.to_pandas_edgelist(G1)

    # Create the node replacement dictionary
    node2supernode = createRepDict(G1)

    # Replace each endpoint with its supernode
    df['source'] = df.apply(lambda row: node2supernode[row['source']], axis=1)
    df['target'] = df.apply(lambda row: node2supernode[row['target']], axis=1)

    # Drop the self-loops created by merging the nodes
    self_loop = df[df['source'] == df['target']].index
    df = df.drop(self_loop)

    # Adjust edge_attr to match the attribute names of your edge data
    G1 = nx.from_pandas_edgelist(df, 'source', 'target',
                                 edge_attr='edgeData',
                                 create_using=nx.MultiDiGraph())
print(time.time() - start_time)

For 10 runs on a graph with around 5k nodes and 5k edges, this code takes about 4 seconds in total, whereas the nx.contracted_nodes loop takes 1455 seconds in total.
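
The core of this approach is just relabeling edge endpoints and dropping the self-loops that result, so it can also be sketched without pandas on a plain edge list. The function name contract_edge_list and the sample data below are hypothetical, and edge attributes are ignored to keep the sketch short.

```python
def contract_edge_list(edges, groups):
    """Relabel edge endpoints by group representative and drop self-loops."""
    # Map every node to the smallest member of its group
    node_to_rep = {}
    for group in groups:
        rep = min(group)
        for node in group:
            node_to_rep[node] = rep

    contracted = []
    for u, v in edges:
        # Nodes that belong to no group keep their own label
        ru = node_to_rep.get(u, u)
        rv = node_to_rep.get(v, v)
        if ru != rv:  # skip self-loops created by the merge
            contracted.append((ru, rv))
    return contracted


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
groups = [{1, 2}, {4, 5}]
result = contract_edge_list(edges, groups)
print(result)
```

The relabeling is a single pass over the edges with dictionary lookups, instead of one graph copy per merged node. The resulting edge list can then be used to build a new graph.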

Answered By: Enes Altınışık