Fastest Algorithm/Software for MST and Searching on Large Sparse Graphs

Question:

I’ve developed a graph clustering model for assigning destinations to vehicle routes, but my implementation is very slow. It takes about two days to process a graph with 400k nodes.

My current implementation in Python is as follows:

Input data is a sparse graph:
       Edges are roads
       Nodes are vehicle destinations and road intersections

Create a minimum spanning tree (MST) using Prim's algorithm

For every edge in the MST:
      Perform a depth-first search on the two subgraphs on each side of the edge:
              Sum up road lengths for each edge
      If total road length for one of the subgraphs is within a defined range, then remove the edge
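
The per-edge step above could be sketched in Python roughly as follows. This is an illustration only, not the actual implementation: the helper names (build_adjacency, side_length, edges_to_cut), the (u, v, length) edge triples, and the choice to apply each removal immediately before testing the next edge are all assumptions.

```python
from collections import defaultdict


def build_adjacency(mst_edges):
    """Turn (u, v, length) triples into {node: {neighbour: length}}."""
    adj = defaultdict(dict)
    for u, v, length in mst_edges:
        adj[u][v] = length
        adj[v][u] = length
    return adj


def side_length(adj, start, blocked):
    """Iterative depth-first search from `start` that never steps onto
    `blocked`; returns the total length of the tree edges it reaches."""
    total = 0.0
    seen = {start, blocked}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node].items():
            if nbr not in seen:
                seen.add(nbr)
                total += length
                stack.append(nbr)
    return total


def edges_to_cut(mst_edges, low, high):
    """Return the MST edges whose removal leaves at least one side with a
    total road length in [low, high]. Assumption: a removal takes effect
    immediately, so later edges are tested against the current forest."""
    adj = build_adjacency(mst_edges)
    cuts = []
    for u, v, _ in mst_edges:
        left = side_length(adj, u, v)
        right = side_length(adj, v, u)
        if low <= left <= high or low <= right <= high:
            cuts.append((u, v))
            del adj[u][v]
            del adj[v][u]
    return cuts
```

Each pair of side_length calls can walk the whole tree, so this loop does work proportional to the number of nodes for every MST edge.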

Any recommendations to make this implementation faster? Could using Networkx or Neo4J speed this up?

Answers:

Could using Networkx or Neo4J speed this up?

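Not necessarily. NetworkX is written in pure Python and Neo4j is a Java-based graph database, so neither gives you the speed of compiled C++. C++ is many times faster than Python (the usual quote is approximately 50 times faster).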

Personally, I would recommend moving to C++ entirely. Python is fine for toy applications, but large graphs need the performance of a compiled language.

Here is the C++ code I use to find minimum spanning trees:

https://github.com/JamesBremner/PathFinder/blob/50b89a0ff57e13cb34b0348b073d698a22ede406/src/GraphTheory.cpp#L180-L251
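
For readers who want the general shape of the algorithm without following the link, here is a minimal sketch of a heap-based Prim's algorithm, written in Python to match the question. It is not a transcription of the linked C++ code. The function name prim_mst and the {node: {neighbour: length}} adjacency format are assumptions made for illustration, and node labels are assumed to be mutually comparable (for example all integers) so that ties in length can be broken inside the heap.

```python
import heapq


def prim_mst(adj, root):
    """Return the MST edges, as (u, v, length) triples, of the connected
    component containing `root`. `adj` maps node -> {neighbour: length}."""
    visited = {root}
    # Candidate edges leaving the tree, ordered by length.
    heap = [(length, root, nbr) for nbr, length in adj[root].items()]
    heapq.heapify(heap)
    tree = []
    while heap:
        length, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale entry: v joined the tree via a shorter edge
        visited.add(v)
        tree.append((u, v, length))
        for nbr, w in adj[v].items():
            if nbr not in visited:
                heapq.heappush(heap, (w, v, nbr))
    return tree
```

With a binary heap each edge is pushed and popped at most a constant number of times, so the running time grows roughly as E log E for E edges.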

Here are some timing tests on randomly generated graphs:

Vertex count    Run time (seconds),    Run time (seconds),
                1 edge/vertex          3 edges/vertex
1,000           0.1                    0.1
10,000          4                      17
100,000         410                    1900
Answered By: ravenspoint