Fastest Algorithm/Software for MST and Searching on Large Sparse Graphs


I’ve developed a graph clustering model for assigning destinations to vehicle routes, but my implementation is very slow. It takes about two days to process a graph with 400k nodes.

My current implementation in Python is as follows:

Input data is a sparse graph:
       Edges are roads
       Nodes are vehicle destinations and road intersections

Create a minimum spanning tree (MST) using Prim's algorithm

For every edge in the MST:
      Perform a depth-first search on the two subgraphs on each side of the edge:
              Sum up road lengths for each edge
      If total road length for one of the subgraphs is within a defined range, then remove the edge
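
As a rough illustration of the steps above (a sketch, not the code from the question), the MST can be built with a heap-based Prim's algorithm, and the road-length totals on both sides of every MST edge can be read off from one traversal of the tree instead of one depth-first search per edge. Everything here is an assumption made for the example: the function names `prim_mst`, `side_totals` and `cut_candidates`, the `(u, v, length)` edge tuples, integer node IDs `0..n-1`, and a connected input graph. Note one deliberate difference: the single pass scores every edge against the original, uncut tree, so it does not reproduce a version where earlier removals change the subgraphs seen by later edges.

```python
import heapq
from collections import defaultdict


def prim_mst(n, edges):
    """Prim's algorithm with a binary heap; returns MST edges as (u, v, length).

    Assumes nodes are numbered 0..n-1 and the graph is connected.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    in_tree = [False] * n
    in_tree[0] = True
    heap = [(w, 0, v) for w, v in adj[0]]
    heapq.heapify(heap)
    mst = []
    while heap and len(mst) < n - 1:
        w, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue  # stale heap entry: v was reached by a shorter edge
        in_tree[v] = True
        mst.append((u, v, w))
        for w2, x in adj[v]:
            if not in_tree[x]:
                heapq.heappush(heap, (w2, v, x))
    return mst


def side_totals(mst, root=0):
    """Map each MST edge (parent, child) to (length below edge, length above edge)."""
    adj = defaultdict(list)
    for u, v, w in mst:
        adj[u].append((v, w))
        adj[v].append((u, w))
    total = sum(w for _, _, w in mst)
    parent = {root: (None, 0.0)}
    order = []            # every node appears after its parent
    stack = [root]
    while stack:          # iterative DFS, safe for very deep trees
        u = stack.pop()
        order.append(u)
        for v, w in adj[u]:
            if v not in parent:
                parent[v] = (u, w)
                stack.append(v)
    below = defaultdict(float)  # edge length strictly inside each node's subtree
    result = {}
    for v in reversed(order):   # children are handled before their parents
        p, w = parent[v]
        if p is None:
            continue
        result[(p, v)] = (below[v], total - below[v] - w)
        below[p] += below[v] + w
    return result


def cut_candidates(mst, lo, hi):
    """Edges for which at least one side's total length lies in [lo, hi]."""
    return [edge for edge, sides in side_totals(mst).items()
            if any(lo <= s <= hi for s in sides)]
```

For example, `cut_candidates(prim_mst(n, edges), lo, hi)` lists the MST edges whose removal would leave a subgraph with total road length in the given range. The traversal visits each tree edge once, so this step is linear in the number of nodes, whereas a separate search per edge grows roughly quadratically.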

Any recommendations to make this implementation faster? Could using NetworkX or Neo4j speed this up?


Could using NetworkX or Neo4j speed this up?

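Not by themselves. NetworkX is written in pure Python, and Neo4j is a graph database that runs on the JVM, so neither gives you the speed of compiled C++ code, which is many times faster than Python (the usual quote is approximately 50 times faster).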

Personally, I would recommend moving to C++ entirely. Python is fine for toy applications, but large graphs need the performance of a compiled language.

Here is the C++ code I use to find minimum spanning trees

Here are some timing tests on randomly generated graphs

Vertex count    Run time (1 edge/vertex)    Run time (3 edges/vertex)
1,000           0.1                         0.1
10,000          4                           17
100,000         410                         1,900
Answered By: ravenspoint