Fastest Algorithm/Software for MST and Searching on Large Sparse Graphs
Question:
I’ve developed a graph clustering model for assigning destinations to vehicle routes, but my implementation is very slow. It takes about two days to process a graph with 400k nodes.
My current implementation in Python is as follows:
1. Input data is a sparse graph:
   - Edges are roads.
   - Nodes are vehicle destinations and road intersections.
2. Create a minimum spanning tree (MST) using Prim's algorithm.
3. For every edge in the MST:
   - Perform a depth-first search on the two subgraphs on either side of the edge.
   - Sum up the road lengths of the edges in each subgraph.
   - If the total road length of one of the subgraphs is within a defined range, remove the edge.
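The per-edge loop above runs one DFS per MST edge, and each DFS can visit most of the tree, so for n nodes the loop does roughly O(n²) work. That plausibly accounts for much of the two-day run time. A swapped-in technique (not the asker's code) gets the same two totals for every edge in O(n): root the tree once and compute subtree sums in a single post-order pass. For a tree edge whose child endpoint is c, one side's total is the subtree sum of c, and the other side's total is the component total minus that sum minus the edge's own length. This is a minimal C++ sketch under stated assumptions: each edge is tested against the full MST (earlier removals do not change later totals), a subgraph's "road length" is the sum of its edge lengths excluding the cut edge, and vertices are numbered 0..n-1. The names `TreeEdge`, `Sides`, `sideTotals`, `edgesToRemove` and the `[lo, hi]` bounds are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One MST edge: endpoints u, v (ids 0..n-1) and road length w.
struct TreeEdge {
    int u, v;
    double w;
};

// Total road length on each side of a tree edge, excluding the edge itself.
struct Sides {
    double a;  // side containing u
    double b;  // side containing v
};

// Assumes `edges` is a tree or forest on n vertices. O(n) overall.
std::vector<Sides> sideTotals(int n, const std::vector<TreeEdge>& edges) {
    std::vector<std::vector<std::pair<int, int>>> adj(n);  // (neighbour, edge index)
    for (int i = 0; i < static_cast<int>(edges.size()); ++i) {
        const TreeEdge& t = edges[i];
        assert(t.u >= 0 && t.u < n && t.v >= 0 && t.v < n);
        adj[t.u].push_back({t.v, i});
        adj[t.v].push_back({t.u, i});
    }
    // Iterative DFS from every unvisited vertex: records preorder, the edge to
    // each vertex's parent, and the root of each vertex's component.
    std::vector<int> parentEdge(n, -1), compRoot(n, -1), order;
    order.reserve(n);
    for (int r = 0; r < n; ++r) {
        if (compRoot[r] != -1) continue;
        compRoot[r] = r;
        std::vector<int> stack{r};
        while (!stack.empty()) {
            int x = stack.back();
            stack.pop_back();
            order.push_back(x);
            for (auto [y, e] : adj[x]) {
                if (compRoot[y] == -1) {
                    compRoot[y] = r;
                    parentEdge[y] = e;
                    stack.push_back(y);
                }
            }
        }
    }
    // sub[x] = total length of the edges inside x's subtree.
    // Reverse preorder visits every child before its parent.
    std::vector<double> sub(n, 0.0);
    for (int i = n - 1; i >= 0; --i) {
        int x = order[i];
        int e = parentEdge[x];
        if (e < 0) continue;  // component root
        int p = (edges[e].u == x) ? edges[e].v : edges[e].u;
        sub[p] += sub[x] + edges[e].w;
    }
    // Child side = sub[x]; other side = component total - sub[x] - edge length.
    std::vector<Sides> out(edges.size(), Sides{0.0, 0.0});
    for (int x = 0; x < n; ++x) {
        int e = parentEdge[x];
        if (e < 0) continue;
        double inside = sub[x];
        double outside = sub[compRoot[x]] - sub[x] - edges[e].w;
        out[e] = (edges[e].u == x) ? Sides{inside, outside} : Sides{outside, inside};
    }
    return out;
}

// Marks an edge for removal if either side's total lies in [lo, hi].
// lo and hi are hypothetical: the question does not state the range.
std::vector<bool> edgesToRemove(const std::vector<Sides>& sides, double lo, double hi) {
    auto inRange = [lo, hi](double t) { return t >= lo && t <= hi; };
    std::vector<bool> cut(sides.size(), false);
    for (std::size_t i = 0; i < sides.size(); ++i)
        cut[i] = inRange(sides[i].a) || inRange(sides[i].b);
    return cut;
}
```

For a 400k-node tree this is a few linear passes instead of one search per edge. If the intended behaviour is that each removal changes the totals seen by later edges, this sketch does not match it.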
Any recommendations for making this implementation faster? Could using NetworkX or Neo4j speed this up?
Answers:
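Could using NetworkX or Neo4j speed this up?
Not by itself. NetworkX is written in pure Python, so it is unlikely to be much faster than your own Python code, and Neo4j is written in Java rather than C++. The bigger win comes from running the core loops in a compiled language: C++ is many times faster than Python for this kind of work (the figure usually quoted is about 50 times, though it varies a lot by workload).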
Personally, I would recommend moving to C++ entirely. Python is fine for toy applications, but large graphs need the performance of a compiled language.
Here is the C++ code I use to find minimum spanning trees
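For comparison only, here is a separate minimal sketch of Prim's algorithm in C++. It is not the answerer's code and not what the timings below were measured with. It is the textbook "lazy" variant using `std::priority_queue` as a binary heap over an adjacency list, which runs in O(E log E) on a sparse graph. Restarting from every unvisited vertex makes it return a minimum spanning forest if the road graph is disconnected. The names `Edge` and `primMSF` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

// Undirected edge between vertices u and v (ids 0..n-1) with road length w.
struct Edge {
    int u, v;
    double w;
};

// Lazy Prim's algorithm: returns the edges of a minimum spanning forest.
std::vector<Edge> primMSF(int n, const std::vector<Edge>& edges) {
    std::vector<std::vector<std::pair<int, double>>> adj(n);  // (neighbour, length)
    for (const Edge& e : edges) {
        assert(e.u >= 0 && e.u < n && e.v >= 0 && e.v < n);
        adj[e.u].push_back({e.v, e.w});
        adj[e.v].push_back({e.u, e.w});
    }
    // Heap entries are (length, from, to); std::greater puts the shortest on top.
    using Item = std::tuple<double, int, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    std::vector<bool> inTree(n, false);
    std::vector<Edge> tree;
    for (int start = 0; start < n; ++start) {
        if (inTree[start]) continue;  // already covered by an earlier tree
        inTree[start] = true;
        for (auto [y, w] : adj[start]) heap.emplace(w, start, y);
        while (!heap.empty()) {
            auto [w, from, to] = heap.top();
            heap.pop();
            if (inTree[to]) continue;  // stale entry: target already in the tree
            inTree[to] = true;
            tree.push_back({from, to, w});
            for (auto [y, wy] : adj[to])
                if (!inTree[y]) heap.emplace(wy, to, y);
        }
    }
    return tree;
}
```

The lazy variant leaves stale entries in the heap and skips them when they are popped; an indexed heap with decrease-key avoids those entries but takes more code.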
Here are some timing tests on randomly generated graphs:

| Vertex count | Run time (s), 1 edge/vertex | Run time (s), 3 edges/vertex |
|---|---|---|
| 1,000 | 0.1 | 0.1 |
| 10,000 | 4 | 17 |
| 100,000 | 410 | 1900 |
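As rough arithmetic on the table only (not a new measurement), fitting $t \propto n^k$ between the 10,000- and 100,000-vertex rows suggests the measured run time grows roughly quadratically with vertex count:

$$
k \approx \log_{10}\frac{t(10^5)}{t(10^4)}:\qquad \log_{10}\frac{410}{4} \approx 2.01 \;(\text{1 edge/vertex}),\qquad \log_{10}\frac{1900}{17} \approx 2.05 \;(\text{3 edges/vertex})
$$

If that trend held up to 400,000 vertices, which three data points cannot establish, the run time would be about $4^2 = 16$ times the 100,000-vertex figures: roughly $16 \times 410 = 6{,}560$ s (about 1.8 h) and $16 \times 1900 = 30{,}400$ s (about 8.4 h).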