Algorithm to find the "percolation" threshold in a weighted network

Question:

I have some states that are linked by transition probabilities (stored in a transition matrix, as in a Markov chain). I want to summarise this transition matrix by keeping only the probabilities that are high enough to still allow going from one state (~node) to another (the first and the last in my transition matrix). In other words, I am looking for a threshold such that, if I keep only probabilities above it, the transition matrix no longer allows moving from the first to the last state (or node).

Two questions:

Are there well-known libraries (preferably in Python) that implement such a method? My naive/empirical/prototype approach would be a loop that decreases the threshold value and then checks whether I can flow through the transition matrix from the first state to the last (a kind of best-path algorithm in a distance matrix?). But might this require a very long computation time?

Example 1:
DistMatrix = [[      'a',  'b',  'c',  'd'],
              ['a',  0,    0.3,  0.4,  0.1],
              ['b',  0.3,  0,    0.9,  0.2],
              ['c',  0.4,  0.9,  0,    0.7],
              ['d',  0.1,  0.2,  0.7,  0]]
The states are a, b, c, d. I want to find the value (threshold) that still allows going from a to d (no matter which other states are visited along the way).
Naive approach:
- first loop: threshold 0.9, I discard the lower probabilities: I can only connect c and b
- second loop: threshold 0.7, I discard the lower probabilities: I can only connect c, b, d
- third loop: threshold 0.4, I discard the lower probabilities: I can connect a, c, b, d: here is my threshold: 0.4

-> Won't this become incredibly expensive as soon as my transition matrix has many thousands of states? -> Is there an algorithm you would propose?

Example 2:
DistMatrix = [[      'a',  'b',  'c',  'd'],
              ['a',  0,    0.3,  0.4,  0.7],
              ['b',  0.3,  0,    0.9,  0.2],
              ['c',  0.4,  0.9,  0,    0.1],
              ['d',  0.7,  0.2,  0.1,  0]]
The states are a, b, c, d. I want to find the value (threshold) that still allows going from a to d (no matter which other states are visited along the way).
Naive approach:
- first loop: threshold 0.9, I discard all the others: I can only connect c and b
- second loop: threshold 0.7, I discard the lower connections: I connect b and c, and a and d; because a and d are now connected, I have my threshold!
Asked By: sol

Answers:

Not sure I’m interpreting your question correctly, however:

Assume you have a candidate threshold and you want to determine whether there is a path between a and d. You can check which nodes are reachable from a by performing a simple depth-first search on the thresholded graph and seeing whether your desired end node d has been visited.

To actually find the threshold, note that it must lie between zero and the maximum transition probability in your graph (here 0.9). So you can perform a binary search for the threshold, at each stage using the depth-first search to check whether there is a path between a and d.
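
A minimal sketch of this idea, assuming the graph is given as a plain list-of-lists adjacency matrix, that a zero entry means "no edge", and that the search runs from node 0 to the last node. The names `reachable` and `find_threshold` are made up for illustration and do not come from any library:

```python
def reachable(matrix, threshold, source, dest):
    """Iterative depth-first search that only follows edges whose
    weight is non-zero and at least `threshold`."""
    visited = {source}
    stack = [source]
    while stack:
        node = stack.pop()
        if node == dest:
            return True
        for neighbour, weight in enumerate(matrix[node]):
            if weight > 0 and weight >= threshold and neighbour not in visited:
                visited.add(neighbour)
                stack.append(neighbour)
    return False


def find_threshold(matrix, source=0, dest=None):
    """Binary search over the distinct edge weights for the largest
    threshold that still leaves a path from source to dest."""
    if dest is None:
        dest = len(matrix) - 1
    weights = sorted({w for row in matrix for w in row if w > 0})
    # If there is no path even with every edge kept, there is no threshold.
    if not weights or not reachable(matrix, weights[0], source, dest):
        return None
    low, high = 0, len(weights)  # a path exists at weights[low]
    while high - low > 1:
        mid = (low + high) // 2
        if reachable(matrix, weights[mid], source, dest):
            low = mid   # still connected: try a higher threshold
        else:
            high = mid  # disconnected: the threshold is lower
    return weights[low]
```

On the two matrices from the question (with the row and column labels removed) this returns 0.4 and 0.7 respectively. The binary search is valid because raising the threshold can only remove edges, so once the path disappears it never comes back.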

Answered By: YXD

To expand on what Mr E suggested, here are two versions of an algorithm that works decently on graphs with a few thousand nodes. Both versions use NumPy and the second one also uses NetworkX.

You need to get rid of the ‘a’, ‘b’, ‘c’ and ‘d’ identifiers in order to be able to use NumPy arrays. This is easily done by translating your node names to integers from 0 to len(nodes) - 1. Your arrays should look as follows:

DistMatrix1 = np.array([[0,      0.3,    0.4,    0.1],
                        [0.3,    0,      0.9,    0.2],
                        [0.4,    0.9,    0,      0.7],
                        [0.1,    0.2,    0.7,   0]])

DistMatrix2 = np.array([[0,      0.3,    0.4,    0.7],
                        [0.3,    0,      0.9,    0.2],
                        [0.4,    0.9,    0,      0.1],
                        [0.7,    0.2,    0.1,    0]])
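
If the labelled matrix is already in the list-of-lists layout shown in the question (a header row of names, then one row per node starting with its name), the translation could be sketched as below. The helper name `strip_labels` is hypothetical, and the sketch assumes every name in the header row also appears as a row label:

```python
import numpy as np

# Labelled matrix in the layout used in the question (Example 1).
labelled = [[      'a',  'b',  'c',  'd'],
            ['a',  0,    0.3,  0.4,  0.1],
            ['b',  0.3,  0,    0.9,  0.2],
            ['c',  0.4,  0.9,  0,    0.7],
            ['d',  0.1,  0.2,  0.7,  0]]


def strip_labels(labelled):
    """Return (name -> integer index mapping, plain NumPy matrix)."""
    names = list(labelled[0])
    index = {name: i for i, name in enumerate(names)}
    matrix = np.zeros((len(names), len(names)))
    for row in labelled[1:]:
        # row[0] is the node name, the rest are its probabilities.
        matrix[index[row[0]]] = row[1:]
    return index, matrix


index, matrix = strip_labels(labelled)
```

Keeping the `index` mapping around makes it possible to translate integer node ids back to the original state names afterwards.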

Use numpy.unique to get a sorted array of all probabilities in the distance matrix. Then, perform a standard binary search, as suggested by Mr E. At each step in the binary search, replace the entries in the matrix by 0 if they are below the current probability. Run a breadth-first search on the graph, starting at the first node, and see if you reach the last node. If you do, the threshold is at least the current probability, so search higher; otherwise, the threshold is lower. The bfs code is actually adapted from the NetworkX version.

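from collections import deque

import numpy as np


def find_threshold_bfs(array):
    """Return the largest probability p such that keeping only the entries
    >= p still leaves a path from the first node to the last node."""
    first_node = 0
    last_node = len(array) - 1
    # Sorted distinct values of the matrix: the candidate thresholds.
    probabilities = np.unique(array.ravel())
    low = 0
    high = len(probabilities)

    # Invariant: a path exists at probabilities[low] (assuming one exists
    # in the unthresholded graph); none exists from index high onwards.
    while high - low > 1:
        i = (high + low) // 2
        prob = probabilities[i]
        # Work on a copy so the caller's matrix is left untouched.
        copied_array = np.array(array)
        copied_array[copied_array < prob] = 0.0
        if bfs(copied_array, first_node, last_node):
            low = i
        else:
            high = i

    return probabilities[low]


def bfs(graph, source, dest):
    """Perform breadth-first search starting at source. If dest is reached,
    return True, otherwise, return False."""
    # Based on http://www.ics.uci.edu/~eppstein/PADS/BFS.py
    # by D. Eppstein, July 2004.
    visited = {source}
    nodes = np.arange(0, len(graph))
    # FIFO queue of (node, neighbours) pairs; a neighbour is any node whose
    # entry in that node's row is still non-zero after thresholding.
    queue = deque([(source, nodes[graph[source] > 0])])
    while queue:
        parent, children = queue.popleft()
        for child in children:
            if child == dest:
                return True
            if child not in visited:
                visited.add(child)
                queue.append((child, nodes[graph[child] > 0]))
    return False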

A slower but shorter version uses NetworkX. In the binary search, instead of running bfs, convert the matrix to a NetworkX graph and check whether there is a path from the first to the last node. If there is a path, the threshold is higher; if there is none, the threshold is lower. This is slow because the graph data structures in NetworkX are much less efficient than NumPy arrays. However, it has the advantage of giving access to a bunch of other useful algorithms.

import networkx as nx
import numpy as np

def find_threshold_nx(array):
    """Return the threshold value for adjacency matrix in array."""
    first_node = 0
    last_node = len(array) - 1
    probabilities = np.unique(array.ravel())
    low = 0
    high = len(probabilities)

    while high - low > 1:
        i = (high + low) // 2
        prob = probabilities[i]
        copied_array = np.array(array)
        copied_array[copied_array < prob] = 0.0
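        # from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array
        # builds the same graph from a 2-D array.
        graph = nx.from_numpy_array(copied_array)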
        if nx.has_path(graph, first_node, last_node):
            low = i
        else:
            high = i

    return probabilities[low]

The NetworkX version crashes on graphs with more than one thousand nodes or so (on my laptop). The bfs version easily finds the threshold for graphs of several thousand nodes.

A sample run of the code is as follows.

In [5]: from percolation import *

In [6]: print('Threshold is {}'.format(find_threshold_bfs(DistMatrix1)))
Threshold is 0.4

In [7]: print('Threshold is {}'.format(find_threshold_bfs(DistMatrix2)))
Threshold is 0.7

In [10]: big = np.random.random((6000, 6000))

In [11]: print('Threshold is {}'.format(find_threshold_bfs(big)))
Threshold is 0.999766933071

For timings, I get (on a semi-recent MacBook Pro):

In [5]: smaller = np.random.random((100, 100))

In [6]: larger = np.random.random((800, 800))

In [7]: %timeit find_threshold_bfs(smaller)
100 loops, best of 3: 11.3 ms per loop

In [8]: %timeit find_threshold_nx(smaller)
10 loops, best of 3: 94.9 ms per loop

In [9]: %timeit find_threshold_bfs(larger)
1 loops, best of 3: 207 ms per loop

In [10]: %timeit find_threshold_nx(larger)
1 loops, best of 3: 6 s per loop

Hope this helps.

Update

I modified the bfs code so that it stops whenever the destination node is reached. The code and timings above have been updated.

Answered By: Loïc Séguin-C.