Is there an efficient method of limiting sets of unique combinations of two lists in Python?

Question:

I have the following code to generate each set of unique combinations between two lists.
Put another way: every possible set of matches that pairs each target with a distinct source.

from itertools import permutations

def combos(sources, targets):
    assert len(sources) >= len(targets)
    for x in permutations(sources, len(targets)):
        yield list(zip(targets, x))

for x in combos([1,2,3],['a','b']):
    print(x)

The result:

[('a', 1), ('b', 2)]
[('a', 1), ('b', 3)]
[('a', 2), ('b', 1)]
[('a', 2), ('b', 3)]
[('a', 3), ('b', 1)]
[('a', 3), ('b', 2)]

I have a function which can calculate a "weight" as an integer for each pairing.
I was planning on iterating through each of the resulting lists, summing together the weights, and finding the sets with the least total weight.
Ideally I care most about the set of matches with the least weight.

But with large inputs it becomes completely impractical even to iterate over the permutations; Python just stalls.
I was thinking I could limit the search by only considering the N best source matches for each target based on weight, but I'm having trouble working out how to do that.
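For reference, the brute-force search described above can be written as a single `min()` over the generator; `weight` below is a hypothetical stand-in for the real scoring function:

```python
from itertools import permutations

def combos(sources, targets):
    # yield every assignment of a distinct source to each target
    assert len(sources) >= len(targets)
    for chosen in permutations(sources, len(targets)):
        yield list(zip(targets, chosen))

def weight(target, source):
    # hypothetical weight function, just for illustration
    return abs(ord(target) - ord('a') - source)

def best_matching(sources, targets):
    # score every candidate set and keep the one with the least total weight
    return min(combos(sources, targets),
               key=lambda pairs: sum(weight(t, s) for t, s in pairs))

print(best_matching([1, 2, 3], ['a', 'b']))
```

This gives the correct answer but still visits every permutation, which is exactly what becomes impractical for large inputs.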
Is there either:

  1. A more efficient way of either getting the correct answer, or
  2. Limiting to a "probably correct" answer within a much shorter execution time?
Asked By: kcghost


Answers:

The relationship (connections and weights) between your sources and targets forms the edges of a graph. Specifically, in your case it is a bipartite graph, because edges only run between nodes from the two separate groups (there are no a-b or 0-2 edges).

A set of edges that share no nodes, like [b-1, a-3], is called a "matching". If you assign a weight to each edge and want to find the matching with the minimum total weight, that is the minimum weight matching problem.

The minimum (or maximum) weight matching problem restricted to bipartite graphs is known as the assignment problem, and there are several algorithms that solve it in polynomial time, such as the Hungarian (Kuhn-Munkres) algorithm and the Jonker-Volgenant algorithm.
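Assuming SciPy is available, its `linear_sum_assignment` function (based on a Jonker-Volgenant variant) solves exactly this problem; the cost values below are made up for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j] = made-up weight of pairing target i with source j
cost = np.array([
    [4, 1, 3],
    [2, 0, 5],
])

# returns the row/column indices of an optimal assignment, in polynomial time
rows, cols = linear_sum_assignment(cost)
total = cost[rows, cols].sum()
print(list(zip(rows, cols)), total)
```

If your weights come from a function rather than a matrix, you just evaluate it once per (target, source) pair to build the matrix first.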

Minimum weight matching in the general case (non-bipartite graphs) is much harder; the only algorithm I know of for it is the rather complex blossom algorithm.
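For completeness, networkx exposes that solver too: `min_weight_matching` runs a blossom-based algorithm on arbitrary graphs. The graph and weights below are made up for illustration:

```python
import networkx as nx

# a small non-bipartite graph with made-up weights
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 2), ("b", "c", 1), ("c", "d", 2), ("a", "d", 5), ("a", "c", 4),
])

# blossom-based minimum weight matching for general graphs
matching = nx.min_weight_matching(G)
total = sum(G[u][v]["weight"] for u, v in matching)
print(matching, total)
```

Here the matching {a-b, c-d} (total weight 4) beats {a-d, b-c} (total weight 6).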

There is an excellent library for working with graphs in Python called networkx. Here is an example of how to use it to solve your problem. I am using the built-in hash function for the weights (note that string hashes are randomized per process, so your numbers will differ), but you can use whatever your logic demands instead.

import networkx as nx

graph = nx.complete_bipartite_graph(range(5), 'abcdefg')
sources, targets = nx.bipartite.sets(graph)

for a, b in graph.edges():
    graph.edges[a, b]["weight"] = hash((a, b))  # whatever logic you have
    print(a, b, graph.edges[a, b])

matching = nx.bipartite.minimum_weight_full_matching(graph)
result = [(a, matching[a], graph.edges[a, matching[a]]["weight"]) for a in sources]

print("minimum weight match:", *result, sep="\n")
print("minimum weight sum:", sum(w for _, _, w in result))
# minimum weight match:
# (0, 'e', -5652564962590522143)
# (1, 'c', -2359047611522945134)
# (2, 'b', 3092743657457074790)
# (3, 'a', 5054043193701197225)
# (4, 'g', -7940909611028334467)
# minimum weight sum: -7805735333983529729
Answered By: Rodrigo Rodrigues