Yet another combinations with conditions question

Question:

I want to efficiently generate pairs of elements from two lists: their Cartesian product with some pairs omitted. The elements in each list are unique.

The code below does exactly what’s needed, but I’d like to optimize it, perhaps by replacing the loop.

See the comments in the code for details.

Any advice would be appreciated.

from itertools import product
from pprint import pprint as pp

def pairs(list1, list2):
    """ Return all combinations (x,y) from list1 and list2 except:
          1. Omit combinations (x,y) where x==y
          2. Include only one of the combinations (x,y) and (y,x) """
    tuples = filter(lambda t: t[0] != t[1], product(list1,list2))

    result = []
    for t in tuples:
        # Keep (x,y) only if its mirror (y,x) has not been kept already
        if (t[1], t[0]) not in result:
            result.append(t)
    return result

list1 = ['A', 'B', 'C']
list2 = ['A', 'D', 'E']
pp(pairs(list1, list1))  #  Test a list with itself
pp(pairs(list1, list2))  #  Test two lists with some common elements

Output:

[('A', 'B'), ('A', 'C'), ('B', 'C')]
[('A', 'D'),
 ('A', 'E'),
 ('B', 'A'),
 ('B', 'D'),
 ('B', 'E'),
 ('C', 'A'),
 ('C', 'D'),
 ('C', 'E')]
Asked By: C. Pappy


Answers:

This is definitely less code, but you’d have to measure it on some larger lists to see whether it’s actually any faster. Using Python’s native types tends to be faster. It also avoids the membership check against the result list, which traverses the list on every check; instead, it relies on the set to discard the duplicate pairs.

def pairs(list1, list2):
    return list(set((tuple(sorted(t)) for t in product(list1, list2) if t[0] != t[1])))

It gives you different output from what you have above because each pair (x, y) is sorted. You didn’t specify which of the two forms you wanted to keep, so if it doesn’t matter, the results are equivalent.
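If the orientation and order of the question's output do matter, one possible variation is sketched below. It is not part of the answer above: the function name `pairs_keep_orientation` is made up for illustration, and it assumes the elements are hashable and, as the question states, unique within each list. It uses a `frozenset` as an order-insensitive key, so no sorting is needed and the first-seen form of each pair is kept.

```python
from itertools import product

def pairs_keep_orientation(list1, list2):
    """Keep the first-seen orientation of each unordered pair.

    Hypothetical helper for illustration; assumes hashable elements
    that are unique within each list.
    """
    seen = set()    # frozenset({x, y}) keys for pairs already emitted
    result = []
    for x, y in product(list1, list2):
        if x == y:
            continue                  # rule 1: omit (x, x)
        key = frozenset((x, y))       # same key for (x, y) and (y, x)
        if key not in seen:           # rule 2: keep only one of the two
            seen.add(key)
            result.append((x, y))
    return result

print(pairs_keep_orientation(['A', 'B', 'C'], ['A', 'B', 'C']))
print(pairs_keep_orientation(['A', 'B', 'C'], ['A', 'D', 'E']))
```

The set lookup replaces the list traversal, while the result list preserves the order in which `product` yields the pairs.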

Answered By: saquintes

UPDATE:

My original answer (see below) omitted duplicate pairs when there are duplicate values within one or both of the input lists, which is outside the scope of OP’s input constraints ("elements in each list are unique").

However, just to address a slightly broader question than OP’s, here’s an alternative to OP’s code that should give equivalent results even if list elements were not unique and still seems to be somewhat faster:

def pairs2(list1, list2):
    res = []
    cache = set()
    for x in list1:
        for y in list2:
            if not (x==y or (y,x) in cache):
                cache.add((x,y))
                res.append((x,y))
    return res

… or, if you prefer condensed code:

def pairs2(list1, list2):
    cache = set()
    return [(cache.add((x,y)), (x,y))[-1] for x in list1 for y in list2 if not (x==y or (y,x) in cache)]
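To make the point about duplicate values concrete, here is a small comparison sketch. The names `op_pairs`, `cached_pairs` and `set_pairs` are labels chosen for this example only; they restate the question's loop, the loop version above, and a set-based one-liner, and the input lists are made up.

```python
from itertools import product

def op_pairs(list1, list2):
    # The question's approach: membership test against the result list
    result = []
    for t in product(list1, list2):
        if t[0] != t[1] and (t[1], t[0]) not in result:
            result.append(t)
    return result

def cached_pairs(list1, list2):
    # Loop version from this answer: membership test against a set
    cache = set()
    res = []
    for x in list1:
        for y in list2:
            if not (x == y or (y, x) in cache):
                cache.add((x, y))
                res.append((x, y))
    return res

def set_pairs(list1, list2):
    # Set-based version: normalizes each pair and collapses repeats
    return {(x, y) if x < y else (y, x)
            for x in list1 for y in list2 if x != y}

list1 = [1, 1, 2]   # note the repeated 1
list2 = [2, 3]
print(op_pairs(list1, list2))
print(cached_pairs(list1, list2))
print(sorted(set_pairs(list1, list2)))
```

With these inputs the first two functions return the repeated pairs, while the set-based version returns each pair once.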

Benchmarking:

Roughly 12x performance improvement for input lists of length 20 with 4 overlapping values:

list1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

list2
[0, 1, 2, 3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]

Timeit results:
foo_1 (OP) ran in 0.0013974264999851585 seconds using 1000 iterations
foo_2 (list/set/comprehension) ran in 0.00011681730000418612 seconds using 1000 iterations

About 44x faster for input lists of length 40 with 8 overlapping values:

list1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]

list2
[0, 1, 2, 3, 4, 5, 6, 7, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]

Timeit results:
foo_1 (OP) ran in 0.025825615999929143 seconds using 100 iterations
foo_2 (list/set/comprehension) ran in 0.0005934359997627326 seconds using 100 iterations

Original answer:

Here is alternative code:

def pairs2(list1, list2):
    return list(set((x,y) if x < y else (y,x) for x in list1 for y in list2 if x != y))

NOTE: @Kelly Bundy pointed out in a comment that a set comprehension such as the following is slightly faster than the code above:

def pairs3(list1, list2):
    return {(x,y) if x < y else (y,x) for x in list1 for y in list2 if x != y}

Benchmarking shows:

Timeit results:
foo_1 (OP) ran in 8.308585199993104e-06 seconds using 1000000 iterations
foo_2 (list/set/comprehension) ran in 4.874373900005594e-06 seconds using 1000000 iterations

So it’s about 2x as fast for your examples.

Here’s another benchmark for lists of length 10 with 2 overlapping values, showing about a 6x improvement:

list1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

list2
[0, 1, 10, 11, 12, 13, 14, 15, 16, 17]

Timeit results:
foo_1 (OP) ran in 0.0002618508199986536 seconds using 10000 iterations
foo_2 (list/set/comprehension) ran in 4.290239999827463e-05 seconds using 10000 iterations

And one more with lists of length 20 and 4 overlapping values shows about a 17x improvement:

list1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

list2
[0, 1, 2, 3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]

Timeit results:
foo_1 (OP) ran in 0.003016028999991249 seconds using 1000 iterations
foo_2 (list/set/comprehension) ran in 0.00018069539999123662 seconds using 1000 iterations
Answered By: constantstranger

UPDATE:

There are now four answers including my own answer, pairs3.

The following script checks and benchmarks the solutions.

import timeit
import gc
from itertools import product, combinations, chain
from pprint import pprint as pp

""" This script checks and benchmarks the solutions. """

#  Solution offered by @saquintes
def pairs1(list1, list2):
    return list(set((tuple(sorted(t)) for t in product(list1, list2)
        if t[0] != t[1])))

#  Solution offered by @constantstranger
def pairs2(list1, list2):
    cache = set()
    return [(cache.add((x,y)), (x,y))[-1] for x in list1 for y in list2
            if not (x==y or (y,x) in cache)]

#  My solution 
def pairs3(list1, list2):
    res = dict()
    for x,y in product(list1,list2):
        if not (x == y or res.get((y,x))):
            res[(x,y)] = 1
    return list(res.keys())

#  This is the very clever and exceedingly fast
#  solution offered by @Kelly Bundy.
#  (See @Kelly Bundy's post for a version that returns an iterator!)
def pairs4(list1, list2):
    a = {*list1}
    b = {*list2}
    ab = a & b
    return [
        *product(a, b-a),
        *product(a-b, ab),
        *combinations(ab, 2)
    ]

def check(fn, list1, list2):
    """ Prints the output of function fn """
    print('\nFunction', fn.__name__, 'output:')
    res = fn(list1, list2)
    pp(res)

def run_checks(functions):
    """ Passes lists to each function in functions and prints the results """
    for fn in functions:
        print('\n------------------------------------------------\n')
        print('Function:', fn.__name__, '\n')
        print('Combinations of a list with itself:')
        list1 = ['A', 'B', 'C']
        list2 = ['A', 'B', 'C']
        print('list1:', list1)
        print('list2:', list2)
        check(fn, list1, list1)
        
        print('\nCombinations of Two lists with a common element:')
        list1 = ['A', 'B', 'C']
        list2 = ['A', 'D', 'E']
        print('list1:', list1)
        print('list2:', list2)
        check(fn, list1, list2)
        
def benchmark(fn, list1, list2, iterations):
    def callit():
        fn(list1, list2)
    gc.collect()
    t = timeit.timeit(callit, number=iterations)
    print('Function', fn.__name__, 'time:', f'{t:.2f} secs')

def run_benchmarks(functions, list1, list2, iterations):
    print('\nRunning', f'{iterations}', 'iterations per run repeated 3 times\n')
    for fn in functions:
        benchmark(fn, list1, list2, iterations)
        benchmark(fn, list1, list2, iterations)
        benchmark(fn, list1, list2, iterations)
        print()

print('\n-------------- Check performance --------------\n')
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
list2 = [0, 1, 2, 3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
print('list1:', list1)
print('list2:', list2)
run_benchmarks([pairs1, pairs2, pairs3, pairs4], list1, list2, 100000)

print('--------------  Check correctness --------------\n')
run_checks([pairs1, pairs2, pairs3, pairs4])

Output:

-------------- Check performance --------------

list1: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
list2: [0, 1, 2, 3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]

Running 100000 iterations per run repeated 3 times

Function pairs1 time: 20.61 secs
Function pairs1 time: 20.53 secs
Function pairs1 time: 20.54 secs

Function pairs2 time: 10.69 secs
Function pairs2 time: 10.69 secs
Function pairs2 time: 10.68 secs

Function pairs3 time: 9.87 secs
Function pairs3 time: 9.87 secs
Function pairs3 time: 9.89 secs

Function pairs4 time: 1.44 secs
Function pairs4 time: 1.44 secs
Function pairs4 time: 1.44 secs

--------------  Check correctness --------------


------------------------------------------------

Function: pairs1 

Combinations of a list with itself:
list1: ['A', 'B', 'C']
list2: ['A', 'B', 'C']

Function pairs1 output:
[('B', 'C'), ('A', 'C'), ('A', 'B')]

Combinations of Two lists with a common element:
list1: ['A', 'B', 'C']
list2: ['A', 'D', 'E']

Function pairs1 output:
[('C', 'E'),
 ('B', 'D'),
 ('A', 'B'),
 ('A', 'E'),
 ('B', 'E'),
 ('C', 'D'),
 ('A', 'C'),
 ('A', 'D')]

------------------------------------------------

Function: pairs2 

Combinations of a list with itself:
list1: ['A', 'B', 'C']
list2: ['A', 'B', 'C']

Function pairs2 output:
[('A', 'B'), ('A', 'C'), ('B', 'C')]

Combinations of Two lists with a common element:
list1: ['A', 'B', 'C']
list2: ['A', 'D', 'E']

Function pairs2 output:
[('A', 'D'),
 ('A', 'E'),
 ('B', 'A'),
 ('B', 'D'),
 ('B', 'E'),
 ('C', 'A'),
 ('C', 'D'),
 ('C', 'E')]

------------------------------------------------

Function: pairs3 

Combinations of a list with itself:
list1: ['A', 'B', 'C']
list2: ['A', 'B', 'C']

Function pairs3 output:
[('A', 'B'), ('A', 'C'), ('B', 'C')]

Combinations of Two lists with a common element:
list1: ['A', 'B', 'C']
list2: ['A', 'D', 'E']

Function pairs3 output:
[('A', 'D'),
 ('A', 'E'),
 ('B', 'A'),
 ('B', 'D'),
 ('B', 'E'),
 ('C', 'A'),
 ('C', 'D'),
 ('C', 'E')]

------------------------------------------------

Function: pairs4 

Combinations of a list with itself:
list1: ['A', 'B', 'C']
list2: ['A', 'B', 'C']

Function pairs4 output:
[('A', 'C'), ('A', 'B'), ('C', 'B')]

Combinations of Two lists with a common element:
list1: ['A', 'B', 'C']
list2: ['A', 'D', 'E']

Function pairs4 output:
[('A', 'E'),
 ('A', 'D'),
 ('C', 'E'),
 ('C', 'D'),
 ('B', 'E'),
 ('B', 'D'),
 ('C', 'A'),
 ('B', 'A')]
Answered By: C. Pappy

About 5-6 times faster than the fastest in your answer’s benchmark. I build sets of the values that appear in both lists or in only one of them, and combine them appropriately without any further filtering.

from itertools import product, combinations

def pairs(list1, list2):
    a = {*list1}
    b = {*list2}
    ab = a & b
    return [
        *product(a, b-a),
        *product(a-b, ab),
        *combinations(ab, 2)
    ]
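Why the three parts cover every pair exactly once: for a pair (x, y) with x from `list1` and y from `list2`, either y is only in `list2` (first product), or y is shared and x is only in `list1` (second product), or both are shared (the combinations). A quick sanity check of that reasoning might look like the sketch below; `pairs_by_sets`, `pairs_by_filtering` and `unordered` are names invented for the check, and the filtering version is only a stand-in reference, not any poster's exact code.

```python
from itertools import product, combinations

def pairs_by_sets(list1, list2):
    # Same decomposition as above
    a, b = set(list1), set(list2)
    ab = a & b
    return [*product(a, b - a), *product(a - b, ab), *combinations(ab, 2)]

def pairs_by_filtering(list1, list2):
    # Reference implementation: filter the full product
    seen, result = set(), []
    for x, y in product(list1, list2):
        key = frozenset((x, y))
        if x != y and key not in seen:
            seen.add(key)
            result.append((x, y))
    return result

def unordered(pairs):
    # Compare pairs while ignoring orientation and list order
    return {frozenset(p) for p in pairs}

list1 = list(range(20))
list2 = [0, 1, 2, 3] + list(range(20, 36))
fast = pairs_by_sets(list1, list2)
slow = pairs_by_filtering(list1, list2)

print(len(fast), len(slow))
print(len(unordered(fast)) == len(fast))     # no pair produced twice
print(unordered(fast) == unordered(slow))    # same pairs as the reference
```

Because the inputs go through sets, the order of the output and the orientation of some pairs can differ from the question's output, and repeated values within a list are collapsed.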

You could also make it an iterator (because unlike previous solutions, I don’t need to store the already produced pairs to filter further ones):

from itertools import product, combinations, chain

def pairs(list1, list2):
    a = {*list1}
    b = {*list2}
    ab = a & b
    return chain(
        product(a, b-a),
        product(a-b, ab),
        combinations(ab, 2)
    )
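As a usage sketch for the iterator version, the pairs can be consumed lazily, for instance taking only the first few with `itertools.islice`. The function is repeated under the made-up name `iter_pairs` so the snippet runs on its own, and the input ranges are arbitrary.

```python
from itertools import chain, combinations, islice, product

def iter_pairs(list1, list2):
    # Iterator version from above, renamed for this example
    a = {*list1}
    b = {*list2}
    ab = a & b
    return chain(
        product(a, b - a),
        product(a - b, ab),
        combinations(ab, 2)
    )

# Take only the first five pairs; the remaining pairs are never built.
it = iter_pairs(range(1000), range(500, 1500))
first_five = list(islice(it, 5))
print(first_five)

# Counting without storing the pairs
print(sum(1 for _ in iter_pairs(['A', 'B', 'C'], ['A', 'D', 'E'])))
```

Memory use then depends on the size of the input sets rather than on the number of pairs produced.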
Answered By: Kelly Bundy