Find all possible sums of the combinations of integers from a set, efficiently
Question:
Given an integer n, and an array a of x random positive integers, I would like to find all possible sums of the combinations with replacement (n out of x) that can be drawn from this array.
For example:
n = 2, a = [2, 3, 4, 6]
all combinations = [2+2, 2+3, 2+4, 2+6, 3+3, 3+4, 3+6, 4+4, 4+6, 6+6]
all unique sums of these combinations = {4, 5, 6, 7, 8, 9, 10, 12}
This can of course easily be solved by enumerating and summing all possible combinations, for example in Python:
from itertools import combinations_with_replacement
n = 2
a = [2,3,4,6]
{sum(comb) for comb in combinations_with_replacement(a, n)}
Is there a more efficient way to do this? I have to do this for n up to 4 and arrays of up to 1000 values, which gives about 4e10 combinations. The number of unique sums will be several orders of magnitude smaller for arrays whose integers aren't too far apart, so I would guess there must be a more efficient way.
For example, when n=3 and a is the set of the first 1000 even numbers, there are only 2998 unique sums out of roughly 1.7e8 possible combinations.
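The counts quoted above can be checked with a few lines, using the standard formula C(x + n - 1, n) for the number of combinations with replacement. This is only a sanity check; the helper name `count_multisets` is made up for illustration.

```python
from math import comb

def count_multisets(x, n):
    """Number of combinations with replacement: n items drawn from x distinct values."""
    return comb(x + n - 1, n)

print(count_multisets(4, 2))      # 10, the example above
print(count_multisets(1000, 4))   # 41917125250, about 4e10
print(count_multisets(1000, 3))   # 167167000, about 1.7e8

# First 1000 even numbers, n = 3: the sums are the even values from 6 to 6000.
print(len(range(6, 6001, 2)))     # 2998
```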
Edit: the original question was updated to state that the integers are all positive.
Answers:
Build the sums one term at a time, deduplicating after every step:
sums = {0}
for _ in range(n):
    sums = {s + x for s in sums for x in a}
Or using a bitset (assumes your numbers are non-negative):
sums = 1
for _ in range(n):
    new = 0
    for x in a:
        new |= sums << x
    sums = new
sums = {i for i, bit in enumerate(reversed(bin(sums))) if bit == '1'}
Which one is faster depends on the density of your numbers.
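As a sanity check, both approaches can be wrapped in functions and compared with the brute-force enumeration on the example from the question. This is a minimal sketch; the function names `sums_brute_force`, `sums_by_sets` and `sums_by_bitset` are made up for illustration. In the bitset version, bit i of the integer is set exactly when i is a reachable sum, so shifting left by x adds x to every reachable sum at once.

```python
from itertools import combinations_with_replacement

def sums_brute_force(a, n):
    # Reference: enumerate every combination with replacement.
    return {sum(comb) for comb in combinations_with_replacement(a, n)}

def sums_by_sets(a, n):
    sums = {0}
    for _ in range(n):
        sums = {s + x for s in sums for x in a}
    return sums

def sums_by_bitset(a, n):
    # Requires non-negative integers, since the values are used as shift counts.
    sums = 1  # only bit 0 is set: the empty sum
    for _ in range(n):
        new = 0
        for x in a:
            new |= sums << x
        sums = new
    # Reading the binary string backwards makes index i correspond to bit i.
    return {i for i, bit in enumerate(reversed(bin(sums))) if bit == '1'}

a = [2, 3, 4, 6]
n = 2
print(sorted(sums_by_sets(a, n)))  # [4, 5, 6, 7, 8, 9, 10, 12]
assert sums_by_sets(a, n) == sums_by_bitset(a, n) == sums_brute_force(a, n)
```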
A further optimization of the second solution which can also handle negative numbers (it translates all values so that the smallest becomes 0):
sums = 1
minimum = min(a)
b = [x - minimum for x in a]
for _ in range(n):
    new = 0
    for x in b:
        new |= sums << x
    sums = new
sums = {
    i + n*minimum
    for i, bit in enumerate(reversed(bin(sums)))
    if bit == '1'
}
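The translation works because each of the n terms is shifted by `minimum`, so every sum is shifted by exactly `n*minimum`, which is added back when decoding. A small check with negative inputs, as a sketch (the function name `sums_shifted_bitset` and the sample values are made up for illustration):

```python
from itertools import combinations_with_replacement

def sums_shifted_bitset(a, n):
    minimum = min(a)
    b = [x - minimum for x in a]  # all values are now >= 0
    sums = 1
    for _ in range(n):
        new = 0
        for x in b:
            new |= sums << x
        sums = new
    # Undo the translation: n terms, each shifted by `minimum`.
    return {
        i + n * minimum
        for i, bit in enumerate(reversed(bin(sums)))
        if bit == '1'
    }

a = [-3, 1, 4]
n = 2
expected = {sum(c) for c in combinations_with_replacement(a, n)}
print(sorted(sums_shifted_bitset(a, n)))  # [-6, -2, 1, 2, 5, 8]
assert sums_shifted_bitset(a, n) == expected
```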
Yet another, intended for dense sets:
minimum = min(a)
maximum = max(a)
sums = set(a) if n else {0}
for i in range(2, n + 1):
    new = set()
    for s in range(minimum * i, maximum * i + 1):
        for x in a:
            if s - x in sums:
                new.add(s)
                break
    sums = new
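The idea here is to iterate over candidate sums instead of over pairs: every sum of i terms lies between `i*minimum` and `i*maximum`, and a candidate s is reachable if some `s - x` was reachable with one term fewer. A compact restatement of the same loop, as a sketch (the function name `sums_dense` and the use of `any` are choices made for this example), run on a scaled-down version of the even-numbers case from the question:

```python
def sums_dense(a, n):
    a = list(a)
    minimum, maximum = min(a), max(a)
    sums = set(a) if n else {0}
    for i in range(2, n + 1):
        new = set()
        # Every sum of i terms lies between i*minimum and i*maximum.
        for s in range(minimum * i, maximum * i + 1):
            # any() stops at the first x that works, like the break above.
            if any(s - x in sums for x in a):
                new.add(s)
        sums = new
    return sums

evens = range(2, 201, 2)  # first 100 positive even numbers
result = sums_dense(evens, 3)
print(len(result))  # 298: the even values from 6 to 600
```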
One for large n, working like exponentiation by squaring:
sums = {0}
while n:
    if n % 2:
        sums = {s + x for s in sums for x in a}
    n //= 2
    if n:
        a = {x + y for x in a for y in a}
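The loop keeps an invariant: after k doublings, `a` holds all sums of 2^k terms, and `sums` accumulates the powers of two that appear in the binary representation of n. A short sketch with a larger n (the function name `sums_by_squaring` and the inputs are made up for illustration):

```python
def sums_by_squaring(a, n):
    a = set(a)  # local copy; the loop below rebinds a and n
    sums = {0}
    while n:
        if n % 2:
            # This power of two is part of n: combine it into the result.
            sums = {s + x for s in sums for x in a}
        n //= 2
        if n:
            # Double the number of terms represented by each element of a.
            a = {x + y for x in a for y in a}
    return sums

# Ten terms, each 2 or 3: the sums are 20 + (number of threes), i.e. 20..30.
result = sums_by_squaring([2, 3], 10)
print(sorted(result))
assert result == set(range(20, 31))
```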
Benchmark according to your "let’s say that the first thousand positive even numbers is realistic. Though there will be some uneven mixed in as well" (I used "some"=42) and with n=4 (since you said "I don’t think I’ll ever need more than n=4").
2.14 ± 0.03 ms Kelly2
2.44 ± 0.04 ms Kelly3
45.10 ± 4.05 ms Kelly4b
208.19 ± 2.22 ms Kelly4
807.97 ± 4.79 ms Kelly1
1216.60 ± 19.22 ms Kelly5
Benchmark code (Attempt This Online!):
def Kelly1(a, n):
    sums = {0}
    for _ in range(n):
        sums = {s + x for s in sums for x in a}
    return sums

def Kelly2(a, n):
    sums = 1
    for _ in range(n):
        new = 0
        for x in a:
            new |= sums << x
        sums = new
    return {i for i, bit in enumerate(reversed(bin(sums))) if bit == '1'}

def Kelly3(a, n):
    sums = 1
    minimum = min(a)
    b = [x - minimum for x in a]
    for _ in range(n):
        new = 0
        for x in b:
            new |= sums << x
        sums = new
    return {
        i + n*minimum
        for i, bit in enumerate(reversed(bin(sums)))
        if bit == '1'
    }

def Kelly4(a, n):
    minimum = min(a)
    maximum = max(a)
    sums = set(a) if n else {0}
    for i in range(2, n + 1):
        new = set()
        for s in range(minimum * i, maximum * i + 1):
            for x in a:
                if s - x in sums:
                    new.add(s)
                    break
        sums = new
    return sums

# optimization: try subtracting odd numbers first
def Kelly4b(a, n):
    minimum = min(a)
    maximum = max(a)
    sums = set(a) if n else {0}
    a = sorted(a, key=lambda x: -(x % 2))
    for i in range(2, n + 1):
        new = set()
        for s in range(minimum * i, maximum * i + 1):
            for x in a:
                if s - x in sums:
                    new.add(s)
                    break
        sums = new
    return sums

def Kelly5(a, n):
    sums = {0}
    while n:
        if n % 2:
            sums = {s + x for s in sums for x in a}
        n //= 2
        if n:
            a = {x + y for x in a for y in a}
    return sums

funcs = Kelly1, Kelly2, Kelly3, Kelly4, Kelly4b, Kelly5
from random import sample
from statistics import mean, stdev
from time import perf_counter as time

# Correctness
for n in range(11):
    a = sample(range(1, 10**4), 10)
    expect = funcs[0](a, n)
    for f in funcs:
        result = f(a, n)
        assert result == expect

# Speed
times = {f: [] for f in funcs}
def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):7.2f} ± {stdev(ts):5.2f} ms '
for _ in range(15):
    evens = list(range(2, 2001, 2))
    odds = sample(range(1, 2000, 2), 42)
    a = sorted(evens + odds)
    n = 4
    for f in funcs:
        t0 = time()
        result = f(a, n)
        times[f].append(time() - t0)
        del result
for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)