Find all combinations of positive integers in increasing order that adds up to a given positive number n

Question:

How to write a function that takes n (where n > 0) and returns the list of all combinations of positive integers that sum to n?
This is a common question on the web. And there are different answers provided such as 1, 2 and 3. However, in the answers provided, they use two functions to solve the problem. I want to do it with only one single function. Therefore, I coded as follows:

def all_combinations_sum_to_n(n):
    from itertools import combinations_with_replacement

    combinations_list = []

    if n < 1:
        return combinations_list

    l = [i for i in range(1, n + 1)]
    for i in range(1, n + 1):
        combinations_list = combinations_list + (list(combinations_with_replacement(l, i)))

    result = [list(i) for i in combinations_list if sum(i) == n]
    result.sort()
    return result

If I pass 20 to my function which is all_combinations_sum_to_n(20), the OS of my machine kills the process as it is very costly. I think the space complexity of my function is O(n*n!). How do I modify my code so that I don’t have to create any other function and yet my single function has an improved time or space complexity? I don’t think it is possible by using itertools.combinations_with_replacement.

UPDATE

All answers provided by Barmar, ShadowRanger and pts are great. As I was looking for an efficient answer in terms of both memory and runtime, I used https://perfpy.com and selected python 3.8 to compare the answers. I used six different values of n and in all cases, ShadowRanger’s solution had the highest score. Therefore, I chose ShadowRanger’s answer as the best one. The scores were as follows:

enter image description here

Asked By: plpm

||

Answers:

Use recursion instead of generating all combinations and then filtering them.

def all_combinations_sum_to_n(n):
    combinations_set = set()
    for i in range(1, n):
        for sublist in all_combinations_sum_to_n(n-i):
            combinations_set.add(tuple(sorted((i,) + sublist)))

    combinations_set.add((n,))

    return sorted(combinations_set)

I had a simpler solution that didn’t use sorted() and put the results in a list, but it would produce duplicates that just differed in order, e.g. [1, 1, 2] and [1, 2, 1] when n == 4. I added those to get rid of duplicates.

On my MacBook M1 all_combinations_sum_to_n(20) completes in about 0.5 seconds.

Answered By: Barmar

You’ve got two main problems, one causing your current problem (out of memory) and one that will continue the problem even if you solve that one:

  1. You’re accumulating all combinations before filtering, so your memory requirements are immense. You don’t even need a single list if your function can be a generator (that is iterated to produce a value at a time) rather than returning a fully realized list, and even if you must return a list, you needn’t generate such huge intermediate lists. You might think you need at least one list for sorting purposes, but combinations_with_replacement is already guaranteed to produce a predictable order based on the input ordering, and since range is ordered, the values produced will be ordered as well.

  2. Even if you solve the memory problem, the computational cost of just generating that many combinations is prohibitive, due to poor scaling; for the memory, but not CPU, optimized version of the code below, it handles an input of 11 in 0.2 seconds, 12 in ~2.6 seconds, and 13 in ~11 seconds; at that scaling rate, 20 is going to approach heat death of the universe timeframes.

Barmar’s answer is one solution to both problems, but it’s still doing work eagerly and storing the complete results when the complete work might not even be needed, and it involves sorting and deduplication, which aren’t strictly necessary if you’re sufficiently careful about how you generate the results.

This answer will fix both problems, first the memory issue, then the speed issue, without ever needing memory requirements above linear in n.

Solving the memory issue alone actually makes for simpler code, that still uses your same basic strategy, but without consuming all your RAM needlessly. The trick is to write a generator function, that avoids storing more than one results at a time (the caller can listify if they know the output is small enough and they actually need it all at once, but typically, just looping over the generator is better):

from collections import deque  # Cheap way to just print the last few elements
from itertools import combinations_with_replacement  # Imports should be at top of file,
                                                     # not repeated per call
def all_combinations_sum_to_n(n):
    for i in range(1, n + 1):  # For each possible length of combinations...
        # For each combination of that length...
        for comb in combinations_with_replacement(range(1, n + 1), i):
            if sum(comb) == n:    # If the sum matches...
                yield list(comb)  # yield the combination

# 13 is the largest input that will complete in some vaguely reasonable time, ~10 seconds on TIO
print(*deque(all_combinations_sum_to_n(13), maxlen=10), sep='n')

Try it online!

Again, to be clear, this will not complete in any reasonable amount of time for an input of 20; there’s just too much redundant work being done, and the growth pattern for combinations scales with the factorial of the input; you must be more clever algorithmically. But for less intense problems, this pattern is simpler, faster, and dramatically more memory-efficient than a solution that builds up enormous lists and concatenates them.

To solve in a reasonable period of time, using the same generator-based approach (but without itertools, which isn’t practical here because you can’t tell it to skip over combinations when you know they’re useless), here’s an adaptation of Barmar’s answer that requires no sorting, produces no duplicates, and as a result, can produce the solution set in less than a 20th of a second, even for n = 20:

def all_combinations_sum_to_n(n, *, max_seen=1):
    for i in range(max_seen, n // 2 + 1):
        for subtup in all_combinations_sum_to_n(n - i, max_seen=i):
            yield (i,) + subtup
    yield (n,)


for x in all_combinations_sum_to_n(20):
    print(x)

Try it online!

That not only produces the individual tuples with internally sorted order (1 is always before 2), but produces the sequence of tuples in sorted order (so looping over sorted(all_combinations_sum_to_n(20)) is equivalent to looping over all_combinations_sum_to_n(20) directly, the latter just avoids the temporary list and a no-op sorting pass).

Answered By: ShadowRanger

Here is a fast iterative solution:

def csum(n):
  s = [[None] * (k + 1) for k in range(n + 1)]
  for k in range(1, n + 1):
    for m in range(k, 0, -1):
      s[k][m] = [[f] + terms
                 for f in range(m, (k >> 1) + 1) for terms in s[k - f][f]]
      s[k][m].append([k])
  return s[n][1]

import sys
n = 5
if len(sys.argv) > 1: n = int(sys.argv[1])
for terms in csum(n):
  print(' '.join(map(str, terms)))

Explanation:

  • Let’s define terms as a non-empty, increasing (can contain the same value multiple times) list of postitive integers.

  • The solution for n is a list of all terms in increasing lexicographical order, where the sum of each terms is n.

  • s[k][m] is a list of all terms in increasing lexicographical order, where the sum of each terms in n, and the first (smallest) integer in each terms is at least m.

  • The solution is s[n][1]. Before returning this solution, the csum function populates the s array using iterative dynamic programming.

  • In the inner loop, the following recursion is used: each term in s[k][m] either has at least 2 elements (f and the rest) or it has 1 element (k). In the former case, the rest is a terms, where the sum is k - f and the smallest integer is f, thus it is from s[k - f][f].

This solution is a lot faster than @Barmar’s solution if n is at least 20. For example, on my Linux laptop for n = 25, it’s about 400 times faster, and for n = 28, it’s about 3700 times faster. For larger values of n, it gets much faster really quickly.

This solution uses more memory than @ShadowRanger’s solution, because this solution creates lots of temporary lists, and it uses all of them until the very end.

How to come up with such a fast solution?

  • Try to find a recursive formula. (Don’t write code yet!)

  • Sometimes recursion works only with recursive functions with multiple variables. (In our case, s[k][m] is the recursive function, k is the obvious variable implied by the problem, and m is the extra variable we had to invent.) So try to add a few variables to the recursive formula: add the minimum number of variables to make it work.

  • Write your code so that it computes each recursive function value exactly once (not more). For that, you may add a cache (memoization) to your recursive function, or you may use dynamic programming, i.e. populating a (multidimensional) array in the correct order, so that what is needed is already populated.

Answered By: pts
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.