Efficient sums of n-length combinations of array

Question:

Given the following input:

  • An integer n, e.g., 36.
  • A list/array mylist of length m, e.g., [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]. Although the values in this example are evenly spaced, integer-valued floats, in general they are not necessarily evenly spaced; they are non-negative real numbers.

I am trying to do the following computational steps:

  1. Generate a list of n-length combinations (with replacement) of mylist, e.g. for n == 3:
    [0.0, 0.0, 0.0]
    [0.0, 0.0, 24.0]
    ...
    [96.0, 120.0, 120.0]
    [120.0, 120.0, 120.0]
    
  2. Sum the elements of each combination in the above list, e.g. for n == 3:
    0.0
    24.0
    ...
    336.0
    360.0
    
  3. Remove duplicates from the above list of sums, e.g., reducing the list length from 56 to 16 for n == 3 or from 749,398 to 181 for n == 36.
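Step 3 relies on exact equality between floats. That is safe for integer-valued floats like the example's, but for arbitrary non-negative reals the same mathematical sum can come out as two slightly different doubles depending on the order of addition, so exact deduplication may keep both. A minimal illustration; the rounding mitigation and the 9-decimal threshold at the end are assumptions, not part of the question:

```python
# Two orders of adding the same three numbers.
a = (0.1 + 0.2) + 0.3
b = (0.2 + 0.3) + 0.1

print(a == b)       # False: a is 0.6000000000000001, b is 0.6
print(len({a, b}))  # 2: exact deduplication keeps both

# One possible mitigation (an assumption, not from the question):
# round to a fixed number of decimals before deduplicating.
ndigits = 9
print(len({round(a, ndigits), round(b, ndigits)}))  # 1
```

Rounding has its own edge case: two values that straddle a rounding boundary can still land in different buckets, so the number of decimals should be chosen with the input's precision in mind.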

I have implemented this in Python in two ways: using lists and using pandas DataFrames. For n of 36 or more, the steps above take too long for my application (over 1 second) because the number of combinations grows very rapidly with n. Although performing steps 2 and 3 on a DataFrame is faster, creating the DataFrame from the list of combinations makes the overall process slower again.
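The sizes quoted in step 3 follow from the "stars and bars" count of combinations with replacement, C(m + n - 1, n). For a fixed m this grows polynomially in n (roughly like n^(m-1)/(m-1)!), which for m = 6 is already degree 5. For evenly spaced values such as the example's, the number of distinct sums is only n(m - 1) + 1. A small sketch checking both counts; the function names are hypothetical:

```python
import math


def count_combinations(m, n):
    # Number of n-length combinations with replacement drawn from m values:
    # "stars and bars" gives C(m + n - 1, n).
    return math.comb(m + n - 1, n)


def count_distinct_evenly_spaced(m, n):
    # For m evenly spaced values a, a + d, ..., a + (m - 1) * d (d > 0),
    # every sum has the form n * a + k * d with k = 0 .. n * (m - 1).
    return n * (m - 1) + 1


print(count_combinations(6, 3), count_distinct_evenly_spaced(6, 3))    # 56 16
print(count_combinations(6, 36), count_distinct_evenly_spaced(6, 36))  # 749398 181
```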

Implementation using lists:

from itertools import combinations_with_replacement

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations = list(combinations_with_replacement(mylist, n))  # Step 1.
combination_sums = list(map(sum, combinations))                # Step 2.
unique_combination_sums = list(set(combination_sums))          # Step 3.
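If materializing the full list is part of the cost, steps 1 to 3 can be fused into a single set comprehension over the generator. This is still the same brute force, so it visits all 749,398 combinations for n = 36 and does not change the growth rate; it only avoids holding the intermediate lists in memory. A sketch, not benchmarked here:

```python
from itertools import combinations_with_replacement

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

# Steps 1-3 fused: each combination is summed as soon as it is generated and
# only its sum is kept, so the 749,398 tuples never exist in memory at once.
unique_combination_sums = {sum(c) for c in combinations_with_replacement(mylist, n)}

print(len(unique_combination_sums))  # 181
```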

Implementation using DataFrames:

from itertools import combinations_with_replacement

import pandas as pd

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations_df = pd.DataFrame(list(combinations_with_replacement(mylist, n)))  # Step 1.
combination_sums_df = combinations_df.sum(axis=1)                               # Step 2.
unique_combination_sums_df = combination_sums_df.drop_duplicates()              # Step 3.

Note: Using numpy.ndarrays is computationally faster than using DataFrames but slower than using lists.

Is there a more efficient algorithm, library, or other technique to make the above process faster? Perhaps something that takes advantage of the tree-like nature of combinations?

Asked By: Lazy Titanic


Answers:

Assuming that mylist can be anything (i.e. we can’t rely on it having equally spaced values like in the example), you could try something like this. The idea is to build up the sums of length-1 combinations, then the sums of length-2 combinations, and so on, removing duplicates at each step rather than only at the end.

mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]
n = 36

sums = {0}
for i in range(n):
  # All the sums from a combination of (i+1) terms:
  sums = {s + p for s in sums for p in mylist}

print(len(sums))

This generates 181 sums for your example, as expected.

I haven’t tested speed in much detail, but from a quick run on Replit using your mylist, n can go up to about 550 without the runtime exceeding 1 second. That input is probably a bit flattering, though, since an equally spaced list produces far fewer distinct sums than an arbitrary input list.
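The same layer-by-layer idea can be vectorized with NumPy: at each layer, np.add.outer forms every (current sum, value) pair and np.unique removes duplicates (and sorts the result). This is a sketch that has not been timed against the set version; unique_sums_numpy is a hypothetical name:

```python
import numpy as np


def unique_sums_numpy(values, n):
    values = np.asarray(values, dtype=float)
    sums = np.zeros(1)  # the single sum of a length-0 combination
    for _ in range(n):
        # Every current sum plus every value, then deduplicate.
        # np.unique flattens the 2-D outer-sum array and returns sorted values.
        sums = np.unique(np.add.outer(sums, values))
    return sums


mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]
print(len(unique_sums_numpy(mylist, 36)))  # 181
```

The same float-equality caveat applies as with sets: np.unique only merges bit-identical values.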

Answered By: slothrop

For an arbitrary input list, a recursive approach can be more efficient.
It avoids permutations that give the same sum (like [0, 1, 2] and [2, 1, 0]) by deciding, one value at a time, how many times that value is used.

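def dankal(input_list, n):
    # Base case: only one value left, so it must fill all n remaining slots.
    if len(input_list) == 1:
        return {input_list[0] * n}
    unique_sums = set()
    for i in range(n + 1):
        # Use the first value exactly i times...
        current_item_sum = i * input_list[0]
        # ...and fill the remaining n - i slots from the other values.
        downstream_sums = dankal(input_list[1:], n - i)
        for item in downstream_sums:
            unique_sums.add(item + current_item_sum)
    return unique_sums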

Compared to slothrop’s version:

  • it is much slower for equally spaced input
  • it is much faster for random input
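A comparison like this can be reproduced with timeit. The harness below uses copies of both answers’ functions (dankal with its base case checked before the loop, which does not change its result); the value of n, the random input and the repetition count are arbitrary assumptions, and no timings are claimed here. With random floats the two functions may also return slightly different sets, since they add terms in different orders.

```python
import random
import timeit
from functools import partial


def slothrop(mylist, n):
    # Layer-by-layer: sums of (i + 1) terms built from sums of i terms.
    sums = {0}
    for _ in range(n):
        sums = {s + p for s in sums for p in mylist}
    return sums


def dankal(input_list, n):
    # Recursive: choose how many times the first value is used.
    if len(input_list) == 1:
        return {input_list[0] * n}
    unique_sums = set()
    for i in range(n + 1):
        current_item_sum = i * input_list[0]
        for item in dankal(input_list[1:], n - i):
            unique_sums.add(item + current_item_sum)
    return unique_sums


random.seed(0)
evenly_spaced = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]
random_input = [random.uniform(0, 120) for _ in range(6)]
n = 10

for name, data in [("evenly spaced", evenly_spaced), ("random", random_input)]:
    for func in (slothrop, dankal):
        seconds = timeit.timeit(partial(func, data, n), number=3)
        print(f"{name:>13} {func.__name__:>8}: {seconds:.3f} s")
```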

Using lists (or, better still, NumPy arrays) instead of sets would probably make this algorithm faster.
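Another option, not from the answer, is memoization: the recursion above recomputes dankal(input_list[1:], k) for the same suffix and the same k many times. Caching on (start index, remaining count) with functools.lru_cache bounds the number of distinct subproblems to m * (n + 1). A sketch; dankal_memo is a hypothetical name and it has not been benchmarked:

```python
from functools import lru_cache


def dankal_memo(input_list, n):
    values = tuple(input_list)  # input_list must be non-empty

    @lru_cache(maxsize=None)
    def sums_from(start, remaining):
        # Only one value left: it must fill all remaining slots.
        if start == len(values) - 1:
            return frozenset({values[start] * remaining})
        result = set()
        for i in range(remaining + 1):
            # Use values[start] exactly i times, then recurse on the rest.
            head = i * values[start]
            for tail in sums_from(start + 1, remaining - i):
                result.add(head + tail)
        return frozenset(result)  # immutable, so it is safe to cache

    return set(sums_from(0, n))


print(len(dankal_memo([0.0, 24.0, 48.0, 72.0, 96.0, 120.0], 36)))  # 181
```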

Answered By: dankal444