Efficient algorithm to find the sum of all concatenated pairs of integers in a list

Question:

I had this problem in one of my interview practice sessions and could not get a time complexity better than O(N^2). At some level you have to visit each element in the list. I thought about using a hash table, but I would still have to build and populate it before doing the calculation. My solution was a nested for loop (code included below); it passed every test except the 4-second time limit.

My Code:

def concatenationsSum(a):
    sum = 0
    current_index_looking_at = 0
    for i in a:
        for x in a:
            temp = str(i)+str(x)
            sum += int(temp)
    return sum

The problem description:

Given an array of positive integers a, your task is to calculate the sum
of every possible a[i] ∘ a[j], where a[i] ∘ a[j] is the concatenation
of the string representations of a[i] and a[j] respectively.
    
    Example
    
    For a = [10, 2], the output should be concatenationsSum(a) = 1344.
    
    a[0] ∘ a[0] = 10 ∘ 10 = 1010,
    a[0] ∘ a[1] = 10 ∘ 2 = 102,
    a[1] ∘ a[0] = 2 ∘ 10 = 210,
    a[1] ∘ a[1] = 2 ∘ 2 = 22.
    So the sum is equal to 1010 + 102 + 210 + 22 = 1344.
    
    For a = [8], the output should be concatenationsSum(a) = 88.
    
    There is only one number in a, and a[0] ∘ a[0] = 8 ∘ 8 = 88, so the answer is 88.
    
    Input/Output
    
    [execution time limit] 4 seconds (py3)
    
    [input] array.integer a
    
    A non-empty array of positive integers.
    
    Guaranteed constraints:
    1 ≤ a.length ≤ 10^5,
    1 ≤ a[i] ≤ 10^6.
    
    [output] integer64
    
    The sum of all a[i] ∘ a[j]s. It's guaranteed that the answer is less than 2^53.
Asked By: STOPIMACODER


Answers:

The concatenation of two integers:

m ∘ n

is equal to:

10**digit_length(n) * m + n

so the sum of the concatenations of every list item with a given integer:

(a[0] ∘ n) + (a[1] ∘ n) + …

is equal to:

(10**digit_length(n) * a[0] + n) + (10**digit_length(n) * a[1] + n) + …

and you can put all the ns on one side:

(10**digit_length(n) * a[0]) + (10**digit_length(n) * a[1]) + … + n + n + …

and note that each element of the array is multiplied by a value that only depends on n:

10**digit_length(n) * (a[0] + a[1] + …) + n + n + …

simplifying again:

10**digit_length(n) * sum(a) + len(a) * n

sum(a) doesn’t change, and the sum of len(a) * ns across all ns is len(a) * sum(a):

def concatenationsSum(a):
    return (sum(10**digit_length(n) for n in a) + len(a)) * sum(a)


def digit_length(n):
    """
    The number of base-10 digits in an integer.

    >>> digit_length(256)
    3

    >>> digit_length(0)
    1
    """
    return len(str(n))

This runs in linear time when the upper bound on the integers involved is constant. You can also use math.log10 to make digit_length faster as long as floating-point math is precise enough for the integer sizes involved (and if not, there are still better ways to implement it than going through a string – but probably no shorter or more understandable ways).
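
As a minimal sketch of one string-free alternative (an illustration, not the answerer's own code), the digit count can be obtained by repeated integer division, which avoids floating-point rounding entirely. The names digit_length_int and concatenations_sum are hypothetical, and the input is assumed to be a non-negative integer:

```python
def digit_length_int(n):
    """Count base-10 digits of a non-negative integer without str() or floats."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    digits = 1
    while n >= 10:
        n //= 10
        digits += 1
    return digits


def concatenations_sum(a):
    """Same closed form as above, using the integer-only digit count."""
    return (sum(10 ** digit_length_int(n) for n in a) + len(a)) * sum(a)
```

For values capped at 10^6 the loop runs at most six times per element, so the overall running time stays linear in the length of the list.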

Answered By: Ry-

It’s impossible to efficiently generate each number separately. What you can do, however, is calculate the result without necessarily generating the individual values.

Numbers in the array are at most 10^6, so each number has between 1 and 7 digits. Partition the numbers into groups so that each group holds the numbers with the same number of digits. There will be at most 7 groups. You can do that in O(n). (For the next steps only the sizes of the groups matter, so you don’t have to physically create 7 lists of numbers.)

Consider an integer X in the array. You will concatenate it with every number in the array. Concatenation with an integer Y that has K digits can be written as: X * 10^K + Y.
To sum the concatenations, it’s much easier to count how many times each integer acts as Y (exactly N times, where N is the size of the array, because the problem also pairs each element with itself) and how many times it acts as X for a specific K (there are only 7 possible Ks, and the count for a given K is the size of the matching group; for example, for K = 4 it is the size of group 4). With the group sizes known, that is O(1) per integer.

The last step is to calculate the result from these counts. This is quite straightforward: for each number V in the array you add to the result V * Y_V, V * 10 * X_V_1, V * 100 * X_V_2, …, where Y_V is the number of concatenations in which V acts as Y, and X_V_K is the number of concatenations in which V acts as X next to an integer Y with K digits. With all those counts already calculated, this takes O(n) time.
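
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's code: the name concatenations_sum_grouped is hypothetical, values are assumed to be positive integers of at most 7 digits, and each value is counted on the right-hand side N times because the problem statement also includes a[i] ∘ a[i].

```python
def concatenations_sum_grouped(a):
    """Sum of all a[i] ∘ a[j] using only the sizes of the digit-count groups."""
    n = len(a)
    # group_size[k] = how many elements have exactly k digits (k = 1..7).
    group_size = [0] * 8
    for value in a:
        group_size[len(str(value))] += 1

    total = 0
    for v in a:
        # v is the right-hand part Y once for every element of the array.
        total += v * n
        # v is the left-hand part X once per element with k digits,
        # and each such pairing shifts v left by k decimal places.
        for k in range(1, 8):
            total += v * 10 ** k * group_size[k]
    return total
```

For a = [10, 2] the groups are one 1-digit and one 2-digit number, so 10 contributes 20 + 100 + 1000 and 2 contributes 4 + 20 + 200, which adds up to 1344.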

Answered By: Maras

I don’t see a way to do it without looping through the list, but you could improve the efficiency a little by not storing temp and by computing a[i] ∘ a[j] and a[j] ∘ a[i] at the same time.

def concatenationsSum(a):
    sum = 0
    for i in range(len(a)):
        sum += int(str(a[i])+str(a[i])) ##diagonal
        for j in range(i):
            sum += int(str(a[i])+str(a[j]))+int(str(a[j])+str(a[i])) ##off-diagonal
    return sum

This might save some milliseconds. But I’d love to see how much.

EDIT: The benchmark tests proposed by @superb_rain were a good idea. I generated some random test cases within the constraints of the assignment and my proposed optimization did not make it faster.

Evidently, looking up list elements by index costs more time than storing them in a temporary variable. So, I optimized further. The code below takes 35%-42% less time to execute 300 test cases.

def concatenationsSum(a):
    sum = 0
    for i in range(len(a)):
        x = str(a[i]) 
        sum += int(x+x) ##diagonal
        for j in range(i):
            y=str(a[j])
            sum += int(x+y)+int(y+x) ##off-diagonal
    return sum

EDIT (again): I have found a faster way that runs in O(n) time (two passes over the list) instead of O(n^2) and does not use the str() function.

  • First, count how many numbers there are for each number of digits.
  • Start the total with len(a) times the sum of all numbers, as each number is at the end of a concatenated integer exactly len(a) times.
  • Then, use the digit counts to add 10**digits times each number, once for every number it is placed in front of.

def concatenationsSum(a):
    pnum = [10**p for p in range(6,-1,-1)]
    pot = dict(zip(pnum,[0]*7))
    for e in a:
        for p in pnum:
            if e>=p:
                pot[p]+=1
                break
    v=pot.items()
            
    total = sum(a)*len(a)
    for e in a:
        for p,n in v:
            total += n*e*p*10
    return total

This algorithm gets results for test cases with up to 10^6 list elements of up to 10^5 values in under 10 seconds (on my laptop). So it is still not quite up to par, but I think there is potential to make it more efficient. At least it no longer has O(n^2) complexity.
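
One possible way to tighten this further, offered as a sketch rather than a measured improvement: look up the digit count with a binary search over the powers of ten, and compute the shift factor once instead of once per element. The names POWERS and concatenations_sum_hoisted are hypothetical, and the code assumes 1 ≤ a[i] ≤ 10^6 as in the problem statement.

```python
from bisect import bisect_right

# Thresholds 1, 10, ..., 10**6: a value e >= 1 has bisect_right(POWERS, e) digits.
POWERS = [10 ** p for p in range(7)]


def concatenations_sum_hoisted(a):
    """Digit-count table as above, but with the second loop collapsed."""
    digit_counts = [0] * 8  # digit_counts[d] = number of elements with d digits
    for e in a:
        digit_counts[bisect_right(POWERS, e)] += 1

    # Every element is multiplied by the same shift factor, so compute it once.
    shift = sum(count * 10 ** d for d, count in enumerate(digit_counts))
    return sum(a) * (len(a) + shift)
```

This still avoids str(), and the only per-element work left is one bisect call and the built-in sum.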

Answered By: Martin Wettstein

Instead of prepending each number with every number separately, just prepend it once with the sum. Well, then it appears as the tail only once instead of N times, so just add it N-1 more times (or equivalently, overall add the sum N-1 times).

def concatenationsSum(a):
    sum_ = sum(a)
    return sum(int(str(sum_) + str(x)) for x in a) + (len(a) - 1) * sum_

Runtime is O(N). Demo at repl.it for only 1000 values, output:

original result 460505045000 in 3.3822 seconds
  faster result 460505045000 in 0.0017 seconds
Same result? True
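
To see the trick on the example from the question: for a = [10, 2] the sum is 12, so the prepended values are 12 ∘ 10 = 1210 and 12 ∘ 2 = 122, which total 1332; adding (N-1) * 12 = 12 gives 1344. The sketch below cross-checks the idea against brute force on small random inputs. The names brute_force and prepend_sum are hypothetical, and this is not the repl.it demo itself.

```python
import random


def brute_force(a):
    """Reference O(N^2) answer: concatenate every ordered pair, i == j included."""
    return sum(int(str(x) + str(y)) for x in a for y in a)


def prepend_sum(a):
    """Prepend sum(a) to each element once, then add the tails that are still missing."""
    total = sum(a)
    return sum(int(str(total) + str(x)) for x in a) + (len(a) - 1) * total


# Random inputs within the stated value bound, kept short so brute force stays fast.
random.seed(0)
samples = [
    [random.randint(1, 10**6) for _ in range(random.randint(1, 30))]
    for _ in range(100)
]
all_match = all(brute_force(s) == prepend_sum(s) for s in samples)
print("Same result on all samples?", all_match)
```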
Answered By: superb rain

Comparing 3 functions (I think all of them are O(n^2)), but there is a small difference in speed.

1:

def concatenationsSum(a):
    sum = 0
    for i in a:
        for x in a:
            temp = str(i)+str(x)
            sum += int(temp)
    return sum

2:

def sumAllPermutations(a):
    import itertools
    allPermutations=list(itertools.product(a,repeat=2))
    sum=0
    for x in allPermutations:
        sum+=int(str(x[0])+str(x[1]))
    
    return sum

3:

def withouIterTools(list):
    Sum = sum([int(str(a)+str(b)) for a in list for b in list])
    return Sum

from datetime import datetime 
a = [10, 2,33,4,67,123,444,55556,432,56456,1,12,3,4]

start_time = datetime.now() 
for i in range(10000):
    Sum=concatenationsSum(a)
print(Sum)
time_elapsed = datetime.now() - start_time 
print('Time elapsed (hh:mm:ss.ms) {}'.format(time_elapsed))
#---------------------------------------------------------------
start_time = datetime.now() 
for i in range(10000):
    Sum=sumAllPermutations(a)
print(Sum)
time_elapsed = datetime.now() - start_time 
print('Time elapsed (hh:mm:ss.ms) {}'.format(time_elapsed))
#---------------------------------------------------------------
start_time = datetime.now() 
for i in range(10000):
    Sum=withouIterTools(a)
print(Sum)
time_elapsed = datetime.now() - start_time 
print('Time elapsed (hh:mm:ss.ms) {}'.format(time_elapsed))

Times:

23021341208
Time elapsed (hh:mm:ss.ms) 0:00:04.294685
23021341208
Time elapsed (hh:mm:ss.ms) 0:00:04.723034
23021341208
Time elapsed (hh:mm:ss.ms) 0:00:04.156921
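
A comparison like this can also be run with the standard-library timeit module, which repeats the measurement and makes it easier to pick the least noisy run. The following is only a sketch: nested_loops and comprehension are hypothetical names standing in for functions 1 and 3 above, and the repetition counts are arbitrary.

```python
import timeit


def nested_loops(a):
    """Function 1 from above: explicit nested loops."""
    total = 0
    for i in a:
        for x in a:
            total += int(str(i) + str(x))
    return total


def comprehension(a):
    """Function 3 from above, written as a single generator expression."""
    return sum(int(str(x) + str(y)) for x in a for y in a)


data = [10, 2, 33, 4, 67, 123, 444, 55556, 432, 56456, 1, 12, 3, 4]

for func in (nested_loops, comprehension):
    # repeat() returns one total time per run; the minimum is the least noisy.
    best = min(timeit.repeat(lambda: func(data), number=200, repeat=3))
    print(f"{func.__name__}: {best:.4f} s per 200 calls")
```

Absolute numbers depend on the machine; only the relative ordering between the functions is meaningful.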
Answered By: FEldin

For PHP developers
https://github.com/sslawand351/codesignal/tree/master/concatenationsSum

<?php

// Optimised Solution
function concatenationsSum(array $a): int {
    if (count($a) == 0) {
        return 0;
    }
    $digitLength = [];
    for ($i=0; $i < count($a); $i++) {
        if ($a[$i] < 0) {
            echo 'Array contains negative integer elements';
            return 0;
        }
        $digitLength[strlen($a[$i])] = ($digitLength[strlen($a[$i])] ?? 0) + 1;
    }
    $sum = 0;
    for ($i=0; $i < count($a); $i++) {
        foreach ($digitLength as $length => $count) {
            $sum +=  $a[$i] * (pow(10, $length) + 1) * $count;
        }
    }
    return $sum;
}
Answered By: Sagar Lawand