Longest Collatz sequence Efficiency

Question:

I am trying to solve Problem 14 of Project Euler, but my code takes far longer to run than it should. I have already trimmed it down, yet it is still too slow. How can I make it more efficient? My code is below.

my_seq_final = []
for i in range(1000000, 0, -1):
    ans = i
    my_seq = [ans]
    while ans != 1:
        n = ans
        if n % 2 == 0:
            ans = n // 2
        else:
            ans = 3*n + 1
        my_seq.append(ans)
    my_seq_final.append(my_seq)

tmp = 0
result = [0]
for j in my_seq_final[::-1]:
    if tmp < len(j):
        tmp = len(j)
        result[0] = j
    else:
        pass
print(result[0][0])
Asked By: Samuel Mensah


Answers:

You could use memoization to avoid calculating many times the same Collatz sequence.

cache = {1: 1}
def collatz_count(n):
    if n not in cache:
        if n % 2 == 0:
            cache[n] = 1 + collatz_count(n // 2)
        else:
            cache[n] = 1 + collatz_count(3 * n + 1)
    return cache[n]

Suppose I call this function with 6:

 In []: collatz_count(6)
 Out[]: 9

This is the length of the sequence [6, 3, 10, 5, 16, 8, 4, 2, 1]. Moreover, the cache has been updated as a side effect:

 {1: 1, 2: 2, 3: 8, 4: 3, 5: 6, 6: 9, 8: 4, 10: 7, 16: 5}

Thereafter, if I call the function with 12, the length (10) of the sequence [12, 6, 3, 10, 5, 16, 8, 4, 2, 1] will be calculated very quickly, since the second term (6) is already associated with its length in the cache.

In your problem, most of the required lengths will be simply retrieved from the cache or calculated through very few recursive calls.

Specifically, in the specified range, the average number of calls to collatz_count is:

  • 310.534203 (without cache);
  • 3.16861 (with cache).
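
To tie this back to the original problem, here is a minimal sketch of a driver loop around the memoized function. It repeats `collatz_count` (with integer division) so that it runs on its own; `longest_start` is a hypothetical helper name, not something from the question.

```python
cache = {1: 1}  # number of terms in the sequence for each start seen so far

def collatz_count(n):
    """Return the number of terms in the Collatz sequence starting at n."""
    if n not in cache:
        if n % 2 == 0:
            cache[n] = 1 + collatz_count(n // 2)  # // keeps the keys as ints
        else:
            cache[n] = 1 + collatz_count(3 * n + 1)
    return cache[n]

def longest_start(limit):
    """Hypothetical helper: the start below `limit` with the longest sequence."""
    best_start, best_length = 1, 1
    for start in range(1, limit):
        length = collatz_count(start)
        if length > best_length:
            best_start, best_length = start, length
    return best_start, best_length

print(longest_start(10))  # → (9, 20)
```

For the actual problem you would call `longest_start(10**6)`. Only two integers are kept besides the cache, so no sequence is ever stored.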

PS: Note that I chose to implement the cache as a Python dictionary. The generated Collatz numbers do grow far above the given bound of 10**6 (the largest “intermediate” value is 56,991,483,520). An array of this size would be 99.9961% empty, and would nevertheless require roughly 114 GB at 2 bytes per value.

Answered By: Aristide

If I read the code correctly, you are saving all the sequences, and at the end deciding which is the longest.

You can surely reduce the memory consumption, and therefore speed things up, simply by determining the length of the current sequence and then comparing it with the longest sequence found so far. If you need to print it, then you can save it.
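
A minimal sketch of that first idea, with no cache at all: only the running length and the best result so far are kept. The names `collatz_length` and `longest_chain` are my own, chosen for illustration.

```python
def collatz_length(n):
    """Count the terms of the Collatz sequence starting at n, storing nothing."""
    length = 1
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        length += 1
    return length

def longest_chain(limit):
    """Return (start, length) of the longest sequence for starts below limit."""
    best_start, best_length = 1, 1
    for start in range(1, limit):
        length = collatz_length(start)
        if length > best_length:
            best_start, best_length = start, length
    return best_start, best_length

print(longest_chain(1000))  # → (871, 179)
```

This still recomputes every sequence from scratch, so it saves memory rather than arithmetic.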

You can speed things up by recording the length of the sequence from n in an array. Then when you try a new number, you can check whether the length from that number is already known, and if so, simply add the length to the current sequence length. This uses memory, but less memory than recording whole sequences.
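
One way this could look, as a sketch: I assume the array only covers starting values below the limit, so intermediate values above it are simply walked through without being stored. `longest_chain_array` is a hypothetical name, and the limit is assumed to be at least 2.

```python
def longest_chain_array(limit):
    """Return (start, length) of the longest sequence for starts below limit."""
    lengths = [0] * limit  # lengths[n] = terms in the sequence from n; 0 = unknown
    lengths[1] = 1
    best_start, best_length = 1, 1
    for start in range(2, limit):
        n, steps = start, 0
        # Walk until we reach a value whose length is already recorded.
        while n >= limit or lengths[n] == 0:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        lengths[start] = steps + lengths[n]
        if lengths[start] > best_length:
            best_start, best_length = start, lengths[start]
    return best_start, best_length

print(longest_chain_array(1000))  # → (871, 179)
```

Because the starts are visited in increasing order, every walk stops as soon as it drops below its own starting value.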

Answered By: Jonathan Leffler