Collatz Conjecture Python – Incorrect Output Above 2 Trillion (Only!)

Question:

I’ve written a basic script in Python3 that calculates the Collatz Conjecture. It takes a positive integer as an input and returns the number the steps until the sequence drops down to 1.

My script works perfectly for any integer inputs less than ~2 trillion, but above this threshold the outputs are too small.

As an example, here are some inputs, my script’s output, and the actual correct output:

Integer Input          Script Output     Correct Output
   989,345,275,647        1,348             1,348 
 1,122,382,791,663        1,356             1,356 
 1,444,338,092,271        1,408             1,408 
 1,899,148,184,679        1,411             1,411 
 2,081,751,768,559          385             1,437 
 2,775,669,024,745          388             1,440 
 3,700,892,032,993          391             1,443 
 3,743,559,068,799          497             1,549 `

Correct output values are based on this link: http://www.ericr.nl/wondrous/delrecs.html

My script’s output is always exactly 1,052 less than the correct output for inputs above 2 trillion, but I have no idea as to what’s causing this.

Can anyone explain what’s wrong, and how to update/fix the script so that it works correctly for all inputs? I thought Python is able to accept arbitrarily big numbers without a problem …

Thank you!

# Python Code for the Collatz Conjecture
# Rules: Take any integer 'n' and assess:
# If integer is even, divide by 2 (n/2)
# If integer is odd, multiply by 3 and add 1 (3n+1)
# Result: a list of all steps until 'n' goes down to 1

while True:
    print("Please enter a positive integer:")
    n = input("")
    if n == 'q':
        print("Until next time ...n")
        break
    try:
        n = int(n)
        if n > 0:
            i = 0
            while n > 1:
                if n % 2 == 0:
                    n = int(n/2)
                    i += 1
                else:
                    n = int((3*n)+1)
                    i += 1
            print("# of steps to reach '1' = ", str(i), "n")
        else:
            print("Sorry, that's not a valid entry. Please try again!n")
    except ValueError:
        print("Sorry, that's not a valid entry. Please try again!n")
Asked By: ronster1000

||

Answers:

This line:

n = int(n/2)

… converts n to a float, divides that float by 2, then converts back to an int by throwing away the fractional part.

For integers up to 2**52, converting to float is lossless, but for anything larger, it has to round to the nearest 53-bit number, which loses information.

Of course 2 trillion is well under that 2**53 limit for float precision—but the Collatz sequence starting at N frequently goes much, much higher than N. It’s not at all implausible that many numbers around 2 trillion have sequences that go past 2**53, while very few numbers below it do. It’s even possible that a whole long sequence of numbers starting at exactly 2 trillion go past 2**53 but not a single number below it does. But I have no idea how to prove such a thing without building the entire sequence for every number up to 2 trillion. (If there is a proof, it would probably lean heavily on the existing partial proofs of the conjecture under various different conditions, which are above my paygrade…)

Anyway, the solution is simple: you want to use integer division:

n = n // 2

Here’s an example to demonstrate:

>>> n = 2**53 + 3
>>> n
9007199254740995
>>> int(n/2)
4503599627370498
>>> n//2
4503599627370497

To verify that this is actually happening in your code, try this:

def collatz(n):
    overflow = False
    i = 0
    while n > 1:
        if n > 2**53:
            overflow=True
        if n % 2 == 0:
            n = int(n/2)
            i += 1
        else:
            n = int((3*n)+1)
            i += 1
    return i, overflow

if __name__ == '__main__':
    import sys
    for arg in sys.argv[1:]:
        num = int(arg.replace(',', ''))
        result, overflow = collatz(num)
        print(f'{arg:>30}: {result:10,} {overflow}')

When I run this:

$ python3 collatz.py 989,345,275,647 1,122,382,791,663 1,444,338,092,271 1,899,148,184,679 2,081,751,768,559 2,775,669,024,745 3,700,892,032,993 3,743,559,068,799

… it gives me:

           989,345,275,647:      1,348 False
         1,122,382,791,663:      1,356 False
         1,444,338,092,271:      1,408 False
         1,899,148,184,679:      1,411 False
         2,081,751,768,559:        385 True
         2,775,669,024,745:        388 True
         3,700,892,032,993:        391 True
         3,743,559,068,799:        497 True

So, we went past 2**53 in exactly the same cases where we got the wrong answer.

And to verify the fix, change the int(n/2) to n//2:

           989,345,275,647:      1,348 False
         1,122,382,791,663:      1,356 False
         1,444,338,092,271:      1,408 False
         1,899,148,184,679:      1,411 False
         2,081,751,768,559:      1,437 True
         2,775,669,024,745:      1,440 True
         3,700,892,032,993:      1,443 True
         3,743,559,068,799:      1,549 True

So, why is it always off by the same amount?

Well, that’s mostly just a coincidence of the specific numbers you happen to be using.

When you pass 2**53 via 3n+1, you’re going to convert either the last bit, or the last 2 bits, to 0, which means you normally cut off a big part of the chain and replace it with just 1 or 2 divisions. But there are obviously going to be a few numbers where the chain you end up jumping to is longer than the correct one. In fact, it only took me 3 tries to find one: 3,743,559,068,799,123 should take 326 steps, but it takes 370.

I suspect (but again, I can’t even imagine how to prove) that many big numbers are going to end up in that same range around 375, a little shorter as they get (logarithmically) bigger. Why? Well, there’s only so many numbers you can round to—and most of them are probably in cycles with each other you start doing truncating division. So, let’s say that almost every number near 2**53 has a rounding cycle length of a bit over 50, and most numbers in the trillions range reach that 2**53 range in a bit over 300 steps… then most of them are going to end up around 375. (Those numbers are pulled out of thin air, of course, but you could do a Monte Carlo simulation to see how far from reality they actually are…)

Answered By: abarnert