Why is the size of 2⁶³ 36 bytes, but 2⁶³-1 is only 24 bytes?

Question:

Everything in Python is an object. So the size of an int in Python will be larger than usual.

>>> import sys
>>> sys.getsizeof(int())
24

OK, but why does it take 12 more bytes for 2⁶³ compared to 2⁶³ - 1 and not just one?

>>> sys.getsizeof(2**63)
36
>>> sys.getsizeof(2**62)
24

I get that 2⁶³ is a long and 2⁶³-1 an int, but why 12 bytes of difference?

That did not make things any more intuitive, so I tried some other things:

>>> a = 2**63
>>> a -= 2**62
>>> sys.getsizeof(a)
36

a is still stored as a long even if it could be in an int now. So that’s not surprising. But:

>>> a = 2**63
>>> a -= (2**63 - 1)
>>> a
1L
>>> sys.getsizeof(a)
28

A new size.

>>> a = 2**63
>>> a -= 2**63
>>> a
0L
>>> sys.getsizeof(a)
24

Back to 24 bytes, but still with a long.

Last thing I got:

>>> sys.getsizeof(long())
24

Question:

How does the memory storage work in those scenarios?

Sub-questions:

Why is there a gap of 12 bytes to add what our intuition tells us is just 1 bit?

Why are int() and long() 24 bytes, while long(1) is already 28 bytes and int(2⁶²) is still 24 bytes?

NB: Python 3.X is working a bit differently, but not more intuitively. Here I focused on Python 2.7; I did not test on prior versions.

Asked By: T.Nel

Answers:

While I didn’t find it in the documentation, here is my explanation.

Python 2 implicitly promotes an int to a long when the value exceeds what can be stored in an int. The size of the new type (long) is the default size of long, which is 32. From then on, the size of your variable is determined by its value, and it can go up and down.

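from sys import getsizeof as size

a = 1
n = 32  # number of bits to shift by at each step

# going up: a grows by n bits per iteration
for i in range(10):
    if not i:
        # first pass: a is still an int, whose type name is one character
        # shorter, so the widths are adjusted to keep the columns aligned
        print 'a = %100s%13s%4s' % (str(a), type(a), size(a))
    else:
        print 'a = %100s%14s%3s' % (str(a), type(a), size(a))
    a <<= n

# going down: a shrinks by n bits per iteration
for i in range(11):
    print 'a = %100s%14s%3s' % (str(a), type(a), size(a))
    a >>= n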


a =                                                                                                    1 <type 'int'>  24
a =                                                                                           4294967296 <type 'long'> 32
a =                                                                                 18446744073709551616 <type 'long'> 36
a =                                                                        79228162514264337593543950336 <type 'long'> 40
a =                                                              340282366920938463463374607431768211456 <type 'long'> 44
a =                                                    1461501637330902918203684832716283019655932542976 <type 'long'> 48
a =                                           6277101735386680763835789423207666416102355444464034512896 <type 'long'> 52
a =                                 26959946667150639794667015087019630673637144422540572481103610249216 <type 'long'> 56
a =                       115792089237316195423570985008687907853269984665640564039457584007913129639936 <type 'long'> 60
a =              497323236409786642155382248146820840100456150797347717440463976893159497012533375533056 <type 'long'> 64
a =    2135987035920910082395021706169552114602704522356652769947041607822219725780640550022962086936576 <type 'long'> 68
a =              497323236409786642155382248146820840100456150797347717440463976893159497012533375533056 <type 'long'> 64
a =                       115792089237316195423570985008687907853269984665640564039457584007913129639936 <type 'long'> 60
a =                                 26959946667150639794667015087019630673637144422540572481103610249216 <type 'long'> 56
a =                                           6277101735386680763835789423207666416102355444464034512896 <type 'long'> 52
a =                                                    1461501637330902918203684832716283019655932542976 <type 'long'> 48
a =                                                              340282366920938463463374607431768211456 <type 'long'> 44
a =                                                                        79228162514264337593543950336 <type 'long'> 40
a =                                                                                 18446744073709551616 <type 'long'> 36
a =                                                                                           4294967296 <type 'long'> 32
a =                                                                                                    1 <type 'long'> 28

As you can see, the type stays long after the value first became too big for an int. The initial size was 32, but the size changes with the value (it can be higher than, lower than, or, obviously, equal to 32).

So, to answer your question: the base size is 24 bytes for an int and 28 bytes for a long. A long also has extra space for storing large values, which starts at 4 bytes (hence 32 bytes for the first long) and can go up or down according to the value.

As for your sub-question, creating a unique type (with a unique size) for every new number is impossible, so Python has “sub classes” of the long type, each of which deals with a range of numbers. Once you go over the limit of your old long, you must use the next one, which accommodates much larger numbers too and therefore takes a few more bytes.

Answered By: CIsForCookies

why does it get 12 more bytes for 2⁶³ compared to 2⁶³ – 1 and not just one?

On an LP64 system¹, a Python 2 int consists of exactly three pointer-sized pieces:

  • type pointer
  • reference count
  • actual value, a C long int

That’s 24 bytes in total. On the other hand, a Python long consists of:

  • type pointer
  • reference count
  • digit count, a pointer-sized integer
  • inline array of value digits, each holding 30 bits of value, but stored in 32-bit units (one of the unused bits gets used for efficient carry/borrow during addition and subtraction)
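As a rough cross-check of the two layouts listed above, here is a sketch that models them with ctypes. The class names IntLayout and LongHeader and the helper long_size are made up for illustration, and the 4-byte digit is taken from the description above rather than read from the interpreter, so treat the numbers as a model and not as what any particular build reports.

```python
import ctypes


class IntLayout(ctypes.Structure):
    """Illustrative model of a Python 2 int: two header words plus a C long."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference count
        ("ob_type", ctypes.c_void_p),     # type pointer
        ("ob_ival", ctypes.c_long),       # the value itself
    ]


class LongHeader(ctypes.Structure):
    """Illustrative model of the fixed part of a Python 2 long."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference count
        ("ob_type", ctypes.c_void_p),     # type pointer
        ("ob_size", ctypes.c_ssize_t),    # digit count
    ]


# Assumption taken from the text: each digit occupies a 32-bit unit.
DIGIT_BYTES = ctypes.sizeof(ctypes.c_uint32)


def long_size(ndigits):
    """Fixed header plus ndigits inline digits."""
    return ctypes.sizeof(LongHeader) + ndigits * DIGIT_BYTES


print("int: %d bytes" % ctypes.sizeof(IntLayout))
for ndigits in range(4):
    print("long with %d digit(s): %d bytes" % (ndigits, long_size(ndigits)))
```

The field order does not affect the totals. On a 64-bit build the int model and the long header come out the same size, and every extra digit adds 4 bytes.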

2**63 requires 64 bits to store. Two 30-bit digits hold only 60 bits, so it needs three. Since each digit is 4 bytes wide, the whole Python long will take 24+3*4 = 36 bytes.

In other words, the difference comes from long having to separately store the size of the number (8 additional bytes) and from it being slightly less space-efficient about storing the value (12 bytes to store the digits of 2**63). Including the size, the value 2**63 in a long occupies 20 bytes. Comparing that to the 8 bytes occupied by any value of the simple int yields the observed 12-byte difference.
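The arithmetic above can be sketched as a small calculator. The constants are the LP64 figures quoted in this answer (24-byte header, 30 bits of value per 4-byte digit), and the function names digits_needed and predicted_long_size are hypothetical helpers; the code never calls sys.getsizeof, it only reproduces the model.

```python
# Figures quoted in the answer for a 64-bit (LP64) Python 2 build.
LONG_HEADER_BYTES = 24  # type pointer + reference count + digit count
DIGIT_BITS = 30         # bits of value carried by each digit
DIGIT_BYTES = 4         # storage used by each digit


def digits_needed(value):
    """Number of 30-bit digits needed for the magnitude of value."""
    bits = abs(value).bit_length()
    return (bits + DIGIT_BITS - 1) // DIGIT_BITS  # ceiling division


def predicted_long_size(value):
    """Size of value under the model, if it were stored as a long."""
    return LONG_HEADER_BYTES + digits_needed(value) * DIGIT_BYTES


for value in (0, 1, 2**62, 2**63):
    print("%d: %d digit(s), %d bytes as a long"
          % (value, digits_needed(value), predicted_long_size(value)))
```

Under this model 0L takes 24 bytes, 1L takes 28, and both 2**62 and 2**63 take 36 when held in a long, which matches the sizes reported in the question.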

It is worth noting that Python 3 only has one integer type, called int, which is variable-width, and implemented the same way as Python 2 long.
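For readers on Python 3, a minimal sketch of the same effect, assuming CPython: sys.int_info reports the digit width the interpreter was built with (Python 2.7 exposes the same data as sys.long_info). The helper size_step is a made-up name. Absolute sizes vary between CPython versions, so the sketch only looks at differences.

```python
import sys

bits = sys.int_info.bits_per_digit    # typically 30 on 64-bit builds
digit_bytes = sys.int_info.sizeof_digit


def size_step(k):
    """Bytes gained when a value grows from k digits to k + 1 digits."""
    boundary = 2 ** (bits * k)        # smallest value needing k + 1 digits
    return sys.getsizeof(boundary) - sys.getsizeof(boundary - 1)


print("digit: %d bits stored in %d bytes" % (bits, digit_bytes))
for k in (1, 2, 3):
    print("going from %d to %d digits adds %d bytes" % (k, k + 1, size_step(k)))
```

Each step should equal the digit size, so the jump happens at multiples of the digit width and not at 2**63.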


¹
64-bit Windows differs in that it retains a 32-bit long int, presumably for source compatibility with a large body of older code that used char, short, and long as “convenient” aliases for 8, 16, and 32-bit values that happened to work on both 16 and 32-bit systems. To get an actual 64-bit type on x86-64 Windows, one must use __int64 or (on newer compiler versions) long long or int64_t. Since Python 2 internally depends on Python int fitting into a C long in various places, sys.maxint remains 2**31-1, even on 64-bit Windows. This quirk is also fixed in Python 3, which has no concept of maxint.
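To see which case applies on a given machine, a small sketch using ctypes can compare the width of a C long with the width of a pointer. The variable names are illustrative. The last value is what a signed C long of that width can hold, which is the figure this footnote says Python 2 reports as sys.maxint.

```python
import ctypes

c_long_bits = ctypes.sizeof(ctypes.c_long) * 8
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8

# Largest value a signed C long of this width can hold.
c_long_max = 2 ** (c_long_bits - 1) - 1

print("pointer: %d bits, C long: %d bits" % (pointer_bits, c_long_bits))
print("largest value that fits in a C long: %d" % c_long_max)
```

On an LP64 system both widths should be 64, while on 64-bit Windows the pointer is 64 bits and the C long is 32.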

Answered By: user4815162342