Why do ints require three times as much memory in Python?

Question:

On a 64-bit system an integer in Python takes 24 bytes. This is 3 times the memory that would be needed in e.g. C for a 64-bit integer. Now, I know this is because Python integers are objects. But what is the extra memory used for? I have my guesses, but it would be nice to know for sure.

Asked By: jakobjaderbo

||

Answers:

Remember that the Python int type does not have a limited range like C int has; the only limit is the available memory.

Memory goes to storing the value, the current size of the integer storage (the storage size is variable to support arbitrary sizes), and the standard Python object bookkeeping (a reference to the relevant object and a reference count).

You can look up the longintrepr.h source (the Python 3 int type was traditionally known as the long type in Python 2); it makes effective use of the PyVarObject C type to track integer size:

struct _longobject {
        PyObject_VAR_HEAD
        digit ob_digit[1];
};

The ob_digit array stores ‘digits’ of either 15 or 30 bits wide (depending on your platform); so on my 64-bit OS X system, an integer up to (2 ^ 30) – 1 uses 1 ‘digit’:

>>> sys.getsizeof((1 << 30) - 1)
28

but if you use 2 30-bit digits in the number an additional 4 bytes are needed, etc:

>>> sys.getsizeof(1 << 30)
32
>>> sys.getsizeof(1 << 60)
36
>>> sys.getsizeof(1 << 90)
40

The base 24 bytes then are the PyObject_VAR_HEAD structure, holding the object size, the reference count and the type pointer (each 8 bytes / 64 bits on my 64-bit OS X platform).

On Python 2, integers <= sys.maxint but >= -sys.maxint - 1 are stored using a simpler structure storing just the single value:

typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;

because this uses PyObject instead of PyVarObject there is no ob_size field in the struct and the memory size is limited to just 24 bytes; 8 for the long value, 8 for the reference count and 8 for the type object pointer.

Answered By: Martijn Pieters

From longintrepr.h, we see that a Python ‘int’ object is defined with this C structure:

struct _longobject {
        PyObject_VAR_HEAD
        digit ob_digit[1];
};

Digit is a 32-bit unsigned value. The bulk of the space is taken by the variable size object header. From object.h, we can find its definition:

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

We can see that we are using a Py_ssize_t, 64-bits assuming 64-bit system, to store the count of “digits” in the value. This is possibly wasteful. We can also see that the general object header has a 64-bit reference count, and a pointer to the object type, which will also be a 64-bits of storage. The reference count is necessary for Python to know when to deallocate the object, and the pointer to the object type is necessary to know that we have an int and not, say, a string, as C structures have no way to test the type of an object from an arbitrary pointer.

_PyObject_HEAD_EXTRA is defined to nothing on most builds of python, but can be used to store a linked list of all Python objects on the heap if the build enables that option, using another two pointers of 64-bits each.

Answered By: mkimball