Why can a 352GB NumPy ndarray be used on an 8GB memory macOS computer?

Question:

import numpy as np

array = np.zeros((210000, 210000))  # default dtype is numpy.float64
array.nbytes  # 210000 * 210000 * 8 bytes = 352,800,000,000 bytes ≈ 352.8 GB

When I run the above code on my 8GB MacBook running macOS, no error occurs. But the same code raises a MemoryError on a 16GB Windows 10 PC, on a 12GB Ubuntu laptop, and even on a 128GB Linux supercomputer. All the test environments have 64-bit Python 3.6 or 3.7 installed.

Asked By: Blaise Wang


Answers:

@Martijn Pieters’ answer is on the right track, but not quite right: this has nothing to do with memory compression; it has to do with virtual memory.

For example, try running the following code on your machine:

import numpy as np

arrays = [np.zeros((21000, 21000)) for _ in range(10000)]

This code requests about 32 TiB of virtual memory (10,000 arrays of roughly 3.5 GB each), but you won’t get an error (at least I didn’t, on Linux). If I check htop, I see the following:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
31362 user       20   0 32.1T 69216 12712 S  0.0  0.4  0:00.22 python

This is because the OS is perfectly willing to overcommit on virtual memory: it won’t actually assign pages to physical memory until it has to. The way it works is:

  • calloc asks the OS for some memory to use.
  • the OS looks in the process’s page tables and finds a range of addresses it is willing to assign. This is a fast operation: the OS just records the address range in an internal data structure.
  • the program writes to one of those addresses.
  • the OS receives a page fault, at which point it actually assigns the page to physical memory. A page is usually a few KiB in size.
  • the OS passes control back to the program, which proceeds without noticing the interruption. (The sketch below shows this lazy commitment in practice.)
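
You can watch this happen from Python. Here is a minimal sketch (assuming Linux or macOS; it uses the standard-library resource module, and note that ru_maxrss is reported in KiB on Linux but in bytes on macOS): the virtual allocation barely moves the resident set, while writing to the array faults every page in.

import resource
import numpy as np

def peak_rss():
    # High-water mark of the resident set size.
    # Units: KiB on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

array = np.zeros((21000, 21000))      # reserves ~3.5 GB of virtual address space
print("after np.zeros:", peak_rss())  # barely above the interpreter's baseline

array[:] = 1.0                        # writing faults every page into physical memory
print("after writing:", peak_rss())   # now roughly 3.5 GB higher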

Creating a single huge array doesn’t work on Linux because, by default, a “heuristic algorithm is applied to figure out if enough memory is available” (thanks @Martijn Pieters!). Experiments on my system show that the kernel is unwilling to provide more than 0x3BAFFFFFF bytes (about 15 GiB) in a single allocation. However, if I run echo 1 | sudo tee /proc/sys/vm/overcommit_memory to disable the heuristic, and then try the program in the OP again, it works fine.
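
If you want to find the threshold on your own system, here is a crude probe (a hypothetical halving search of my own, assuming the default Linux overcommit heuristic is active; it only locates the limit to within a factor of two): keep halving the request until a np.zeros call is accepted.

import numpy as np

size = 1 << 45  # start absurdly high: 32 TiB
while size > 0:
    try:
        np.zeros(size, dtype=np.uint8)  # virtual reservation only; pages stay uncommitted
        print(f"largest accepted request: {size:#x} bytes")
        break
    except MemoryError:
        size >>= 1  # halve the request and try again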

For fun, try running arrays = [np.ones((21000, 21000)) for _ in range(10000)]. You’ll definitely get an out-of-memory error, even on macOS or on Linux with swap compression. Yes, some OSes can compress RAM, but they can’t compress it to the point that you’d never run out of memory.
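
The reason is that np.ones must write a 1.0 into every element at creation time, so every page is committed to physical memory immediately. Conceptually it behaves like the following sketch (an illustration of the behavior, not NumPy’s actual source):

import numpy as np

def ones_sketch(shape):
    a = np.empty(shape)  # reserve virtual address space; nothing resident yet
    a.fill(1.0)          # write every element: every page is faulted into RAM
    return a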

Answered By: user60561