MemoryError in Jupyter but not in Python

Question:

I’m running

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"

with plenty of memory

             total        used        free      shared  buff/cache   available
Mem:           125G        3.3G        104G        879M         17G        120G

64 bit Anaconda https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh

I have set max_buffer_size to 64GB in both jupyter_notebook_config.json and
jupyter_notebook_config.py, and, just to make sure, I specify it on the command line:

jupyter notebook --certfile=ssl/mycert.perm --keyfile ssl/mykey.key --no-browser --NotebookApp.max_buffer_size=64000000000

And also

cat /proc/sys/vm/overcommit_memory
1

I run a simple memory allocation snippet:

    size = int(6e9)
    chunk = size * ['r']
    print (chunk.__sizeof__()/1e9)

as a standalone .py file, and it works:

python ../readgzip.py
48.00000004

happily reporting that it allocated 48GB for my list.

However, the same code in a Jupyter notebook only works up to 7.76GB:

    size = int(9.7e8)
    chunk = size * ['r']
    print (chunk.__sizeof__()/1e9)
7.76000004

and fails after increasing the list size from 9.7e8 to 9.75e8:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
/tmp/ipykernel_12328/3436837519.py in <module>
      1 size = int(9.75e8)
----> 2 chunk = size * ['r']
      3 print (chunk.__sizeof__()/1e9)

MemoryError: 
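
(For scale: on a 64-bit build, a CPython list stores one 8-byte pointer per element, so the sizes reported above match simple arithmetic, plus a few bytes of list-header overhead. The failing size needs just slightly more than the last working one:)

    >>> 9.7e8 * 8 / 1e9     # last size that worked, in GB
    7.76
    >>> 9.75e8 * 8 / 1e9    # first size that failed
    7.8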

Also, on my home Windows 11 machine with 64GB of memory I can easily run the code above and allocate 32GB of memory.

It seems I'm missing something about the Jupyter setup on Linux.

What am I missing?

Thank you

Asked By: David Makovoz


Answers:

On Linux (and possibly other OSes, but I'm not sure), a MemoryError doesn't necessarily mean that the machine's memory is exhausted (in that case the OOM killer would usually be invoked); rather, it means the process has hit a resource limit (the kind historically set with the now-obsolete ulimit interface) beyond which the kernel is not willing to allocate it additional memory.

You can use Python's resource module to check the current process's limits (and, with sufficient permissions, change them). Here's an example:

$ prlimit --as
RESOURCE DESCRIPTION              SOFT      HARD UNITS
AS       address space limit unlimited unlimited bytes
$ prlimit --pid=$$ --as=$((1024*1024*20)):
$ prlimit --as
RESOURCE DESCRIPTION             SOFT      HARD UNITS
AS       address space limit 20971520 unlimited bytes
$ python
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import resource
>>> resource.getrlimit(resource.RLIMIT_AS)
(20971520, -1)
>>> longstr = "r" * 1024*1024*10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> longstr = "r" * 1024*1024*3
>>> resource.setrlimit(resource.RLIMIT_AS, (1024*1024*30, resource.RLIM_INFINITY))
>>> resource.getrlimit(resource.RLIMIT_AS)
(31457280, -1)
>>> longstr = "r" * 1024*1024*10
>>> len(longstr)
10485760
>>>
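
To confirm this is what the notebook kernel is hitting, run the same check from inside a Jupyter cell. A minimal sketch using only the stdlib resource module (a value of -1 means RLIM_INFINITY, i.e. no limit):

    import resource

    # The kernel inherits its limits from whatever launched the notebook
    # server (login shell, systemd unit, PAM limits), so query them from
    # within the kernel itself.
    for name in ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_RSS"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        print(name, "soft:", soft, "hard:", hard)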

Increasing the Jupyter notebook server's limits should be done from outside the process itself, since running Python processes with superuser privileges is generally not recommended (an unprivileged process can only raise its soft limit up to its hard limit).
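
For example, the soft limit of an already-running kernel can be raised from another shell with prlimit, exactly as above. The PID here is hypothetical; on the setup from the question it would be the ipykernel PID visible in the traceback (findable with, e.g., pgrep -f ipykernel):

$ prlimit --pid 12328 --as=$((64*1024*1024*1024)):

Alternatively, raise the limit in the shell that launches the notebook server, so every kernel inherits it (bash's ulimit -v takes KiB, and the soft limit can only be raised up to the hard limit):

$ ulimit -S -v unlimited
$ jupyter notebook --no-browser ...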

Read more in the man pages for Linux's prlimit(1) utility and the getrlimit(2) system call.

Answered By: micromoses