How to clean up RAM between program iterations using Python

Question:

Problem

I have a Python application inside a Docker container. The application receives "jobs" from a queue service (RabbitMQ), does some computing tasks and uploads the results to a database (MySQL and Redis).

The issue I face is that the RAM is not properly "cleaned up" between iterations, so memory consumption rises from iteration to iteration until OOM. Since I catch MemoryError (see the tried approaches below for more info), the container stays alive and the memory remains exhausted (it is not freed up by a container restart).


Question

  • How to debug what is "staying" in memory so I can clean it up?
  • How to clean up the memory properly between runs?

Iteration description

An example of increasing memory utilisation (memory limit set to 3000 MiB):

  • fresh container: 130 MiB
  • 1st iteration: 1000 MiB
  • 2nd iteration: 1500 MiB
  • 3rd iteration: 1750 MiB
  • 4th iteration: OOM

Note: Every run/iteration is a bit different and thus has slightly different memory requirements, but the pattern stays similar.


Below is a brief overview of an iteration, which might be helpful in determining what might be wrong; a minimal sketch of the loop follows the list.

  1. Receiving job parameters from rabbitmq
  2. Loading data from a local parquet file into a dataframe (using read_parquet(filename, engine="fastparquet"))
  3. Computing values using Pandas functions and other libraries (most of the load is probably here)
  4. Converting dataframe to dictionary and computing some other values inside a loop
  5. Adding some more metrics from computed values – e.g. highest/lowest values, trends etc.
  6. Storing metrics from 5. in database (MySQL and Redis)
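
For illustration, here is a minimal sketch of one iteration; run_test is the function name used in the answer below, while the message body format, the "value" column and the metric names are placeholders, not the actual code:

import json

import pandas as pd


def run_test(body):
    # 1. Job parameters arrive as the RabbitMQ message body
    params = json.loads(body)

    # 2. Load data from a local parquet file into a dataframe
    df = pd.read_parquet(params["filename"], engine="fastparquet")

    # 3. + 4. Compute values, convert the dataframe to a dictionary
    #         and compute further values in a loop
    records = df.to_dict("records")
    total = sum(r["value"] for r in records)  # placeholder computation

    # 5. Derive summary metrics, e.g. highest/lowest values
    metrics = {"highest": df["value"].max(), "lowest": df["value"].min(), "total": total}

    # 6. Store the metrics in MySQL and Redis (omitted here)
    ...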

A selection of the tech I use

  • Python 3.10
  • Pandas 1.4.4
  • numpy 1.24.2
  • running in AWS ECS Fargate (but results on local are similar); 1 vCPU and 8 GB of memory

Possible solutions / tried approaches

  • ❌: tried; did not work
  • : an idea I am going to test
  • : did not completely solve the problem, but helped towards the solution
  • ✅: working solution

❌ Restart container after every iteration

The most obvious one is to restart the Docker container (e.g. by calling exit() and letting the container restart itself) after every iteration. This solution is not feasible because the restart overhead is too big (one run takes 15–60 seconds, so restarting would slow things down too much).

❌ Using gc.collect()

I have tried calling gc.collect() at the very beginning of each iteration, but the memory usage did not change at all.
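
For reference, the call looked roughly like this (on_new_job is a placeholder name; logger is the same local utility used further below). gc.collect() only reclaims unreachable objects, so if a cache, a global or a lingering traceback still references the previous iteration's data, collecting gains nothing:

import gc

from utils import logger


def on_new_job(body):
    freed = gc.collect()  # returns the number of unreachable objects found
    logger.info(f"gc.collect() collected {freed} objects")
    run_test(body)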

✅ Test multiprocessing

I read some recommendations to use the multiprocessing module in order to improve memory efficiency, because it "drops" all resources after the subprocess finishes.

This solved the issue, see answers below.

https://stackoverflow.com/a/1316799/12193952

Use explicit del on unwanted objects

The idea is to explicitly delete objects that are no longer used (e.g. the dataframe after it's converted to a dictionary).

del my_array   # removes the name binding; memory is freed only once no references remain
del my_object

https://stackoverflow.com/a/1316793/12193952
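
Applied to step 4 of the iteration, it could look like this (a sketch; the function name is illustrative, and note that del only drops this function's reference to the dataframe):

import gc

import pandas as pd


def convert_and_free(df: pd.DataFrame) -> list[dict]:
    records = df.to_dict("records")  # step 4: dataframe -> list of dicts
    del df        # drop this function's reference to the dataframe
    gc.collect()  # let the collector release what is now unreachable
    return records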

Monitor memory using psutil

import psutil
# Local imports
from utils import logger


def get_usage():
    vm = psutil.virtual_memory()
    total = round(vm.total / 1000 / 1000, 4)
    used = round(vm.used / 1000 / 1000, 4)
    pct = round(used / total * 100, 1)
    logger.info(f"Current memory usage is: {used} / {total} MB ({pct} %)")

    return True
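
One caveat worth noting: inside a container, psutil.virtual_memory() reads the host's /proc/meminfo, so the values above can reflect the host rather than the container limit. Tracking the process's own resident set size may be more informative; a small sketch reusing the same logger:

import psutil
# Local imports
from utils import logger


def get_process_usage():
    rss = psutil.Process().memory_info().rss  # resident set size of this process, in bytes
    logger.info(f"Current RSS: {round(rss / 1000 / 1000, 1)} MB")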

Support except MemoryError

Thanks to this question I was able to set up a try/except pattern that catches OOM errors and keeps the container running (so logs are available etc.).
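
A minimal sketch of the pattern (run_test and logger stand in for the actual job function and logging setup). Note that this only catches allocation failures raised inside Python; if the kernel's OOM killer terminates the process with SIGKILL, no exception can be caught:

def safe_run(body):
    try:
        run_test(body)
    except MemoryError:
        # The job failed, but the container keeps running so logs remain available
        logger.exception("Run aborted: out of memory")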


Even if I don't get any answers, I will continue testing and editing this question until I find a solution and hopefully help someone else.

Asked By: FN_


Answers:

It seems that implementing multiprocessing solved the issue.

Below is a code snippet explaining the implementation – it's very simple.

import multiprocessing


def callback(body):
    # Queue message handler; body carries the job parameters
    ...
    # Run the strategy test in a separate process; all memory it
    # allocates is returned to the OS when the process exits
    p = multiprocessing.Process(target=run_test, args=(body,))
    p.start()
    p.join()
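
One addition worth considering (not in the original snippet): if the child process is OOM-killed, Process.exitcode is the negative signal number (e.g. -9 for SIGKILL), so the parent can detect and log the failure after p.join():

if p.exitcode != 0:
    # A negative exit code means the child was killed by a signal (-9 = SIGKILL, e.g. OOM)
    logger.warning(f"run_test failed with exit code {p.exitcode}")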

I was able to reduce the share of tests failing due to OOM from 86 % to 0 %. Local testing results are as follows:

  • fresh container: 152 MiB
  • 1st iteration: 162 MiB
  • 2nd iteration: 370 MiB
  • 3rd iteration: 371 MiB
  • 4th iteration: 371 MiB
  • 5th iteration: 371 MiB
  • 6th iteration: 371 MiB
  • 7th iteration: 371 MiB
  • 8th iteration: 371 MiB
Answered By: FN_