Clearing TensorFlow GPU memory after model execution

Question:

I’ve trained 3 models and am now running code that loads each of the 3 checkpoints in sequence and runs predictions using them. I’m using the GPU.

When the first model is loaded it pre-allocates the entire GPU memory (which I want for working through the first batch of data), but it doesn’t release that memory when it’s finished. When the second model is loaded, even with both tf.reset_default_graph() and with tf.Graph().as_default(), the GPU memory is still fully consumed by the first model, and the second model is starved of memory.

Is there a way to resolve this other than using Python subprocesses or multiprocessing to work around the problem (the only solution I’ve found via Google searches)?

Asked By: David Parks

Answers:

GPU memory allocated by tensors is released (back into the TensorFlow memory pool) as soon as the tensor is no longer needed (before the .run call terminates). GPU memory allocated for variables is released when the variable containers are destroyed. In the case of DirectSession (i.e., sess = tf.Session("")), that happens when the session is closed or explicitly reset (added in 62c159ff).

Answered By: Yaroslav Bulatov

A GitHub issue from June 2016 (https://github.com/tensorflow/tensorflow/issues/1727) describes the following problem:

currently the Allocator in the GPUDevice belongs to the ProcessState,
which is essentially a global singleton. The first session using GPU
initializes it, and frees itself when the process shuts down.

Thus the only workaround is to run the computation in a separate process and shut that process down afterwards.

Example Code:

import tensorflow as tf
import multiprocessing
import numpy as np

def run_tensorflow():

    n_input = 10000
    n_classes = 1000

    # Create model
    def multilayer_perceptron(x, weight):
        # Single linear layer (matmul only, no activation or bias)
        layer_1 = tf.matmul(x, weight)
        return layer_1

    # Weight matrix for the single layer
    weights = tf.Variable(tf.random_normal([n_input, n_classes]))


    x = tf.placeholder("float", [None, n_input])
    y = tf.placeholder("float", [None, n_classes])
    pred = multilayer_perceptron(x, weights)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        for i in range(100):
            batch_x = np.random.rand(10, 10000)
            batch_y = np.random.rand(10, 1000)
            sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})

    print("finished doing stuff with tensorflow!")


if __name__ == "__main__":

    # option 1: execute code with extra process
    p = multiprocessing.Process(target=run_tensorflow)
    p.start()
    p.join()

    # wait until user presses enter key (use raw_input() on Python 2)
    input()

    # option 2: just execute the function
    run_tensorflow()

    # wait until user presses enter key
    input()

So if you call run_tensorflow() inside a process you created and then let that process exit (option 1), the memory is freed. If you call run_tensorflow() directly (option 2), the memory is not freed after the function returns.
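For the situation in the question (several checkpoints used one after another), option 1 can be wrapped in a small helper. This is only a sketch in Python 3: run_isolated, predict_with_checkpoint and the checkpoint paths are hypothetical names, and the TensorFlow work is replaced by a stand-in so that the process handling is the only thing shown.

```python
import multiprocessing
import os
import traceback


def _child(queue, func, args):
    """Run func(*args) in the child and report (pid, result, error)."""
    try:
        queue.put((os.getpid(), func(*args), None))
    except Exception:
        queue.put((os.getpid(), None, traceback.format_exc()))


def run_isolated(func, args=(), start_method=None):
    """Run func(*args) in a fresh process and return (child_pid, result).

    start_method=None uses the platform default start method.
    """
    ctx = multiprocessing.get_context(start_method)
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(queue, func, args))
    proc.start()
    pid, result, error = queue.get()  # read the result before joining
    proc.join()  # once the child has exited, the memory it held is released
    if error is not None:
        raise RuntimeError("child process failed:\n" + error)
    return pid, result


def predict_with_checkpoint(path):
    """Hypothetical stand-in: load the checkpoint at `path` and predict."""
    # In real code, import TensorFlow and build the session here,
    # inside the child, so the parent never allocates GPU memory.
    return "predictions from " + path


def predict_all(paths, start_method=None):
    """Run every checkpoint in its own short-lived process, in sequence."""
    return [run_isolated(predict_with_checkpoint, (p,), start_method)
            for p in paths]
```

Calling predict_all(["model_a.ckpt", "model_b.ckpt", "model_c.ckpt"]) returns one (child_pid, result) pair per checkpoint. On platforms that use the "spawn" start method, make that call under an if __name__ == "__main__": guard.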

Answered By: Oliver Wilken

There seem to be two ways to handle iterative model training, including the case where a concurrent.futures process pool runs the training and the worker processes in the pool are not killed when a future finishes. Either method releases the GPU memory after training while keeping the main process alive.

  1. Call a subprocess to run the model training. When one training phase completes, the subprocess exits and frees the memory. It’s easy to get the return value.
  2. Use multiprocessing.Process to run the model training (p.start()); once p.join() returns, the process has exited and its memory is freed.
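The first method can be sketched roughly like this. The child program here is a hypothetical placeholder that only prints a JSON result (the "phase" and "loss" fields are made up for illustration); in practice it would be your own training script.

```python
import json
import subprocess
import sys

# Hypothetical child program: a real one would build and train the model,
# then print its result as JSON on the last line of stdout.
CHILD_CODE = """
import json, sys
phase = int(sys.argv[1])
# ... build and train the model for this phase here ...
print(json.dumps({"phase": phase, "loss": 1.0 / (phase + 1)}))
"""


def run_phase(phase):
    """Run one training phase in a child interpreter and return its result."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(phase)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    # The child has already exited here, so whatever it allocated is gone.
    return json.loads(completed.stdout.strip().splitlines()[-1])
```

For example, run_phase(1) starts a new interpreter, waits for it to exit, and hands back the dictionary the child printed. Passing the result as JSON on stdout is just one simple option; a temporary file would work as well.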

Here is a helper function using multiprocessing.Process that starts a new process to run a Python function and returns its value, instead of using subprocess:

import logging
import os
import sys
import traceback
from multiprocessing import Process, Queue

logger = logging.getLogger(__name__)

# open a new process to run function
# Note: wrapper_func is nested, so this relies on the "fork" start method;
# under "spawn" the target cannot be pickled.
def process_run(func, *args):
    def wrapper_func(queue, *args):
        try:
            logger.info('run with process id: {}'.format(os.getpid()))
            result = func(*args)
            error = None
        except Exception:
            result = None
            ex_type, ex_value, tb = sys.exc_info()
            error = ex_type, ex_value, ''.join(traceback.format_tb(tb))
        queue.put((result, error))

    def process(*args):
        queue = Queue()
        p = Process(target=wrapper_func, args=[queue] + list(args))
        p.start()
        result, error = queue.get()  # read the result before joining
        p.join()
        return result, error

    result, error = process(*args)
    return result, error

Answered By: liviaerxin

I use numba to release the GPU memory. I could not find an effective way to do it with TensorFlow alone.

import tensorflow as tf
from numba import cuda

a = tf.constant([1.0,2.0,3.0],shape=[3],name='a')
b = tf.constant([1.0,2.0,3.0],shape=[3],name='b')
with tf.device('/gpu:1'):
    c = a+b

TF_CONFIG = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.1),
    allow_soft_placement=True)

sess = tf.Session(config=TF_CONFIG)
sess.run(tf.global_variables_initializer())
for _ in range(999):
    print(sess.run(c))

sess.close()  # without numba, the GPU memory is not released
cuda.select_device(1)
cuda.close()
with tf.device('/gpu:1'):
    c = a+b

TF_CONFIG = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5),
    allow_soft_placement=True)

sess = tf.Session(config=TF_CONFIG)

sess.run(tf.global_variables_initializer())
while True:
    print(sess.run(c))

Answered By: TanLingxiao

You can use the numba library to release all the GPU memory. Install it first:

pip install numba

Then reset the current device:

from numba import cuda
device = cuda.get_current_device()
device.reset()

This will release all the memory.

Answered By: hitesh kumar

I am trying to work out which option is better in Jupyter Notebook. Jupyter Notebook keeps the GPU memory occupied even after a deep learning application has completed. This usually triggers a GPU Fan Error, which is a big headache. When that happens, I have to reset nvidia_uvm and reboot the Linux system regularly. I have found that the following two options both remove the GPU Fan Error headache, but I want to know which is better.

Environment:

  • CUDA 11.0
  • cuDNN 8.0.1
  • TensorFlow 2.2
  • Keras 2.4.3
  • Jupyter Notebook 6.0.3
  • Miniconda 4.8.3
  • Ubuntu 18.04 LTS

First Option

Put the following code at the end of the cell. The kernel ends immediately once the application has finished running, but it is not very elegant: Jupyter will pop up a message about the dead kernel.

import os
 
pid = os.getpid()
!kill -9 $pid

Second Option

The following code can also end the kernel in Jupyter Notebook. I do not know whether numba is safe. Nvidia prefers GPU "0", which is the GPU most used by individual developers (as opposed to server users). However, both Neil G and mradul dubey have responded that this leaves the GPU in a bad state.

from numba import cuda

cuda.select_device(0)
cuda.close()

It seems that the second option is more elegant. Can someone confirm which is the better choice?

Notes:

Releasing GPU memory automatically is not a problem in an Anaconda environment when a script is run directly with "$ python abc.py". However, I sometimes need to use Jupyter Notebook to work with .ipynb files.

Answered By: Mike Chen

I was training models in a for loop over different parameters when I got this error after 120 models had been trained. Afterwards I could not even train a simple model without killing the kernel.
I was able to solve my issue by adding the following line before building the model:

tf.keras.backend.clear_session()

(see https://www.tensorflow.org/api_docs/python/tf/keras/backend/clear_session)

Answered By: Ling

To free my resources, I use:

import os, signal

os.kill(os.getpid(), signal.SIGKILL)

Answered By: trazoM

I was able to solve an OOM error just now with the garbage collector.

import gc
gc.collect()

model.evaluate(x1, y1)
gc.collect()

model.evaluate(x2, y2)
gc.collect()

etc.

Based on what Yaroslav Bulatov said in their answer (that tf deallocates GPU memory when the object is destroyed), I surmised that it could just be that the garbage collector hadn’t run yet. Forcing it to collect freed me up, so that might be a good way to go.
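As a TensorFlow-free illustration of that guess: in CPython, an object caught in a reference cycle is not destroyed when its last name goes away, only when the garbage collector runs. The sketch below uses a hypothetical Resource class as a stand-in for something that owns a large buffer. Whether Keras models really end up in such cycles is an assumption on my part.

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for an object that owns a large buffer."""

    def __init__(self):
        self.buffer = bytearray(1024)
        self.self_ref = self  # reference cycle: the object points at itself


def make_and_drop():
    """Create a Resource, drop the only name for it, return a weak reference."""
    obj = Resource()
    return weakref.ref(obj)


def demo():
    gc.disable()  # keep automatic collection out of the picture
    try:
        ref = make_and_drop()
        alive_before = ref() is not None  # the cycle keeps the object alive
        gc.collect()                      # the collector breaks the cycle
        alive_after = ref() is not None
    finally:
        gc.enable()
    return alive_before, alive_after
```

Here demo() reports that the object is still alive after its name is gone and is only destroyed once gc.collect() has run, which is the behaviour the explicit collect calls above rely on.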

Answered By: Stephen Wight