Clear all cached kernels from CuPy to force kernel compilation

Question:

The CuPy documentation states that

"CuPy caches the kernel code sent to GPU device within the process, which reduces the kernel compilation time on further calls."

This means that the first call to a CuPy function includes the kernel compilation time, while subsequent calls to the same function are much faster. For example:

import cupy as cp
from timeit import default_timer as timer

mempool = cp.get_default_memory_pool()
pinned_mempool = cp.get_default_pinned_memory_pool()


def multiply():
    rng = cp.random.default_rng()                  # Generator API: the fast way to create large random arrays in CuPy
    arr = rng.integers(0, 100_000, (10000, 1000))  # Random integer array of shape (10000, 1000)
    y = cp.multiply(arr, 42)                       # Multiply by 42, an arbitrarily chosen constant
    return y


if __name__ == '__main__':
    times = []
    for i in range(21):
        # Release cached device and pinned-host memory before each run
        mempool.free_all_blocks()
        pinned_mempool.free_all_blocks()
        start = timer()
        multiply()
        times.append(timer() - start)

    print(times)

This prints the following times (in seconds):

[0.17462146899993058, 0.0006819850000283623, 0.0006159440001738403, 0.0006145069999092811, 0.000610309999956371, 0.0006169410000893549, 0.0006062159998236893, 0.0006096620002153941, 0.0006096250001519365, 0.0006106630000886071, 0.0006063629998607212, 0.0006168999998408253, 0.0006058349999875645, 0.0006090080000831222, 0.0005964219999441411, 0.0006113049998930364, 0.0005968339999071759, 0.0005951619998540991, 0.0005980400001135422, 0.0005941219999385794, 0.0006568090000200755]

Only the first call includes the time it takes to compile the kernel.

Is there a way to flush the cache in order to force compilation on each subsequent call to multiply()?

Asked By: JOKKINATOR

||

Answers:

Currently, there is no way to disable kernel caching in CuPy. The only available option is to stop persisting the kernel cache to disk (CUPY_CACHE_IN_MEMORY=1), but kernels are still cached in memory, so compilation runs only once per process.
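
Since the in-memory cache lives only as long as the process, one possible workaround is to run each measurement in a fresh Python process with CUPY_CACHE_IN_MEMORY=1 set, so that no compiled kernel is reused from disk or from an earlier run. The following is only a rough sketch under that assumption: run_cold, cold_times and CHILD_SNIPPET are hypothetical names made up for illustration, not part of CuPy, and each measured time will also include other one-time overheads such as CUDA context initialization, not just compilation.

```python
import os
import subprocess
import sys

# Hypothetical child script: times one "cold" call and prints the elapsed seconds.
CHILD_SNIPPET = """
import cupy as cp
from timeit import default_timer as timer

start = timer()
rng = cp.random.default_rng()
arr = rng.integers(0, 100_000, (10000, 1000))
y = cp.multiply(arr, 42)
cp.cuda.Device().synchronize()  # wait for the GPU before stopping the clock
print(timer() - start)
"""


def run_cold(snippet=CHILD_SNIPPET):
    """Run `snippet` in a new interpreter with the on-disk kernel cache disabled.

    Returns the last line the child printed, parsed as a float.
    """
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, CUPY_CACHE_IN_MEMORY="1")
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return float(result.stdout.strip().splitlines()[-1])


def cold_times(n=5, snippet=CHILD_SNIPPET):
    """Collect `n` measurements, each taken in its own fresh process."""
    return [run_cold(snippet) for _ in range(n)]
```

Calling cold_times(5) on a machine with CuPy and a GPU would then return five timings that each start from an empty kernel cache. Starting a new interpreter for every measurement is slow, so this is only suited to benchmarking the compilation overhead itself.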

https://docs.cupy.dev/en/stable/user_guide/performance.html#one-time-overheads
https://docs.cupy.dev/en/latest/reference/environment.html

Answered By: kmaehashi