TensorFlow libdevice not found. Why is it not found in the searched path?

Question:

Win 10 64-bit 21H1; TF2.5, CUDA 11 installed in environment (Python 3.9.5 Xeus)

I am not the only one seeing this error; see also (unanswered) here and here.
The issue is obscure and the proposed resolutions are unclear/don’t seem to work (see e.g. here)

Issue: Using the TF Linear_Mixed_Effects_Models.ipynb example (download from TensorFlow github here), execution reaches the "warm up stage" and then throws the error:

InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_one_e_step_2806]

The console contains this output showing that it finds the GPU but XLA initialisation fails to find the – existing! – libdevice in the specified paths

2021-08-01 22:04:36.691300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9623 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-08-01 22:04:37.080007: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2021-08-01 22:04:54.122528: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1d724940130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-01 22:04:54.127766: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-01 22:04:54.215072: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:241] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
2021-08-01 22:04:55.506464: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-01 22:04:55.512876: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-01 22:04:55.517387: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-01 22:04:55.520773: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-01 22:04:55.524125: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2021-08-01 22:04:55.526349: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.

Now the interesting thing is that the paths searched include "C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin"

The content of that folder includes all the DLLs (successfully loaded at TF startup), including cudart64_110.dll, cudnn64_8.dll… and of course libdevice.10.bc

Question: Since TF says it is searching this location for this file and the file exists there, what is wrong and how do I fix it?

(NB C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 does not exist… CUDA is installed in the environment; this path must be a best guess for an OS installation)

Info: I am setting the path by

import os

aPath = '--xla_gpu_cuda_data_dir=C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin'
print(aPath)
os.environ['XLA_FLAGS'] = aPath  # applies to the current Python process only

but I have also set an OS environment variable XLA_FLAGS to the same string value… I don’t know which one is actually working yet, but the fact that the console output says it searched the intended path is good enough

Asked By: Julian Moore

Answers:

The diagnostic information is unclear and thus unhelpful; there is however a resolution

The issue was resolved by providing the file (as a copy) at this path

C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin\nvvm\libdevice

Note that C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin was the path given to XLA_FLAGS, but it seems XLA is not looking for the libdevice file directly in that folder; it is looking for the nvvm\libdevice subfolder beneath it. This means that I can’t just set a different value in XLA_FLAGS to point to the actual location of the libdevice file because, to coin a phrase, it’s not (just) the file it’s looking for.

The debug info earlier:

2021-08-05 08:38:52.889213: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-05 08:38:52.896033: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-05 08:38:52.899128: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-05 08:38:52.902510: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-05 08:38:52.905815: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .

is incorrect insofar as there is no "CUDA" in the search path; and FWIW I think a different error should have been given for searching in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 since there is no such folder (there’s an old V10.0 folder there, but no OS install of CUDA 11)

Until/unless path handling is improved by TensorFlow, such file structure manipulation is needed in every new (Anaconda) Python environment.
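Since this has to be repeated per environment, the copy can be scripted. The following is only a minimal sketch of the manual fix described above: the function name `ensure_libdevice_layout` is made up for illustration, and it assumes libdevice.10.bc sits directly in the folder that XLA_FLAGS points at (as it does in my environment).

```python
import shutil
from pathlib import Path


def ensure_libdevice_layout(data_dir, libdevice_name="libdevice.10.bc"):
    """Copy <data_dir>/<libdevice_name> into <data_dir>/nvvm/libdevice/.

    Hypothetical helper: returns the path of the copy and raises
    FileNotFoundError if the source file is not in data_dir.
    """
    data_dir = Path(data_dir)
    source = data_dir / libdevice_name
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")

    # This is the sub-path named in the "Can't find libdevice directory" warning.
    target_dir = data_dir / "nvvm" / "libdevice"
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / libdevice_name
    if not target.exists():
        shutil.copy2(source, target)  # copy, so the original stays in place
    return target
```

XLA_FLAGS keeps pointing at the parent folder (the one that now contains nvvm), not at the new libdevice folder itself.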

Full thread in TensorFlow forum here

Answered By: Julian Moore

For Linux users with tensorflow==2.8, add the following environment variable.

XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda-11.4

Answered By: Insectatorious

The following worked for me, with the error message:

error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice

First I searched for the nvvm directory and then verified that the libdevice directory existed inside it:

$ find / -type d -name nvvm 2>/dev/null
/usr/lib/cuda/nvvm
$ cd /usr/lib/cuda/nvvm
/usr/lib/cuda/nvvm$ ls
libdevice
/usr/lib/cuda/nvvm$ cd libdevice
/usr/lib/cuda/nvvm/libdevice$ ls
libdevice.10.bc

Then I exported the environment variable:

export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/cuda

as shown by @Insectatorious above. This solved the error and I was able to run the code.
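The same check can be done from Python before starting TensorFlow. This is a rough sketch, not part of any TensorFlow API: `find_cuda_data_dir` is a hypothetical helper, and the candidate roots are simply the two Linux paths mentioned in this thread.

```python
from pathlib import Path


def find_cuda_data_dir(candidates, libdevice_name="libdevice.10.bc"):
    """Return the first candidate containing nvvm/libdevice/<libdevice_name>, else None."""
    for root in candidates:
        root = Path(root)
        if (root / "nvvm" / "libdevice" / libdevice_name).is_file():
            return root
    return None


# Candidate roots are assumptions taken from the answers in this thread.
cuda_root = find_cuda_data_dir(["/usr/lib/cuda", "/usr/local/cuda-11.4"])
if cuda_root is not None:
    print(f"export XLA_FLAGS=--xla_gpu_cuda_data_dir={cuda_root}")
else:
    print("no candidate contains nvvm/libdevice/libdevice.10.bc")
```

The directory to export is the one that contains nvvm, which is why /usr/lib/cuda is used above rather than /usr/lib/cuda/nvvm/libdevice.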

Answered By: Brendan Darrer

For Windows users

Step-1

run (as administrator)

conda install -c anaconda cudatoolkit

You can specify the cudatoolkit version to match your installed/supported cuDNN version,
e.g.: conda install -c anaconda cudatoolkit=10.2.89

Step-2

go to the conda installation's Library\bin folder

C:\ProgramData\Anaconda3\Library\bin

Step-3

locate "libdevice.10.bc" ,copy the file

Step-4

create a folder named "nvvm" inside bin

create another folder named "libdevice" inside nvvm

paste the "libdevice.10.bc" file inside "libdevice"

Step-5

go to environmental variables

System variables >New

variable name:

XLA_FLAGS

variable value:

--xla_gpu_cuda_data_dir=C:\ProgramData\Anaconda3\Library\bin

(edit above as per your directory)

Step-6
restart the cmd/virtual env
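
After the restart, a quick sanity check can confirm that the variable and the folder layout agree. This is a sketch under assumptions: the helper names are made up, the regular expression only handles the plain and quoted forms of the flag that appear in this thread, and it checks the file system only (it does not ask TensorFlow or XLA anything).

```python
import os
import re
from pathlib import Path

# Matches --xla_gpu_cuda_data_dir=VALUE where VALUE is 'quoted', "quoted" or bare.
_FLAG_PATTERN = re.compile(
    r"""--xla_gpu_cuda_data_dir=(?:'([^']*)'|"([^"]*)"|(\S+))"""
)


def cuda_data_dir_from_flags(xla_flags):
    """Extract the --xla_gpu_cuda_data_dir value from an XLA_FLAGS string, or None."""
    match = _FLAG_PATTERN.search(xla_flags)
    if match is None:
        return None
    return next(group for group in match.groups() if group is not None)


def libdevice_visible(xla_flags, libdevice_name="libdevice.10.bc"):
    """True if <data_dir>/nvvm/libdevice/<libdevice_name> exists on disk."""
    data_dir = cuda_data_dir_from_flags(xla_flags)
    if data_dir is None:
        return False
    return (Path(data_dir) / "nvvm" / "libdevice" / libdevice_name).is_file()


print(libdevice_visible(os.environ.get("XLA_FLAGS", "")))
```

If this prints False, either XLA_FLAGS is not set in the current shell or the nvvm\libdevice folders from Step-4 are not where the variable points.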

Answered By: Nishikanta Parida

For those using windows and PowerShell, assuming cuda is in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7

The environment can be set as:

$env:XLA_FLAGS="--xla_gpu_cuda_data_dir='C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7'"

Here "''", i.e. nested quotes, is required!

I think this may be the lightest way to deal with this XLA bug.
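
If you prefer to set the variable from inside Python, the same nesting can be reproduced there. This is only a sketch: `build_cuda_data_dir_flag` is a made-up helper, and it assumes the inner single quotes are needed because the path contains spaces (my guess at the reason, based on the PowerShell line above).

```python
import os


def build_cuda_data_dir_flag(cuda_dir):
    """Build the flag string, wrapping the path in single quotes if it has whitespace."""
    value = str(cuda_dir)
    if any(ch.isspace() for ch in value):
        value = f"'{value}'"
    return f"--xla_gpu_cuda_data_dir={value}"


# Path taken from this answer; adjust to your own installation.
# Note: this overwrites any XLA_FLAGS value already set for the process.
os.environ["XLA_FLAGS"] = build_cuda_data_dir_flag(
    "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7"
)
print(os.environ["XLA_FLAGS"])
```

To be safe, run this before importing TensorFlow so the value is in place when XLA first reads it.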

Answered By: Little Train

For those using miniconda just copy the file libdevice.10.bc into the root folder of python application or notebook.

It works here using python=3.9, cudatoolkit=11.2, cudnn=8.1.0, and tensorflow==2.9

Answered By: Mauricio Matsumura

I met the same error with TensorFlow 2.11, CUDA 11.2, cuDNN 8.1.0. Because I built the environment with conda, there is no nvvm directory, no need to export the environment variable, and the command nvcc -V cannot be used, so many of the suggestions I found do not suit my problem.
I solved the error by downgrading TensorFlow to 2.10, using ‘conda install tensorflow=2.10.0 cudatoolkit cudnn’.
Reference: https://github.com/tensorflow/tensorflow/issues/58681

Answered By: Min Gao