PyTorch with CUDA local installation fails on Ubuntu

Question:

I am trying to install PyTorch with CUDA.
I followed the instructions (installation using conda) mentioned in
https://pytorch.org/get-started/locally/

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

The conda install command runs without any errors.

conda list displays the following:

# Name                    Version                   Build  Channel

cudatoolkit               11.3.1               h2bc3f7f_2
pytorch                   1.11.0          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
torch                     1.10.2                   pypi_0    pypi
torchaudio                0.11.0               py39_cu113    pytorch
torchvision               0.11.3                   pypi_0    pypi

But when I check whether the GPU driver and CUDA are enabled and accessible by PyTorch,

torch.cuda.is_available()

returns False.
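For additional context, here is a slightly more verbose check (a minimal sketch; it only assumes that torch imports cleanly) showing which CUDA runtime the installed build was compiled against and how many GPUs it can see:

import torch

print(torch.__version__)          # installed PyTorch build
print(torch.version.cuda)         # CUDA runtime the build was compiled against (None for CPU-only builds)
print(torch.cuda.is_available())  # whether a usable GPU/driver combination was found
print(torch.cuda.device_count())  # number of visible CUDA devices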

Prior to the PyTorch installation, I checked and confirmed the prerequisites mentioned in

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions

Here are my Ubuntu server details:

Environment:

  • OS/kernel:

Ubuntu 18.04.6 LTS (GNU/Linux 4.15.0-154-generic x86_64)

The footnote under Table 1 (Native Linux Distribution Support) in the CUDA 11.6 documentation mentions:

For Ubuntu LTS on x86-64, the Server LTS kernel (e.g. 4.15.x for
18.04) is supported in CUDA 11.6.

  • GCC

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

  • GLIBC

ldd (Ubuntu GLIBC 2.27-3ubuntu1.5) 2.27

  • GPU

GeForce GTX 1080 Ti

  • Kernel headers and development packages

$ uname -r
4.15.0-176-generic

As per my understanding, a conda PyTorch installation with CUDA will install the CUDA driver too.

I am not sure where I went wrong.
Thanks in advance.

EDIT:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

nvcc shows CUDA version 9.1

whereas

$ nvidia-smi
Wed May 11 06:44:31 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 25%   40C    P8    11W / 250W |     18MiB / 11177MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   40C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 25%   35C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4119      G   /usr/lib/xorg/Xorg                             9MiB |
|    0      4238      G   /usr/bin/gnome-shell                           6MiB |
+-----------------------------------------------------------------------------+

nvidia-smi shows CUDA version 10.0

https://varhowto.com/check-cuda-version/
This article mentions that nvcc refers to the CUDA toolkit, whereas nvidia-smi refers to the NVIDIA driver.

Q1: Does this show that there are two different CUDA installations at the system-wide level?

Nvidia Cudatoolkit vs Conda Cudatoolkit
The CUDA toolkit (version 11.3.1) I am installing in my conda environment is different from the one installed at the system-wide level (as shown by the output of nvcc and nvidia-smi).

Q2: As per the answer in the above Stack Overflow thread, they can be separate. Or is this the reason why my local CUDA installation fails?

Asked By: Kaushik Acharya


Answers:

Is the NVIDIA driver correctly installed? Type nvidia-smi to validate that. This issue may be caused by a mismatch between the driver version and the cudatoolkit version.

Answered By: Florin

I have solved the issue.

Disclaimer: I am a newbie in CUDA.
The following answer is based on (a) what I have read in other threads and (b) my own experience based on those discussions.

Core Logic:
CUDA driver’s version >= CUDA runtime version

Reference: Different CUDA versions shown by nvcc and NVIDIA-smi

In most cases, if nvidia-smi reports a CUDA version that is
numerically equal to or higher than the one reported by nvcc -V, this
is not a cause for concern. That is a defined compatibility path in
CUDA (newer drivers/driver API support "older" CUDA toolkits/runtime
API).

As I am using conda’s cudatoolkit:

  • Driver API: nvidia-smi
  • Runtime API: conda’s cudatoolkit

For cudatoolkit 11.3.1, nvidia-smi was reporting CUDA Version: 10.0, which violates the rule above.
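A rough way to check this condition programmatically (only a sketch; it assumes nvidia-smi is on the PATH and that torch.version.cuda is not None, i.e. a CUDA build of PyTorch) is to compare the two versions directly:

import subprocess
import torch

# CUDA runtime version bundled with the conda cudatoolkit / PyTorch build, e.g. "11.3"
runtime = torch.version.cuda

# The nvidia-smi header line contains "CUDA Version: X.Y" -- the highest
# CUDA version the installed driver supports.
smi_output = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
driver = smi_output.split("CUDA Version:")[1].split()[0]

def version_tuple(v):
    return tuple(int(x) for x in v.split("."))

print("driver CUDA :", driver)
print("runtime CUDA:", runtime)
print("driver >= runtime:", version_tuple(driver) >= version_tuple(runtime))

If the last line prints False, the driver is too old for the installed runtime, which is exactly the situation described here.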

Solution:
Upgrade NVIDIA drivers.

I upgraded the NVIDIA drivers following the instructions at https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-18-04-bionic-beaver-linux

After the upgrade, here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
| 27%   46C    P8    12W / 250W |     19MiB / 11177MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   44C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:09:00.0 Off |                  N/A |
| 25%   39C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3636      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      4263      G   /usr/bin/gnome-shell                6MiB |
+-----------------------------------------------------------------------------+

Now the driver's CUDA version (11.4) >= the runtime version (11.3.1).

PyTorch is now able to use CUDA with the GPU:

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True
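
As a quick follow-up sanity check (again just a sketch, assuming at least one CUDA device at index 0), a small computation can be placed on the GPU:

import torch

x = torch.rand(3, 3, device="cuda:0")  # allocate directly on the first GPU
y = x @ x                              # run a small matmul on the device
print(y.device)                        # expected: cuda:0
print(torch.cuda.get_device_name(0))   # e.g. the GTX 1080 Ti name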
Answered By: Kaushik Acharya