Could not identify NUMA node of platform GPU

Question:

I am trying to get TensorFlow running on my machine, but I always get stuck with a “Could not identify NUMA node” error message.

I use a Conda environment:

  • tensorflow-gpu 1.12.0
  • cudatoolkit 9.0
  • cudnn 7.1.2
  • nvidia-smi says: Driver Version 418.43, CUDA Version 10.1

Here is the error output:

>>> import tensorflow as tf
>>> tf.Session()
2019-04-04 09:56:59.851321: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-04 09:56:59.950066: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2019-04-04 09:56:59.950762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.84GiB
2019-04-04 09:56:59.950794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-04 09:59:45.338767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-04 09:59:45.338799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-04-04 09:59:45.338810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-04-04 09:59:45.339017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1193] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Unfortunately, I have no idea what to do with this error.

Asked By: Käseknacker


Answers:

I was able to fix it with a new conda environment:

conda create --name tf python=3
conda activate tf
conda install cudatoolkit=9.0 tensorflow-gpu=1.11.0

A table of compatible CUDA/TF combinations is available here.
In my case, the combination of cudatoolkit=9.0 and tensorflow-gpu=1.12 inexplicably led to a std::bad_alloc error.
However, cudatoolkit=9.0 with tensorflow-gpu=1.11.0 works fine.
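
To confirm that the new environment works, here is a quick sanity check using the plain TF 1.x API (the expected version string assumes the tensorflow-gpu=1.11.0 install above):

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)                   # should print 1.11.0
print(device_lib.list_local_devices())  # should list a /device:GPU:0 entry

with tf.Session() as sess:              # previously crashed with std::bad_alloc
    print(sess.run(tf.constant("session created successfully")))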

Answered By: Käseknacker

I had the same issue and finally found out that it was because the model was optimized with Adam. Once you use another optimizer, it should work.
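
A minimal sketch of what switching optimizers looks like with the TF 1.x API (the toy model, data, and learning rate here are placeholders, not taken from the question):

import tensorflow as tf

# Tiny stand-in model: fit y = w * x.
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)
loss = tf.reduce_mean(tf.square(w * x - y))

# Instead of tf.train.AdamOptimizer(...).minimize(loss), try another optimizer:
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op, feed_dict={x: [1.0, 2.0, 3.0], y: [2.0, 4.0, 6.0]})
    print(sess.run(w))  # should approach 2.0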

Answered By: user18016319

If you are getting this error on a Mac and the error message includes the line Metal device set to: Apple M1 (or any other Apple chip), uninstalling tensorflow-metal will resolve the error:

pip uninstall tensorflow-metal
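
After uninstalling, a quick check with the TF 2.x API (which the macOS builds use) shows which devices TensorFlow still registers; with tensorflow-metal removed, only CPU devices should remain:

import tensorflow as tf

print(tf.__version__)
# With the Metal plugin removed this should list only CPU devices;
# with it installed, a GPU (Metal) device would appear here as well.
print(tf.config.list_physical_devices())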

Answered By: Jay Parekh