TensorFlow M2 Pro Failure
Question:
When I run the following test script for TensorFlow:
import tensorflow as tf
cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,
)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=4)
I obtain the following terminal output:
Metal device set to: Apple M2 Pro
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB
2023-03-23 00:26:32.203361: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-03-23 00:26:32.203521: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
zsh: bus error python3 app/model/tf_verify.py
Answers:
Although the official Apple documentation does not make this clear, it looks like the tensorflow-macos version should match the tensorflow-metal plugin version listed in the "Releases" section. Since you are using tensorflow-macos==2.9, you should use tensorflow-metal==0.5.0 and not tensorflow-metal==0.6.0.
I was able to reproduce the issue on a MacBook Pro with an M1 Pro; after switching to the matching plugin version, training works fine.
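To confirm which versions are actually installed before downgrading, something like the following could help. It is only a minimal sketch: the helper names (`installed_version`, `check_pairing`) and the lookup table are made up for illustration, and the table contains only the single pairing mentioned in this answer (tensorflow-macos 2.9 with tensorflow-metal 0.5.0). Any other combination is reported as "unknown" rather than guessed.

```python
from importlib import metadata

# Hypothetical lookup table: it holds only the pairing stated in this answer.
# Extend it from Apple's "Releases" section if you need other versions.
EXPECTED_METAL_FOR_MACOS = {"2.9": "0.5.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pairing(macos_version, metal_version, expected=EXPECTED_METAL_FOR_MACOS):
    """Classify a tensorflow-macos / tensorflow-metal pair.

    Returns "missing", "unknown", "ok", or "mismatch".
    """
    if macos_version is None or metal_version is None:
        return "missing"
    # Compare on major.minor only, so that 2.9.0 and 2.9.2 both map to "2.9".
    major_minor = ".".join(macos_version.split(".")[:2])
    wanted = expected.get(major_minor)
    if wanted is None:
        return "unknown"
    return "ok" if metal_version == wanted else "mismatch"


if __name__ == "__main__":
    macos = installed_version("tensorflow-macos")
    metal = installed_version("tensorflow-metal")
    verdict = check_pairing(macos, metal)
    print(f"tensorflow-macos={macos} tensorflow-metal={metal} -> {verdict}")
```

If the script reports a mismatch, pinning the plugin with `python3 -m pip install tensorflow-metal==0.5.0` should bring the two packages back in line for the 2.9 case described above.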