Out-of-memory error while training Rasa/LaBSE
Question:
I want to train rasa/LaBSE via the LanguageModelFeaturizer. I have followed the steps in the docs and did not change the default training data.
My config file looks like:
# The config recipe.
# https://rasa.com/docs/rasa/model-configuration/
recipe: default.v1
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: WhitespaceTokenizer
#   - name: RegexFeaturizer
#   - name: LexicalSyntacticFeaturizer
  - name: LanguageModelFeaturizer
    # Name of the language model to use
    model_name: "bert"
    # Pre-Trained weights to be loaded
    model_weights: "rasa/LaBSE"
    cache_dir: null
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
    batch_size: 8
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
After running rasa train, I get:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: failed to allocate memory [Op:AddV2]
I am using a GTX 1660 Ti with 6 GB of memory. My system specifications are:
Rasa
----------------------
rasa 3.0.8
rasa-sdk 3.0.5
System
----------------------
OS: Ubuntu 18.04.6 LTS x86_64
Kernel: 5.4.0-113-generic
CUDA Version: 11.4
Driver Version: 470.57.02
Tensorflow
----------------------
tensorboard 2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.6.1
tensorflow-addons 0.14.0
tensorflow-estimator 2.6.0
tensorflow-hub 0.12.0
tensorflow-probability 0.13.0
tensorflow-text 2.6.0
Regular training works fine, and I can run the model. I tried reducing the batch_size, but the error persists.
Answers:
You can create swap space if your system RAM fills up at some point during training.
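A minimal sketch, assuming an Ubuntu host (the 4G size and /swapfile path are illustrative; adjust to your disk):

# Allocate a 4 GB file, lock down its permissions, format it as swap, and enable it
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Note that swap only backs system RAM; it cannot relieve GPU memory, which is typically what a ResourceExhaustedError reports when training on a GPU.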
Running the same code on Google Colab (with 16 GB of GPU memory) works fine. The model uses around 6.5-7 GB of memory.
I am assuming the OOM comes from the DIETClassifier. Try decreasing some of these parameters; the defaults are listed below, and a reduced-memory sketch follows the list.
- name: DIETClassifier
  epochs: 300
  batch_size: [64, 256]
  number_of_transformer_layers: 2
  embedding_dimension: 20
  hidden_layers_sizes:
    text: []
  ...
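For the 6 GB card, a minimal sketch of a reduced-memory DIETClassifier entry (assuming the DIET transformer is indeed the culprit; the specific values are illustrative, not tested recommendations):

- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
  # Smaller batches bound the per-step activation memory
  batch_size: [8, 16]
  # A thinner, shallower transformer shrinks both weights and activations
  number_of_transformer_layers: 1
  transformer_size: 128
  embedding_dimension: 10

Since the Colab run above peaked at around 6.5-7 GB, even an aggressively shrunk configuration may still not fit into 6 GB.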