BERT-based NER model giving inconsistent predictions when deserialized

Question:

I am trying to train an NER model using the HuggingFace transformers library on Colab cloud GPUs, pickle it and load the model on my own CPU to make predictions.

Code

The model is the following:

from transformers import BertForTokenClassification

model = BertForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=NUM_LABELS,
    output_attentions = False,
    output_hidden_states = False
)

I am using this snippet to save the model on Colab:

import torch

torch.save(model.state_dict(), FILENAME)

Then I load it on my local CPU using:

import torch
from transformers import BertForTokenClassification

# Instantiate a model of the same type
model_reload = BertForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(tag2idx),
    output_attentions = False,
    output_hidden_states = False
)

# Load the saved weights onto the CPU and switch to inference mode
model_reload.load_state_dict(torch.load(FILENAME, map_location='cpu'))
model_reload.eval()

The code snippet used to tokenize the text and make actual predictions is the same both on the Colab GPU notebook instance and my CPU notebook instance.

Expected Behavior

The GPU-trained model behaves correctly and classifies the following tokens perfectly:

O       [CLS]
O       Good
O       morning
O       ,
O       my
O       name
O       is
B-per   John
I-per   Kennedy
O       and
O       I
O       am
O       working
O       at
B-org   Apple
O       in
O       the
O       headquarters
O       of
B-geo   Cupertino
O       [SEP]

Actual Behavior

When I load the model and use it to make predictions on my CPU, the predictions are totally wrong:

I-eve   [CLS]
I-eve   Good
I-eve   morning
I-eve   ,
I-eve   my
I-eve   name
I-eve   is
I-geo   John
B-eve   Kennedy
I-eve   and
I-eve   I
I-eve   am
I-eve   working
I-eve   at
I-gpe   Apple
I-eve   in
I-eve   the
I-eve   headquarters
I-eve   of
B-org   Cupertino
I-eve   [SEP]

Does anyone have ideas why it doesn’t work? Did I miss something?

Asked By: flyingjapans

Answers:

I fixed it. There were two problems:

  1. The index-to-label mapping for tokens was wrong. For some reason, the list() function behaved differently on the Colab GPU than on my CPU (??).

  2. The snippet used to save the model was not correct. For models based on the huggingface-transformers library, instead of saving model.state_dict() with torch.save() and loading it later, use the save_pretrained() method of your model class and load the model later with from_pretrained().
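
The second point can be sketched roughly as follows. This is only an illustration, not the exact code I ran: the label list and the tiny, randomly initialised BertConfig are assumptions chosen so the snippet runs without downloading bert-base-cased. Passing id2label and label2id to the config is also my own suggestion; it stores the mapping together with the weights, so the loading side does not have to rebuild it.

```python
import tempfile

import torch
from transformers import BertConfig, BertForTokenClassification

# Hypothetical label set, sorted so the ids are identical in every session.
labels = sorted(["O", "B-per", "I-per", "B-org", "B-geo"])
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

# Assumption: a tiny random model instead of the fine-tuned bert-base-cased,
# so that nothing has to be downloaded for this sketch.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    id2label=id2label,
    label2id=label2id,
)
model = BertForTokenClassification(config)

with tempfile.TemporaryDirectory() as save_dir:
    # Writes the weights and config.json (including the label mapping).
    model.save_pretrained(save_dir)
    # On the other machine: rebuild the model from that directory.
    model_reload = BertForTokenClassification.from_pretrained(save_dir)

model_reload.eval()
print(model_reload.config.id2label)
```

With a real fine-tuned model, the directory written by save_pretrained() is what you would copy from Colab to the local machine.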

Answered By: flyingjapans

Other Scenarios

I had the same issue with my custom model initialized with BertModel from the Hugging Face transformers library. I ran into the first problem mentioned by flyingjapans, but with the built-in set() function, which I used to get the unique labels before encoding them dynamically.

unique_labels = set([label for labels in data["token_labels"].values for label in labels])
label2id = {k: v for v, k in enumerate(unique_labels)}
id2label = {v: k for v, k in enumerate(unique_labels)}

The problem is that each time I run the notebook, unique_labels contains the labels in a different order than in the previous notebook session, so I end up with a different encoding of the labels.
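
A likely explanation (my assumption, not something I verified on the original notebook) is that Python randomizes string hashes per interpreter process unless PYTHONHASHSEED is set, and the iteration order of a set of strings depends on those hashes. The sketch below starts fresh interpreters with different seeds to imitate separate notebook sessions; LABELS and label_order are hypothetical names used only for this illustration.

```python
import json
import os
import subprocess
import sys

# Hypothetical tag set, similar to the labels in the question.
LABELS = ["O", "B-per", "I-per", "B-org", "I-org", "B-geo", "I-geo", "B-eve", "I-eve"]


def label_order(seed):
    """Return list(set(LABELS)) as computed by a fresh interpreter with this hash seed."""
    code = f"import json; print(json.dumps(list(set({LABELS!r}))))"
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Each seed stands in for a separate session; the printed orders usually differ.
orders = {seed: label_order(seed) for seed in ("1", "2", "3")}
for seed, order in orders.items():
    print(seed, order)
```

The labels are the same in every run, only their order changes, which is enough to shift every id produced by enumerate().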

Note that the problem occurred even when:

  1. Using the standard BertForTokenClassification model from the Hugging Face transformers library and saving and loading it with save_pretrained() and from_pretrained(). These methods are still the recommended way to save and load the best model.
  2. Running the notebook on the local host.

So avoid using set() for label encoding, or sort its output first, so that you always end up with the same label encoding.
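
A minimal sketch of the sorted variant, assuming token_labels is a stand-in for data["token_labels"].values (one list of labels per sentence). The JSON round trip at the end is an optional extra of mine for keeping the mapping next to the saved model.

```python
import json

# Hypothetical stand-in for data["token_labels"].values.
token_labels = [
    ["O", "B-per", "I-per", "O"],
    ["O", "B-org", "O", "B-geo"],
]

# sorted() fixes the order, so the same data always yields the same ids.
unique_labels = sorted({label for labels in token_labels for label in labels})
label2id = {label: i for i, label in enumerate(unique_labels)}
id2label = {i: label for label, i in label2id.items()}

# JSON turns integer keys into strings, so only label2id is stored and
# id2label is rebuilt from it after loading.
serialized = json.dumps(label2id)
restored_label2id = json.loads(serialized)
restored_id2label = {i: label for label, i in restored_label2id.items()}

print(label2id)
```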

Answered By: noured