Extracting Features from BertForSequenceClassification

Question:

Hello everyone, I am currently trying to develop a model for contradiction detection. By using and fine-tuning a BERT model I already got quite satisfactory results, but I think that with some additional features I could reach a better accuracy. I oriented myself on this tutorial. After fine-tuning, my model looks like this:

==== Embedding Layer ====

bert.embeddings.word_embeddings.weight                  (30000, 768)
bert.embeddings.position_embeddings.weight                (512, 768)
bert.embeddings.token_type_embeddings.weight                (2, 768)
bert.embeddings.LayerNorm.weight                              (768,)
bert.embeddings.LayerNorm.bias                                (768,)

==== First Transformer ====

bert.encoder.layer.0.attention.self.query.weight          (768, 768)
bert.encoder.layer.0.attention.self.query.bias                (768,)
bert.encoder.layer.0.attention.self.key.weight            (768, 768)
bert.encoder.layer.0.attention.self.key.bias                  (768,)
bert.encoder.layer.0.attention.self.value.weight          (768, 768)
bert.encoder.layer.0.attention.self.value.bias                (768,)
bert.encoder.layer.0.attention.output.dense.weight        (768, 768)
bert.encoder.layer.0.attention.output.dense.bias              (768,)
bert.encoder.layer.0.attention.output.LayerNorm.weight        (768,)
bert.encoder.layer.0.attention.output.LayerNorm.bias          (768,)
bert.encoder.layer.0.intermediate.dense.weight           (3072, 768)
bert.encoder.layer.0.intermediate.dense.bias                 (3072,)
bert.encoder.layer.0.output.dense.weight                 (768, 3072)
bert.encoder.layer.0.output.dense.bias                        (768,)
bert.encoder.layer.0.output.LayerNorm.weight                  (768,)
bert.encoder.layer.0.output.LayerNorm.bias                    (768,)

==== Output Layer ====

bert.pooler.dense.weight                                  (768, 768)
bert.pooler.dense.bias                                        (768,)
classifier.weight                                           (2, 768)
classifier.bias                                                 (2,)

My next step would be to get the [CLS] token representation from this model, combine it with a few hand-crafted features, and feed them into a different model (an MLP) for classification. Any hints on how to do this?

Asked By: Klaas


Answers:

You can use the pooler output (the contextualized embedding of the [CLS] token passed through the pooler layer) of the BERT model:

import torch
from transformers import BertModel, BertTokenizer

# replace 'bert-base-uncased' with the path to your saved model
t = BertTokenizer.from_pretrained('bert-base-uncased')
m = BertModel.from_pretrained('bert-base-uncased')

# tokenize a batch of sentences and return PyTorch tensors
i = t(['this is a sample', 'different sample'], padding=True, return_tensors='pt')

# no gradients are needed for pure feature extraction
with torch.no_grad():
    o = m(**i)

print(o.keys())
# pooler_output has shape [batch_size, 768]
print(o.pooler_output.shape)
useMe = o.pooler_output
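
To combine this with your hand-crafted features, you can concatenate the pooler output with the feature vector along the last dimension and feed the result into a small MLP. Below is a minimal sketch; the class name CombinedMLP, the hidden size of 256, and the number of hand-crafted features (here 5) are placeholder assumptions you should adapt to your own setup:

import torch
import torch.nn as nn

class CombinedMLP(nn.Module):
    def __init__(self, bert_dim=768, n_handcrafted=5, hidden=256, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(bert_dim + n_handcrafted, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, cls_embedding, handcrafted):
        # concatenate the [CLS] pooler output and the hand-crafted features into one vector
        x = torch.cat([cls_embedding, handcrafted], dim=-1)
        return self.net(x)

# usage with the pooler output from above; random values stand in for your real features
handcrafted = torch.rand(useMe.shape[0], 5)
mlp = CombinedMLP()
logits = mlp(useMe, handcrafted)
print(logits.shape)  # [batch_size, 2]

You can then train the MLP on these concatenated vectors like any other PyTorch classifier, keeping BERT frozen or fine-tuning it jointly depending on your compute budget.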
Answered By: cronoik