How to use GPT-3 for fill-mask tasks?
Question:
I use the following code to get the most likely replacements for a masked word:
!pip install git+https://github.com/huggingface/transformers.git
import torch
import pandas as pd
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased', top_k=100)
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')
results = unmasker("The sun is [MASK].")
for i in results:
    print(i["token_str"], i["score"]*100)
For example, the most likely replacements for "[MASK]" in the sequence "The sun is [MASK]." are "rising" (33.61%), "shining" (9.33%), and "up" (7.38%).
My question: is there a way to achieve the same with GPT-3? There are "complete" and "insert" presets in the OpenAI Playground; however, they give me full sentences (instead of single words) and no probabilities. Can someone help?
Answers:
First of all, I don’t think you can access properties like tokens or scores in GPT-3 the way you can with the fill-mask pipeline; as far as I know, all you have is the generated text.
Second of all, in my experience GPT-3 is ALL about the correct prompt. You just have to give it instructions like you were talking to a human being.
In your specific case, I would use a prompt like this:
Prompt:
The sun is [MASK].
Replace [MASK] with the most probable 5 words to replace, and give me
their probabilities.
Result:
The sun is shining.
- shining – 0.47
- bright – 0.18
- sunny – 0.13
- hot – 0.10
- beautiful – 0.09
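Since the reply comes back as plain text, the word/probability pairs have to be parsed out before they can be used like the BERT scores. Below is a minimal sketch, assuming the reply keeps the "- word – number" layout shown above; the names `LINE_PATTERN` and `parse_candidates` are made up for illustration. Keep in mind that these numbers are text the model wrote, not values read from its token distribution, so they may not be comparable to the fill-mask scores.

```python
import re

# Matches lines such as "- shining – 0.47"; accepts "-", "–", "—" or ":" as the separator.
LINE_PATTERN = re.compile(r"^\s*[-*]\s*(?P<word>\S+)\s*[-–—:]\s*(?P<prob>\d*\.?\d+)\s*$")


def parse_candidates(reply_text):
    """Return (word, probability) pairs found in a reply laid out like the one above."""
    pairs = []
    for line in reply_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:  # lines without a "- word – number" shape are skipped
            pairs.append((match.group("word"), float(match.group("prob"))))
    return pairs


reply = """The sun is shining.
- shining – 0.47
- bright – 0.18
- sunny – 0.13
- hot – 0.10
- beautiful – 0.09"""

for word, prob in parse_candidates(reply):
    print(word, prob * 100)
```

If the model changes the layout (for example, a numbered list), the pattern will need adjusting; asking for a fixed format in the prompt should make parsing more reliable.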
If you want to send that prompt programmatically, here’s the code:
# Note: this uses the interface of the openai Python package before version 1.0
import openai
openai.organization = "your org key, if you have one"
openai.api_key = "your API key"
# Optional: lists the engines available to your account
openai.Engine.list()
my_prompt = '''The sun is [MASK].
Replace [MASK] with the most probable 5 words to replace, and give me their probabilities.'''
# Here set parameters as you like
response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=my_prompt,
    temperature=0,
    max_tokens=500,
    # top_p=1,
    # frequency_penalty=0.0,
    # presence_penalty=0.0,
    # stop=["\n"]
)
print(response['choices'][0]['text'])
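One more option that may get closer to the BERT output: the legacy Completion endpoint also accepts a `logprobs` parameter, which returns the log-probabilities of the most likely tokens at each generated position (up to 5, as far as I recall). The sketch below is untested against the live API. The helper names `top_candidates` and `fetch_next_token_candidates` are made up, and the numbers in `example` are invented for illustration. Note the limits: GPT-3 only sees the text before the blank ("The sun is"), and it scores tokens rather than whole words, so this is not a true fill-mask.

```python
import math


def top_candidates(top_logprobs_for_position):
    """Convert a {token: log-probability} mapping into (token, percent) pairs, most likely first."""
    ranked = sorted(top_logprobs_for_position.items(), key=lambda item: item[1], reverse=True)
    return [(token.strip(), math.exp(logprob) * 100) for token, logprob in ranked]


def fetch_next_token_candidates(prefix, engine="text-davinci-002"):
    """Request one token after `prefix` and return its ranked alternatives.

    Needs the pre-1.0 openai package and an API key; it is defined here but not called.
    """
    import openai
    response = openai.Completion.create(
        engine=engine,
        prompt=prefix,
        max_tokens=1,   # only the token that would fill the blank
        temperature=0,
        logprobs=5,     # ask for the top alternatives at each position
    )
    return top_candidates(response["choices"][0]["logprobs"]["top_logprobs"][0])


# Invented log-probabilities, only to show the conversion step
example = {" a": -1.9, " shining": -0.75, " up": -2.6}
for token, percent in top_candidates(example):
    print(token, round(percent, 2))
```

With a real key, calling `fetch_next_token_candidates("The sun is")` would replace the invented `example` mapping.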