How to get token or code embedding using Codex API?
Question:
For a given code snippet, how can I get its embedding using the Codex API? This is how I currently call the completion endpoint:
import os
import openai
import config

openai.api_key = config.OPENAI_API_KEY

def runSomeCode():
    response = openai.Completion.create(
        engine="code-davinci-001",
        prompt="\"\"\"\n1. Get a reputable free news api\n2. Make a request to the api for the latest news stories\n\"\"\"",
        temperature=0,
        max_tokens=1500,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0)

    if 'choices' in response:
        x = response['choices']
        if len(x) > 0:
            return x[0]['text']
        else:
            return ''
    else:
        return ''

answer = runSomeCode()
print(answer)
But given a Python code block like the following, can I get its embedding from Codex?
Input:
import Random
a = random.randint(1,12)
b = random.randint(1,12)
for i in range(10):
    question = "What is "+a+" x "+b+"? "
    answer = input(question)
    if answer = a*b
        print (Well done!)
    else:
        print("No.")
Output:
- Embedding of the input code
Answers:
Yes, OpenAI can create an embedding for any input text, even if it's code. You only need to pass the appropriate engine (model) to the get_embedding() function call. I tested this code:
# Third-party imports
import openai
from openai.embeddings_utils import get_embedding

# OPENAI_SEC_KEY is your secret API key; it must be defined before this line.
openai.api_key = OPENAI_SEC_KEY

embedding = get_embedding("""
def sample_code():
    print("Hello from IamAshKS !!!")
""", engine="code-search-babbage-code-001")

print()
print(f"{embedding=}")
print(f"{len(embedding)=}")

# OUTPUT:
# embedding=[-0.007094269152730703, 0.006055716425180435, -0.005044757854193449, ...]
# len(embedding)=2048
embedding = get_embedding("""
import Random
a = random.randint(1,12)
b = random.randint(1,12)
for i in range(10):
    question = "What is "+a+" x "+b+"? "
    answer = input(question)
    if answer = a*b
        print (Well done!)
    else:
        print("No.")
""", engine="code-search-babbage-code-001")

print()
print(f"{embedding=}")
print(f"{len(embedding)=}")

# OUTPUT:
# embedding=[-0.011341490782797337, -0.005919027142226696, 0.0011923711281269789, ...]
# len(embedding)=2048
NOTE: You can switch to a different model by changing the engine parameter of get_embedding().
The code above gets you an embedding for any code snippet. There is another code search engine/model, code-search-ada-code-001, but it is less powerful than code-search-babbage-code-001, which I used for this answer. If you also want to do code search, go through the references below.
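For the code search step itself, a common approach is to compare the query embedding with each snippet embedding by cosine similarity and sort by score. The following is only a minimal sketch of that idea: the function names (cosine_similarity, rank_snippets) and the toy 3-dimensional vectors are made up for illustration; real vectors would come from get_embedding() and have 2048 dimensions with the engine used above.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_snippets(
    query_vec: List[float],
    snippet_vecs: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in snippet_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption: hypothetical
# values chosen by hand, not produced by any OpenAI model).
toy_snippets = {
    "read_file": [0.9, 0.1, 0.0],
    "send_request": [0.1, 0.9, 0.1],
    "parse_json": [0.2, 0.3, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

ranking = rank_snippets(toy_query, toy_snippets)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In real use you would embed every snippet once, store the vectors, and only embed the query at search time.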
References:
The function get_embedding will give us an embedding for an input text.
Canonical code from OpenAI here: https://github.com/openai/openai-python/blob/main/examples/embeddings/Get_embeddings.ipynb
from typing import List

import openai
from tenacity import retry, wait_random_exponential, stop_after_attempt

# Retry with random exponential backoff, giving up after 6 attempts.
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def get_embedding(text: str, engine="text-similarity-davinci-001") -> List[float]:
    # Replace newlines, which can negatively affect performance.
    text = text.replace("\n", " ")
    return openai.Embedding.create(input=[text], engine=engine)["data"][0]["embedding"]

embedding = get_embedding("Sample query text goes here", engine="text-search-ada-query-001")
print(len(embedding))
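If you need embeddings for several snippets, one option is to loop over them and skip repeats so each distinct snippet is only sent once. This is a rough sketch, not part of the OpenAI library: embed_snippets and its embed_fn parameter are hypothetical names, and fake_embed is a stand-in so the example runs without an API key. In practice you would pass a function such as get_embedding instead.

```python
from typing import Callable, Dict, Iterable, List


def embed_snippets(
    snippets: Iterable[str],
    embed_fn: Callable[[str], List[float]],
) -> Dict[str, List[float]]:
    """Embed each distinct snippet once and return a snippet -> vector mapping."""
    cache: Dict[str, List[float]] = {}
    for snippet in snippets:
        if snippet not in cache:  # skip repeats to avoid duplicate API calls
            cache[snippet] = embed_fn(snippet)
    return cache


# Stand-in embedding function (assumption: NOT a real model). It returns the
# text length and newline count only so the sketch can run offline.
def fake_embed(text: str) -> List[float]:
    return [float(len(text)), float(text.count("\n"))]


calls: List[str] = []


def counting_embed(text: str) -> List[float]:
    """Wrap fake_embed and record every call so repeats can be checked."""
    calls.append(text)
    return fake_embed(text)


vectors = embed_snippets(
    ["print('a')", "x = 1\ny = 2", "print('a')"],
    counting_embed,
)
print(len(vectors), len(calls))
```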