How to bypass the 30-second TTL while waiting for the response to a POST request from an external server

Question:

I am building a view in Django which sends a POST request to the ChatGPT API. The problem I am facing is that the response from ChatGPT takes more than 30 seconds (we have long prompts).
The idea that I have in mind is:

  1. The client sends a request to the server.
  2. The server writes the request to a message queue and returns a message ID to the client.
  3. Another worker listens to the message queue. It retrieves the request from the queue, sends it to OpenAI, and then writes the response back to the message queue.
  4. The client periodically sends requests to the server to ask for the response using the previously received message ID.
  5. The server responds with "pending" until it finds the response in the message queue, at which point it returns the actual response to the client.

The problem is that I have no idea how to achieve that …
I am using GKE to host the application, and I already have some cron jobs using some of the views as well.
Any idea how to deal with this would be much appreciated.
Here is an example of the view:

import openai
from app.forms_prompt import PromptForm
from app.models import ModelName
from django.conf import settings
from django.contrib.auth.decorators import login_required
from django.http import HttpRequest
from django.http import HttpResponse
from django.shortcuts import get_object_or_404
from django.shortcuts import redirect
from django.shortcuts import render
from django.utils.translation import gettext_lazy as _


@login_required
def form_prompt(request: HttpRequest, pk: int) -> HttpResponse:
    instance = get_object_or_404(ModelName, pk=pk)
    openai.api_key = settings.OPENAI_KEY
    form = PromptForm(request.POST or None, instance=instance)
    # check if form data is valid
    if form.is_valid():
        prompt = form.cleaned_data["text"]
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "user", "content": prompt},
            ],
        )
        instance.specific_field = response["choices"][0]["message"]["content"]
        form.save()
        return redirect("view_instance_name", instance.pk)
    return render(request, "view_prompt_name.html", {"form": form})

Any suggestion on how to approach this would be very helpful. Thank you!

Asked By: ladhari


Answers:

"the response from ChatGPT is taking more than 30 seconds (we have long prompts)"

That means you need asynchronous processing, a common pattern in distributed systems for handling long-running tasks or slow external dependencies.

Your current setup is:

Client --(HTTP POST)--> Django View --(API Call)--> External Service (GPT API)

To implement your idea, you might consider Celery for task processing, RabbitMQ as the message broker, and Django for handling HTTP requests.

Client ------(HTTP POST)------> Django View ------(Message)------> RabbitMQ
                                      |                               |
                                      |<----------(Polling)-----------|
                                      |                               |
                                      |-----------(Task)------------->| Celery -----(API Call)----> External Service
                                      |                               |
                                      |<----------(Response)----------|<-----------------------------|

Install Celery in your Django project and set up a message broker such as RabbitMQ or Redis (pip install celery is enough for RabbitMQ; for Redis, install the bundle with pip install celery[redis]). Then configure Celery to use your broker: create a new file named celery.py in your main Django app directory:

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'your_project.settings')

app = Celery('your_project')

app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
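
For the app to be loaded when Django starts, the Celery "First steps with Django" guide also has you import it in your project's __init__.py:

# your_project/__init__.py
from .celery import app as celery_app

__all__ = ('celery_app',)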

Then create a Celery task to handle the GPT API call:

# tasks.py in one of your apps
import openai
from django.conf import settings
from celery import shared_task

@shared_task
def get_gpt_response(prompt, message_id):
    openai.api_key = settings.OPENAI_KEY
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    content = response["choices"][0]["message"]["content"]
    # Store the response in your database against the message_id
    # rest of your code
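
As a concrete sketch of that storage step, you could persist the result in a small model keyed by the message ID; the PromptResult model below is a hypothetical example, not something from your codebase:

# models.py -- hypothetical model for storing task results
from django.db import models

class PromptResult(models.Model):
    message_id = models.CharField(max_length=36, unique=True)
    status = models.CharField(max_length=10, default="pending")
    content = models.TextField(blank=True)

The task would then end with something like:

# inside get_gpt_response, after extracting `content`
PromptResult.objects.update_or_create(
    message_id=message_id,
    defaults={"status": "done", "content": content},
)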

Modify your Django view to enqueue the task and return a message ID to the client.

from app.tasks import get_gpt_response
from django.http import JsonResponse
from uuid import uuid4
# other imports/code

@login_required
def form_prompt(request: HttpRequest, pk: int) -> HttpResponse:
    instance = get_object_or_404(ModelName, pk=pk)
    form = PromptForm(request.POST or None, instance=instance)
    if form.is_valid():
        prompt = form.cleaned_data["text"]
        message_id = str(uuid4())  # Generate a unique message ID
        get_gpt_response.delay(prompt, message_id)  # Enqueue the task
        # Store the message ID and initial status in your database
        return JsonResponse({'message_id': message_id})  # Return the message ID to the client
    return render(request, "view_prompt_name.html", {"form": form})

Create a new Django view to handle polling requests from the client, using JsonResponse.

from django.http import JsonResponse
# other import/code

def check_status(request: HttpRequest, message_id: str) -> JsonResponse:
    # Retrieve the status and the response from your database using the
    # message_id (PromptResult is the hypothetical model sketched above)
    result = PromptResult.objects.filter(message_id=message_id).first()
    if result is None or result.status != "done":
        # Response not ready yet: tell the client to keep polling
        return JsonResponse({"status": "pending"})
    # Response is ready: return it to the client
    return JsonResponse({"status": "done", "response": result.content})
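
To wire both views up, you would add routes along these lines (paths and names here are illustrative):

# urls.py -- illustrative routes for the submit and polling views
from django.urls import path
from app import views

urlpatterns = [
    path("prompt/<int:pk>/", views.form_prompt, name="form_prompt"),
    path("status/<str:message_id>/", views.check_status, name="check_status"),
]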

Now, the TTL of 30 seconds should no longer be a problem, since the client will keep polling the server for the response, and the server will respond with the actual result once it is ready.
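
Note that the enqueued task only executes if a Celery worker process is running alongside the Django app (on GKE, typically a separate Deployment running the same image). You would start a worker with:

celery -A your_project worker --loglevel=info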

When I arrived at the get_gpt_response.delay(prompt, message_id)  # Enqueue the task line, I got this error: kombu.exceptions.OperationalError: [Errno 111] Connection refused. I am wondering if I correctly initiated Celery.

The error kombu.exceptions.OperationalError: [Errno 111] Connection refused is typically associated with Celery being unable to connect to the message broker (like RabbitMQ or Redis). That could be due to several reasons, such as the broker service not running, an incorrect broker URL, or network issues.

The solution on GitHub celery/kombu issue 1582 adds the broker and backend settings to the Celery configuration and makes sure the Redis server is running.
You may need to adapt your Celery configuration similarly.

import os
from celery import Celery
from django.conf import settings

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'your_project.settings')

# Pass the broker and result backend explicitly, so the worker does not
# depend on those settings being picked up from elsewhere
app = Celery('your_project', broker="redis://localhost:6379", backend="redis://localhost:6379")

app.config_from_object('django.conf:settings')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks(settings.INSTALLED_APPS)

# Also set the broker URL on the app configuration itself
app.conf.broker_url = "redis://localhost:6379"

app.conf.beat_schedule = {
    # any periodic tasks you have
}

@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
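
Before retrying, it is also worth checking that the broker itself is reachable; for a local Redis instance:

redis-cli ping  # should reply with PONG
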
Answered By: VonC

In order to decouple the reception of the message from the client from the actual processing, you have different options. In fact, the one you outlined is quite good.

Conceptually, you could try something like the following.

Upon reception of a new HTTP request from your client, the server will provide that client with some type of unique identifier, ideally in a 202 Accepted response. This unique identifier will be used later by the client to retrieve the actual result.

In addition, instead of invoking the OpenAI API directly, your server could publish a new message to a Pub/Sub topic. The required SDK is provided for different programming languages, Python among them.

The published message must include the unique identifier returned to the client and all the information necessary to perform the ChatGPT computation.

For example:

import json
import uuid
from app.forms_prompt import PromptForm
from app.models import ModelName
from django.conf import settings
from django.contrib.auth.decorators import login_required
from django.http import HttpRequest
from django.http import HttpResponse
from django.shortcuts import get_object_or_404
from django.shortcuts import redirect
from django.shortcuts import render
from django.utils.translation import gettext_lazy as _
from google.cloud import pubsub_v1

# You must provide the right project and topic values
project_id = "your-project-id"
topic_id = "your-topic-id"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)


@login_required
def form_prompt(request: HttpRequest, pk: int) -> HttpResponse:
    instance = get_object_or_404(ModelName, pk=pk)
    form = PromptForm(request.POST or None, instance=instance)
    # check if form data is valid
    if form.is_valid():
        prompt = form.cleaned_data["text"]
        request_id = str(uuid.uuid4())
        message = {
            "request_id": request_id,
            "prompt": prompt
        }

        data = json.dumps(message).encode("utf-8")
        future = publisher.publish(topic_path, data)
        # future.result() will return the message ID generated by
        # Pub/Sub, unique within your topic
        print(future.result())

        form.save()
        return render(request, "view_computing_prompt", context={"request_id": request_id}, status=202)
    return render(request, "view_prompt_name.html", {"form": form})

To perform the actual OpenAI API call, you can define a Cloud Function triggered when new messages arrive at the configured Pub/Sub topic.

This Cloud Function should store the result of the computation performed by ChatGPT, when ready, in some type of storage. This storage should be accessible to the server as well: it is necessary to check whether the result is available when the client asks for it.

The actual storage mechanism for the ChatGPT result will depend on your implementation.

For example, you could use a relational database table and insert a record with the unique identifier returned to the client and the actual result provided.

Or you could use something such as Cloud Memorystore, keying your content by the unique identifier initially provided to the client.

Or you could use a Cloud Storage bucket and create an entry whose name equals the unique identifier provided to the client and whose content is the result provided by the ChatGPT API: it could be a cost-effective solution.
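
As a sketch of the Cloud Storage option (the bucket name below is a placeholder), using the google-cloud-storage client:

from google.cloud import storage

def store_result(bucket_name: str, request_id: str, content: str) -> None:
    # Upload the ChatGPT result as an object named after the request_id
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(request_id)
    blob.upload_from_string(content)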

Consider the following code as a reference implementation of the function, based on your own code and the provided Cloud Function Pub/Sub GCP example:

import base64
import json
import os

import openai
from cloudevents.http import CloudEvent
import functions_framework


# Register the handler for Pub/Sub-triggered CloudEvents, as in the GCP example
@functions_framework.cloud_event
def subscribe(cloud_event: CloudEvent) -> None:
    message = json.loads(base64.b64decode(cloud_event.data["message"]["data"]).decode())
    prompt = message['prompt']
    # consider configuring a secret for storing the OpenAI API key
    # https://cloud.google.com/functions/docs/configuring/secrets
    openai.api_key = os.environ.get("OPENAI_KEY")
    
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": prompt},
        ],
    )

    content = response["choices"][0]["message"]["content"]

    request_id = message['request_id']

    # As mentioned, store the result somewhere accessible
    # to both the function and the Django server; see, for example,
    # https://cloud.google.com/storage/docs/uploading-objects-from-memory
    # ...
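
On the Django side, the polling view could then check whether that object exists; a minimal sketch, again with a placeholder bucket name:

from django.http import HttpRequest, JsonResponse
from google.cloud import storage

def check_result(request: HttpRequest, request_id: str) -> JsonResponse:
    # Look for an object named after the request_id in the results bucket
    blob = storage.Client().bucket("your-results-bucket").blob(request_id)
    if not blob.exists():
        return JsonResponse({"status": "pending"})
    return JsonResponse({"status": "done", "response": blob.download_as_text()})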

Please do not forget to configure the appropriate permissions for the different products described above, for both the server and the Cloud Function, as required.

Answered By: jccampanero