FastAPI server running on AWS App Runner fails after 24 hours

Question:

I have a FastAPI server configured with Gunicorn and deployed on AWS App Runner. When I access the endpoint, it works perfectly. However, after 24 hours, requests to the same endpoint return a 502 Bad Gateway error, and nothing more is logged in CloudWatch from that point on. Once I redeploy the application, it works fine again.

I suspect the problem lies in my Gunicorn configuration, which is somehow shutting down my API after some time, rather than in AWS App Runner, but I have not found a solution. My Gunicorn setup is shown below. Any help would be appreciated.

from fastapi import FastAPI
import uvicorn
from fastapi.middleware.cors import CORSMiddleware
from gunicorn.app.base import BaseApplication
import os
import multiprocessing

api = FastAPI()


def number_of_workers():
    print((multiprocessing.cpu_count() * 2) + 1)
    return (multiprocessing.cpu_count() * 2) + 1


class StandaloneApplication(BaseApplication):
    def __init__(self, app, options=None):
        self.options = options or {}
        self.application = app
        super().__init__()

    def load_config(self):
        config = {
            key: value for key, value in self.options.items()
            if key in self.cfg.settings and value is not None
        }
        for key, value in config.items():
            self.cfg.set(key.lower(), value)

    def load(self):
        return self.application


@api.get("/test")
async def root():
    return 'Success'


if __name__ == "__main__":
    if os.environ.get('APP_ENV') == "development":
        uvicorn.run("api:api", host="0.0.0.0", port=2304, reload=True)

    else:
        options = {
            "bind": "0.0.0.0:2304",
            "workers": number_of_workers(),
            "accesslog": "-",
            "errorlog": "-",
            "worker_class": "uvicorn.workers.UvicornWorker",
            "timeout": "0"
        }

        StandaloneApplication(api, options).run()

Asked By: Stephen Sanwo


Answers:

I had the same problem. After a lot of trial and error, two changes seemed to resolve this for me.

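  1. Set Uvicorn's --timeout-keep-alive to 65 seconds. For Gunicorn, the equivalent parameter is --keep-alive (the keepalive setting). I'm assuming the Application Load Balancer in front of App Runner returns a 502 if Uvicorn closes the TCP socket before the ALB does. An ALB's default idle timeout is 60 seconds, so a server keep-alive of 65 seconds leaves the ALB to close idle connections first.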

  2. Change the App Runner health check, which manages container recycling, to use HTTP rather than a TCP ping. At the time of writing, the AWS console doesn't allow this change, so you have to make it with the AWS CLI. Use any working URL path for the check; in your case, /test:

aws apprunner update-service --service-arn <arn> --health-check-configuration Protocol=HTTP,Path=/test
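If you'd rather apply change 2 from Python, the same UpdateService call can be made through boto3's App Runner client. This is a sketch that has not been run against a live service: it assumes boto3 is installed and that AWS credentials allowed to update the service are configured. The helper names http_health_check and apply_health_check are hypothetical.

```python
def http_health_check(path: str = "/test") -> dict:
    """Build an App Runner HealthCheckConfiguration that probes an HTTP path."""
    if not path.startswith("/"):
        raise ValueError(f"health check path must start with '/': {path!r}")
    return {"Protocol": "HTTP", "Path": path}


def apply_health_check(service_arn: str, path: str = "/test") -> dict:
    """Hypothetical helper: switch a service's health check from TCP to HTTP.

    Mirrors the CLI command above (aws apprunner update-service with
    --health-check-configuration Protocol=HTTP,Path=/test).
    """
    # Imported lazily so http_health_check works even without boto3 installed.
    import boto3

    client = boto3.client("apprunner")
    return client.update_service(
        ServiceArn=service_arn,
        HealthCheckConfiguration=http_health_check(path),
    )
```

Call it as apply_health_check(service_arn) with your service's ARN. Like the CLI command, it only passes the health check configuration to UpdateService.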

Change #2 alone might be enough to resolve the issue.
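For change 1, a minimal sketch of the question's Gunicorn options with keep-alive raised might look like this. It assumes the load balancer in front of App Runner uses the ALB default idle timeout of 60 seconds; the constant names and the gunicorn_options helper are hypothetical. Gunicorn's setting is keepalive (CLI --keep-alive), and UvicornWorker passes it through to Uvicorn as timeout_keep_alive.

```python
import multiprocessing

# Assumption: the upstream load balancer closes idle connections after 60 s
# (the ALB default). The app server's keep-alive must be longer, so that the
# load balancer, not Gunicorn/Uvicorn, is the side that closes idle sockets.
LB_IDLE_TIMEOUT_SECONDS = 60
KEEPALIVE_SECONDS = 65


def number_of_workers() -> int:
    return multiprocessing.cpu_count() * 2 + 1


def gunicorn_options(port: int = 2304) -> dict:
    """Hypothetical helper: the question's options dict plus a longer keep-alive."""
    if KEEPALIVE_SECONDS <= LB_IDLE_TIMEOUT_SECONDS:
        raise ValueError("keep-alive must exceed the load balancer idle timeout")
    return {
        "bind": f"0.0.0.0:{port}",
        "workers": number_of_workers(),
        "accesslog": "-",
        "errorlog": "-",
        "worker_class": "uvicorn.workers.UvicornWorker",
        # Gunicorn's keepalive setting; UvicornWorker maps it to timeout_keep_alive.
        "keepalive": KEEPALIVE_SECONDS,
        # As in the question: 0 disables Gunicorn's worker timeout.
        "timeout": "0",
    }
```

Pass the result to the question's launcher with StandaloneApplication(api, gunicorn_options()).run(). In development, the same effect comes from uvicorn.run("api:api", host="0.0.0.0", port=2304, reload=True, timeout_keep_alive=65).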

Answered By: Anu Joy