Timeout Tornado requests from point of receiving

Question:

I built a Tornado web service that receives requests and parses PDF documents. The problem now is that loading a PDF as an XML tree takes very long and blocks the CPU.

If I send about 90 requests asynchronously, they all arrive at the same time. Some of them get an answer from the server and some time out on the client after 30 s, but the server keeps processing requests that have already timed out on the client. After a while this creates a backlog on the server.

My first solution was to process the PDFs in a separate thread. That way I could set a timeout on each thread, but it only made the processing time worse: the server received all requests at the same time and tried to process them all at once, which increased the time per request.

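import asyncio
import json

import tornado.web


class PdfExHandler(tornado.web.RequestHandler):
    async def post(self):
        # Run the CPU-heavy parsing in the default thread pool executor.
        future = asyncio.get_event_loop().run_in_executor(
            None, self.long_running_func, args)
        # The timeout starts when this handler starts, not when the request arrived.
        res = await asyncio.wait_for(future, timeout=30)
        self.write(json.dumps(res))

    def long_running_func(self, args):
        # RequestHandler.write() is not thread-safe, so only return the result here.
        return processPDF(args)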
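
One thing worth noting about this attempt, shown as a minimal stdlib-only sketch: `asyncio.wait_for` stops waiting when the timeout expires, but it cannot stop a function that is already running in an executor thread, so the work keeps consuming CPU. Here `slow_job` is a hypothetical stand-in for `processPDF`, and the durations are shortened for the demo.

```python
import asyncio
import threading
import time

finished = threading.Event()


def slow_job():
    # Hypothetical stand-in for processPDF: blocks for a short while.
    time.sleep(0.3)
    finished.set()
    return "done"


async def main():
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, slow_job)
    try:
        await asyncio.wait_for(future, timeout=0.1)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    # The caller has given up, but the worker thread has not finished yet.
    still_running = not finished.is_set()
    # Give the thread time to run to completion anyway.
    await asyncio.sleep(0.4)
    return timed_out, still_running, finished.is_set()


result = asyncio.run(main())
```

The timeout fires on the waiting side only; the job still runs to the end in its thread, which matches the behavior described above where timed-out requests keep loading the server.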

The second solution I tried was setting an async Timer like in this question:

Python – Timer with asyncio/coroutine

class PdfExHandler(tornado.web.RequestHandler):
    async def timeout_callback(self):
        raise Exception("Timeout")

    def post(self):
        # Timer is the helper class from the linked question
        timer = Timer(30, self.timeout_callback)
        res = processPDF(args)
        self.write(json.dumps(res))

But the timer only measures from the start of the PDF process and not from the time of the initial arrival.

Finally I tried this approach:

Right way to "timeout" a Request in Tornado

But that didn’t work either. My question in short: is there another way to set a timeout on incoming requests, or a way to use threads with timeouts without overloading the CPU?

Edit: I can see in the logs the total time that has passed since the request arrived, so the RequestHandler evidently starts its own timer.

webservice| 2022-12-02 10:54:34,368 [MainThread] [INFO]  Incoming POST request to PdfEx Service: PDF Nr. 123
webservice| 2022-12-02 10:54:36,920 [MainThread] [INFO]  Completed POST request to PdfEx Service: PDF Nr. 123
webservice| 2022-12-02 10:54:36,934 [MainThread] [INFO]  200 POST /extract (172.18.0.1) 209651.18ms
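
As a rough check of the numbers in this log, assuming the "Incoming" line is written when the handler starts and the last line reports the total request time as measured by Tornado, almost all of the time was spent waiting rather than processing. A small sketch of the arithmetic:

```python
from datetime import datetime

# Timestamp format used in the log lines above (milliseconds after the comma).
FMT = "%Y-%m-%d %H:%M:%S,%f"

started = datetime.strptime("2022-12-02 10:54:34,368", FMT)
completed = datetime.strptime("2022-12-02 10:54:36,920", FMT)

# Time between the "Incoming" and "Completed" log lines.
processing_ms = (completed - started).total_seconds() * 1000

# Total request time taken from the last log line.
total_ms = 209651.18

# Whatever is left over was spent before the handler started.
waiting_ms = total_ms - processing_ms
```

Under these assumptions, processing took about 2.6 s and the remaining 207 s or so was waiting time.
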
Asked By: LeM4

||

Answers:

I looked through the Tornado source code and figured out how to get the total request time, which also includes the waiting time. So I just check whether the request time exceeds my timeout before processing:

import json

import tornado.web


class PdfExHandler(tornado.web.RequestHandler):
    def post(self):
        try:
            # request_time() is in seconds
            if self.request.request_time() >= 40:
                raise TimeoutError("Request took too long!")

            res = processPDF(args)
            self.write(json.dumps(res))
        except TimeoutError as ex:
            self.set_status(500)
            # Python 3 exceptions have no .message attribute
            self.write(str(ex))
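
The same idea can be tried out without Tornado. The following is a minimal stdlib-only sketch, not the Tornado API: each job carries its arrival time, a single worker processes jobs one after another, and a job that has already waited longer than the deadline is rejected before any expensive work starts. The names `process`, `run_batch` and `TIMEOUT_S` are hypothetical, and the durations are shortened for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TIMEOUT_S = 0.3  # hypothetical deadline, shortened for the demo


def process(arrival, work_s):
    # Check the age of the job right before the expensive work starts,
    # mirroring the request_time() check in the handler above.
    waited = time.monotonic() - arrival
    if waited >= TIMEOUT_S:
        return "expired"
    time.sleep(work_s)  # stand-in for processPDF
    return "done"


def run_batch(n_jobs, work_s, workers):
    # All jobs "arrive" at the same moment, like the burst of requests.
    arrival = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process, arrival, work_s) for _ in range(n_jobs)]
        return [f.result() for f in futures]


results = run_batch(n_jobs=4, work_s=0.2, workers=1)
```

Jobs at the front of the queue complete, while jobs that waited past the deadline return immediately instead of adding to the backlog. Capping `workers` also keeps the server from starting every job at once, which was the problem with the threaded attempt in the question.
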
Answered By: LeM4