Flask --without-threads gives better performance than --with-threads on CPU-bound tasks?

Question:

I’m using Apache JMeter to load-test a tiny Flask app. The app performs a CPU-bound task.

Surprisingly, running the Flask app with --without-threads gives noticeably better results than running with --with-threads. How could that be?
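The question does not include the app’s code. As a rough, Flask-free analogue of the two modes, Python’s standard library provides `HTTPServer`, which handles one request at a time, and `ThreadingHTTPServer`, which starts a new thread per request. This sketch is only an assumption about the setup: `cpu_task`, `CpuHandler` and `make_server` are hypothetical names, and the sum of squares merely stands in for whatever CPU-bound work the real app does.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer


def cpu_task(n: int = 200_000) -> int:
    """Hypothetical stand-in for the app's CPU-bound work."""
    return sum(i * i for i in range(n))


class CpuHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = str(cpu_task()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging so it doesn't skew timings


def make_server(threaded: bool, port: int = 0) -> HTTPServer:
    """Return a one-request-at-a-time server, or one that spawns a thread per request.

    port=0 lets the OS pick a free port.
    """
    server_cls = ThreadingHTTPServer if threaded else HTTPServer
    return server_cls(("127.0.0.1", port), CpuHandler)
```

To point JMeter at it, one might call `make_server(threaded=False, port=5000).serve_forever()` and then repeat the run with `threaded=True`; port 5000 is an arbitrary choice.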

Some of the Apache JMeter settings and the respective results are:

| JMeter threads (users) | Loop count | Time without threads (s) | Time with threads (s) |
|---|---|---|---|
| 5 | 1000 | 14 | 17 |
| 10 | 500 | 14 | 18 |
| 5 | 3000 | 43 | 51 |
| 10 | 1500 | 43 | 56 |

For a purely CPU-bound task, I’d expect the multi-threaded version to be at least as fast as the single-threaded one. Let me explain:

In terms of executing the actual CPU task, I would expect both versions to perform the same. However, in terms of how quickly the next request can be picked up, I’d expect the multi-threaded version to have a slight edge, because each request has already been accepted by Flask and is only waiting for the CPU.

In the single-threaded version (i.e. --without-threads), only one request gets served at a time, while all the other requests are waiting to be served by Flask. In other words, there’s a certain "serving overhead" that Flask introduces.

In an ideal world, Flask could serve a new request instantly; that is, the overhead of Flask serving an HTTP request would be 0. In that case, I would expect the single-threaded and multi-threaded versions to be equally fast, because it would make no difference whether requests were waiting to be served by Flask or waiting for access to the CPU.
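One factor this reasoning leaves out is that, in standard CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. Threads therefore cannot run pure-Python CPU work in parallel, and switching between them adds some overhead rather than being free. A minimal sketch to observe this, assuming a standard GIL-enabled build (the experimental free-threaded builds available since Python 3.13 can behave differently); `cpu_task`, `run_sequential`, `run_threaded` and `timed` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_task(n: int = 300_000) -> int:
    """Hypothetical pure-Python CPU-bound work (holds the GIL while it runs)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs: int, n: int) -> list[int]:
    """Run every job one after another on the calling thread."""
    return [cpu_task(n) for _ in range(jobs)]


def run_threaded(jobs: int, n: int, workers: int) -> list[int]:
    """Run the same jobs on a thread pool; with the GIL they still take turns."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_task, [n] * jobs))


def timed(fn, *args) -> tuple[float, list[int]]:
    """Return (elapsed seconds, result) for fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    seq_time, _ = timed(run_sequential, 8, 300_000)
    thr_time, _ = timed(run_threaded, 8, 300_000, 4)
    print(f"sequential: {seq_time:.2f}s  threaded: {thr_time:.2f}s")
```

On a GIL-enabled build the threaded run is typically no faster than the sequential one, and it can be somewhat slower; the exact numbers depend on the machine and Python version.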

I’m guessing that my understanding is incorrect. Where am I wrong?

Asked By: powerful_clouds


Answers:

As @Thomas suggested, I ran some more tests using a production-ready server. My server of choice was gunicorn, because it’s easy to set up with Python 3.9.

gunicorn accepts two command-line arguments pertaining to this topic:

  1. --workers – "The number of worker processes for handling requests." The default value is 1.
  2. --threads – "The number of worker threads for handling requests." The default value is also 1.

Increasing --workers up to what my CPU can handle did improve performance. Increasing --threads didn’t. Furthermore, running 8 workers with 1 thread gave better results than running 8 workers with 4 threads.
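These findings could be captured in a gunicorn configuration file roughly like the following. gunicorn reads settings such as `bind`, `workers` and `threads` from a Python config file; setting `workers` to the CPU count is an assumption standing in for "as many workers as the CPU can handle", not a number taken from the answer, and the bind address is arbitrary.

```python
# gunicorn.conf.py: a minimal sketch for a CPU-bound app (assumed values).
import multiprocessing

bind = "127.0.0.1:8000"  # assumed address; adjust as needed

# Each worker is a separate process with its own interpreter and GIL,
# so multiple workers can use several cores in parallel for CPU-bound work.
workers = multiprocessing.cpu_count()

# Extra threads mostly help when requests wait on I/O; for pure CPU work
# they did not help in the tests described in this answer.
threads = 1
```

It could then be started with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module name and Flask object name.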

So I then simulated some I/O by having each request sleep for half a second. With that change, increasing the number of threads did improve performance.
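The effect of the simulated I/O can be reproduced without a web server. `time.sleep` releases the GIL, so while one thread sleeps, other threads can run; that is why extra threads pay off once requests spend time waiting. `io_task` and `total_time` are hypothetical names, and the half-second default mirrors the test described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_task(delay: float = 0.5) -> float:
    """Hypothetical request handler that simulates I/O by sleeping.

    time.sleep releases the GIL, so other threads can run meanwhile.
    """
    time.sleep(delay)
    return delay


def total_time(jobs: int, threads: int, delay: float = 0.5) -> float:
    """Wall-clock seconds to finish `jobs` simulated requests on `threads` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(io_task, [delay] * jobs))
    return time.perf_counter() - start
```

For example, `total_time(8, threads=1)` should take roughly 4 seconds, while `total_time(8, threads=4)` should take roughly 1 second, because the sleeps overlap.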

Answered By: powerful_clouds

My application is CPU-bound too. I ran gunicorn in gthread mode (--worker-class=gthread), and the results were better than with the other worker classes. You can try it.
More info: https://medium.com/building-the-system/gunicorn-3-means-of-concurrency-efbb547674b7

Answered By: TanThien