Unable to scale Gunicorn/Flask HelloWorld over 125 RPS

Question:

I have a Flask app that I have been unable to scale past 125 RPS locally. It is a simple ‘hello world’ as seen below.

I’m using the Locust.io load testing tool. I have pointed the same load test at a local Golang hello world and can reach thousands of RPS. IMHO this rules out my Locust and OS configurations as potential bottlenecks.

I’m using 17 workers because my machine has 8 cores; the Gunicorn docs recommend (2*CPU)+1.
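The worker-count rule of thumb above can be sketched in a few lines of Python. The function name `recommended_workers` is made up for illustration, and the formula is only the starting point the docs suggest, not a tuned value.

```python
import multiprocessing


def recommended_workers(cpu_cores: int) -> int:
    """Rule-of-thumb starting point: (2 x cores) + 1."""
    return 2 * cpu_cores + 1


if __name__ == "__main__":
    # 8 cores, as on the machine described in the question.
    print(recommended_workers(8))  # 17
    # Same rule applied to whatever machine this script runs on.
    print(recommended_workers(multiprocessing.cpu_count()))
```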

From what I’ve read, using the gevent worker type for Gunicorn should allow me to reach thousands of RPS, just like with Golang. Is this a correct assumption, or am I missing something critical?

Abbreviated code:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'hello world!'

Gunicorn conf:

gunicorn -k gevent -w 17 --worker-connections 100000 app:app

Locust load test results. Each ‘user’ GETs ‘/’ once per 4s
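As a rough sanity check on the load shape: if each simulated user sends one GET every 4 s, the offered load is at most users / 4 requests per second, and lower once response time is added to each cycle. The helper names below are hypothetical and the user counts are illustrative, not taken from the test run.

```python
import math


def max_offered_rps(users: int, wait_seconds: float) -> float:
    """Upper bound on requests/second when each user waits wait_seconds
    between requests (response time is ignored, so real load is lower)."""
    return users / wait_seconds


def users_needed(target_rps: float, wait_seconds: float) -> int:
    """Minimum number of simulated users needed to offer target_rps."""
    return math.ceil(target_rps * wait_seconds)


if __name__ == "__main__":
    print(users_needed(125, 4.0))     # 500
    print(users_needed(1000, 4.0))    # 4000
    print(max_offered_rps(500, 4.0))  # 125.0
```

Under these assumptions, a plateau can only be blamed on the server if the test is running enough users to offer more than the observed rate.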

Asked By: CptJero

||

Answers:

Answer from authors here: https://github.com/benoitc/gunicorn/issues/305

After another week of debugging, I figured it out! It turns out there is an additional worker type, gevent_pywsgi. Using this worker type increased the throughput roughly 10x, to levels I would consider acceptable.
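For reference, the same switch can be written as a Gunicorn config file, which is plain Python. This is a minimal sketch: the file name gunicorn.conf.py and the bind address are assumptions, while the worker count and connection limit are copied from the command in the question.

```python
# gunicorn.conf.py (assumed file name; load it with: gunicorn -c gunicorn.conf.py app:app)

bind = "127.0.0.1:8000"         # assumed address; this is also Gunicorn's default
workers = 17                    # (2 * 8 cores) + 1, as in the question
worker_class = "gevent_pywsgi"  # the worker type that gave the ~10x improvement
worker_connections = 100000     # same value as --worker-connections in the question
```

The equivalent command line would use -k gevent_pywsgi in place of -k gevent.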

My testing showed no difference in performance between the sync worker and gevent worker, so I’m still not sure what’s going on there, or what the intent of the gevent worker type is.

Answered By: CptJero

I was in the same situation: I was using sync workers (the default worker class) in Gunicorn, and my goal was also to increase the RPS.

Then I switched to async workers with the help of gevent (one of the other options).

A common mistake when using gevent with Gunicorn (I made it too) is to only pass it as an argument, i.e. --worker-class=gevent,

which makes the whole gunicorn command look like this:

gunicorn --bind=127.0.0.1:5000 --workers=4 --worker-class=gevent wsgi:application

What we all forget is to change the Flask code accordingly.

We have to change this

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'hello world!'

into this

from gevent import monkey
monkey.patch_all() # monkey patching

from flask import Flask

app = Flask(__name__)
        
@app.route('/')
def hello():
    return 'hello world!'

Adding these lines is crucial, and you should see an increase in RPS.
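One way to confirm that the patch actually took effect inside a worker is to ask gevent directly. The sketch below assumes gevent is installed and uses `gevent.monkey.is_module_patched`; the helper name `socket_patch_status` is made up for illustration, and the code only reports the state without patching anything itself.

```python
try:
    from gevent import monkey
except ImportError:  # gevent is not installed in this environment
    monkey = None


def socket_patch_status() -> str:
    """Report whether gevent has monkey-patched the socket module."""
    if monkey is None:
        return "gevent not installed"
    if monkey.is_module_patched("socket"):
        return "patched"
    return "not patched"


if __name__ == "__main__":
    print(socket_patch_status())
```

Logging this once at application start-up would show whether blocking socket calls are cooperative in that process.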

In my case, I got

~90 RPS with 20 sync workers + EC2 server (compute optimized) + API hit from local

~430 RPS with 8 async workers (gevent) + 1 thread per worker + EC2 server (normal) + API hit from local

~600 RPS with 8 async workers (gevent) + 16 threads per worker + EC2 server (normal) + API hit from local

~900 RPS with 8 async workers (gevent) + 32 threads per worker + EC2 server (normal) + API hit from local

In my case, those two lines gave a drastic 10x increase in RPS (even though I was using a normal EC2 machine in the later tests).
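The speedups implied by the figures above can be worked out as follows. The numbers are the approximate values reported in this answer; the runs used different EC2 instance types, so the ratios are indicative only.

```python
# Approximate RPS figures reported above.
baseline_rps = 90                         # 20 sync workers
gevent_runs = {1: 430, 16: 600, 32: 900}  # threads per worker -> RPS, 8 gevent workers


def speedup(rps: float, baseline: float = baseline_rps) -> float:
    """Ratio of a measured RPS to the sync-worker baseline."""
    return rps / baseline


if __name__ == "__main__":
    for threads, rps in gevent_runs.items():
        print(f"{threads:>2} threads/worker: {rps} RPS, {speedup(rps):.1f}x")
    # Prints 4.8x, 6.7x and 10.0x for 1, 16 and 32 threads per worker.
```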

Answered By: Garvit