Unable to scale Gunicorn/Flask HelloWorld over 125 RPS
Question:
I have a Flask app that I have been unable to scale past 125 RPS locally. It is a simple ‘hello world’ as seen below.
I’m using the Locust.io load testing tool. I have pointed the same load test at a local Golang hello world, and am able to get into the 1000s of RPS. IMHO this rules out my Locust and OS configurations as potential bottlenecks.
I’m using 17 workers as my machine has 8 cores ((2*CPU)+1 is recommended by the Gunicorn docs).
From what I’ve read, using the gevent worker type for Gunicorn should allow me to reach 1000s of RPS, just like with Golang. Is this a correct assumption, or am I missing something critical?
Abbreviated code:
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'hello world!'
Gunicorn conf:
gunicorn -k gevent -w 17 --worker-connections 100000 app:app
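For reference, the same settings could be written as a Gunicorn config file instead of command-line flags. This is only a sketch: the file name and the recommended_workers helper are assumptions made for illustration, while workers, worker_class and worker_connections are the Gunicorn setting names behind -w, -k and --worker-connections.

```python
# gunicorn.conf.py (sketch; load with: gunicorn -c gunicorn.conf.py app:app)
import multiprocessing


def recommended_workers(cpu_count):
    """Hypothetical helper: the (2 * CPU) + 1 rule of thumb from the question."""
    return 2 * cpu_count + 1


# Gunicorn reads these module-level names as settings.
workers = recommended_workers(multiprocessing.cpu_count())  # 17 on an 8-core machine
worker_class = "gevent"        # same as -k gevent
worker_connections = 100000    # same as --worker-connections 100000
```

Deriving the worker count from the CPU count keeps the file usable on machines with a different number of cores.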
Answers:
Answer from authors here: https://github.com/benoitc/gunicorn/issues/305
After another week of debugging, I figured it out! It turns out there is an additional worker type, gevent_pywsgi. Using this worker type increased the throughput roughly 10x, to levels I would consider acceptable.
My testing showed no difference in performance between the sync worker and the gevent worker, so I’m still not sure what’s going on there, or what the intent of the gevent worker type is.
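As a rough sketch of what that switch looks like in a Gunicorn config file: only the worker class changes. The worker count and connection limit below are carried over from the question as assumptions, not values this answer confirms.

```python
# gunicorn.conf.py (sketch; load with: gunicorn -c gunicorn.conf.py app:app)
workers = 17                      # worker count from the question (assumption)
worker_class = "gevent_pywsgi"    # instead of "gevent"
worker_connections = 100000       # carried over from the question (assumption)
```

On the command line, the equivalent change would be passing -k gevent_pywsgi instead of -k gevent.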
I was in the same scenario: I was using sync workers (the default worker class) in gunicorn, and my goal was the same, to increase the RPS.
Then I switched to async workers with the help of gevent (one of the other options).
A common mistake (I made it too) when using gevent with gunicorn is to only pass it as an argument, i.e. --worker-class=gevent, which makes the whole gunicorn command look like this:
gunicorn --bind=127.0.0.1:5000 --workers=4 --worker-class=gevent wsgi:application
What we all forget to do is change the Flask code accordingly.
We have to modify this
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'hello world!'
into this
from gevent import monkey
monkey.patch_all()  # monkey patching

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'hello world!'
Adding these lines is crucial, and you should see an increase in RPS.
In my case, I got:
~90 RPS with 20 sync workers + EC2 server (compute optimized) + API hit from local
~430 RPS with 8 async workers (gevent) + 1 thread per worker + EC2 server (normal) + API hit from local
~600 RPS with 8 async workers (gevent) + 16 threads per worker + EC2 server (normal) + API hit from local
~900 RPS with 8 async workers (gevent) + 32 threads per worker + EC2 server (normal) + API hit from local
You can see the drastic 10x increase in RPS from those 2 lines in my case (even though I was using a normal EC2 machine in the later tests).
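To make the comparison explicit, here is a small sketch that computes the speedup of each gevent run over the sync baseline. It uses only the approximate RPS figures reported in this answer; the variable names are made up for illustration, and the runs were on different EC2 instance types, so the ratios are indicative at best.

```python
# Approximate RPS figures reported in this answer.
baseline_rps = 90  # 20 sync workers

# threads per worker -> RPS, all with 8 gevent workers
gevent_runs = {1: 430, 16: 600, 32: 900}

# Speedup of each gevent run relative to the sync baseline.
speedups = {threads: rps / baseline_rps for threads, rps in gevent_runs.items()}

for threads, factor in speedups.items():
    print(f"{threads:>2} threads per worker: {factor:.1f}x the sync baseline")
```

Only the last configuration reaches the 10x figure; the other two come out at roughly 4.8x and 6.7x.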