"[CRITICAL] WORKER TIMEOUT" in logs when running "Hello Cloud Run with Python" from GCP Setup Docs

Question

Following the tutorial here I have the following 2 files:

app.py

from flask import Flask, request

app = Flask(__name__)


@app.route('/', methods=['GET'])
def hello():
    """Return a friendly HTTP greeting."""
    who = request.args.get('who', 'World')
    return f'Hello {who}!\n'


if __name__ == '__main__':
    # Used when running locally only. When deploying to Cloud Run,
    # a webserver process such as Gunicorn will serve the app.
    app.run(host='localhost', port=8080, debug=True)

Dockerfile

# Use an official lightweight Python image.
# https://hub.docker.com/_/python
FROM python:3.7-slim

# Install production dependencies.
RUN pip install Flask gunicorn

# Copy local code to the container image.
WORKDIR /app
COPY . .

# Service must listen to $PORT environment variable.
# This default value facilitates local development.
ENV PORT 8080

# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
CMD exec gunicorn --bind 0.0.0.0:$PORT --workers 1 --threads 8 app:app

I then build and run them using Cloud Build and Cloud Run:

PROJECT_ID=$(gcloud config get-value project)
DOCKER_IMG="gcr.io/$PROJECT_ID/helloworld-python"
gcloud builds submit --tag $DOCKER_IMG
gcloud run deploy --image $DOCKER_IMG --platform managed

The code appears to run fine, and I am able to access the app on the given URL. However the logs seem to indicate a critical error, and the workers keep restarting. Here is the log file from Cloud Run after starting up the app and making a few requests in my web browser:

2020-03-05T03:37:39.392Z Cloud Run CreateService helloworld-python ...
2020-03-05T03:38:03.285477Z[2020-03-05 03:38:03 +0000] [1] [INFO] Starting gunicorn 20.0.4
2020-03-05T03:38:03.287294Z[2020-03-05 03:38:03 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
2020-03-05T03:38:03.287362Z[2020-03-05 03:38:03 +0000] [1] [INFO] Using worker: threads
2020-03-05T03:38:03.318392Z[2020-03-05 03:38:03 +0000] [4] [INFO] Booting worker with pid: 4
2020-03-05T03:38:15.057898Z[2020-03-05 03:38:15 +0000] [1] [INFO] Starting gunicorn 20.0.4
2020-03-05T03:38:15.059571Z[2020-03-05 03:38:15 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
2020-03-05T03:38:15.059609Z[2020-03-05 03:38:15 +0000] [1] [INFO] Using worker: threads
2020-03-05T03:38:15.099443Z[2020-03-05 03:38:15 +0000] [4] [INFO] Booting worker with pid: 4
2020-03-05T03:38:16.320286ZGET200 297 B 2.9 s Safari 13  https://helloworld-python-xhd7w5igiq-ue.a.run.app/
2020-03-05T03:38:16.489044ZGET404 508 B 6 ms Safari 13  https://helloworld-python-xhd7w5igiq-ue.a.run.app/favicon.ico
2020-03-05T03:38:21.575528ZGET200 288 B 6 ms Safari 13  https://helloworld-python-xhd7w5igiq-ue.a.run.app/
2020-03-05T03:38:27.000761ZGET200 285 B 5 ms Safari 13  https://helloworld-python-xhd7w5igiq-ue.a.run.app/?who=me
2020-03-05T03:38:27.347258ZGET404 508 B 13 ms Safari 13  https://helloworld-python-xhd7w5igiq-ue.a.run.app/favicon.ico
2020-03-05T03:38:34.802266Z[2020-03-05 03:38:34 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:4)
2020-03-05T03:38:35.302340Z[2020-03-05 03:38:35 +0000] [4] [INFO] Worker exiting (pid: 4)
2020-03-05T03:38:48.803505Z[2020-03-05 03:38:48 +0000] [5] [INFO] Booting worker with pid: 5
2020-03-05T03:39:10.202062Z[2020-03-05 03:39:09 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:5)
2020-03-05T03:39:10.702339Z[2020-03-05 03:39:10 +0000] [5] [INFO] Worker exiting (pid: 5)
2020-03-05T03:39:18.801194Z[2020-03-05 03:39:18 +0000] [6] [INFO] Booting worker with pid: 6

Note the worker timeouts and reboots at the end of the logs. The fact that it’s a CRITICAL error makes me think it shouldn’t be happening. Is this expected behavior? Is this a side effect of the Cloud Run machinery starting and stopping my service as requests come and go?

Asked By: jminardi

Answer 1

Here’s a working example of a Flask app on Cloud Run. My guess is that the last line of your Dockerfile and the last part of your Python file are what cause this behavior.

main.py

# main.py
#gcloud beta run services replace service.yaml


from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    msg = "Hello World"
    return msg

Dockerfile (the apt-get part is not needed)

# Use the official Python image.
# https://hub.docker.com/_/python
FROM python:3.7

# Install manually all the missing libraries
RUN apt-get update
RUN apt-get install -y gconf-service libasound2 libatk1.0-0 libcairo2 libcups2 libfontconfig1 libgdk-pixbuf2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libxss1 fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils

# Install Python dependencies.
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . .

CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app

then build using:

gcloud builds submit --tag gcr.io/[PROJECT]/[MY_SERVICE]

and deploy:

gcloud beta run deploy [MY_SERVICE] --image gcr.io/[PROJECT]/[MY_SERVICE] --region europe-west1 --platform managed

UPDATE
I’ve checked the logs you provided again.
This kind of warning/error is normal shortly after a new deployment: your old instances are no longer handling any requests and sit idle until they are completely shut down.

Gunicorn also has a default timeout of 30s, which is roughly in line with the gap between the “Booting worker” line and the error.

Answered By: Waelmas

Answer 2

Cloud Run has scaled down one of your instances, and the gunicorn arbiter is considering it stalled.

You should add --timeout 0 to your gunicorn invocation to disable the worker timeout entirely; it’s unnecessary for Cloud Run.
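
The same setting can also live in a gunicorn configuration file, which is plain Python. The following is only a minimal sketch: the file name gunicorn.conf.py is an assumption, and the worker and thread counts are copied from the Dockerfile in the question rather than being a recommendation.

```python
# gunicorn.conf.py (assumed file name; not part of the original tutorial)
import os

# Cloud Run provides PORT; 8080 mirrors the Dockerfile default in the question.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")

# Same process model as the CMD line in the question's Dockerfile.
workers = 1
threads = 8

# 0 disables the arbiter's worker timeout instead of using the 30s default.
timeout = 0
```

With a file like this in the working directory, the Dockerfile command could become: CMD exec gunicorn -c gunicorn.conf.py app:app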

Answered By: Dustin Ingram

Answer 3

I was facing the error [11229] [CRITICAL] WORKER TIMEOUT (pid:11232) on Heroku.
I changed my Procfile to this:

web: gunicorn --workers=3 app:app --timeout 200 --log-file -

Increasing the --timeout fixed my problem.

Answered By: Muhammad Zakaria

Answer 4

For those who land here with the same problem but with Django (it will probably work the same way) running under gunicorn, supervisor and nginx: check the configuration in your gunicorn_start file, or wherever you keep the gunicorn parameters. In my case it looks like this; add the timeout on the last line:

NAME="myapp"                                  # Name of the application
DJANGODIR=/webapps/myapp             # Django project directory
SOCKFILE=/webapps/myapp/run/gunicorn.sock  # we will communicate using this unix socket
USER=root                                        # the user to run as
GROUP=root                                     # the group to run as
NUM_WORKERS=3                                     # how many worker processes should Gunicorn spawn
DJANGO_SETTINGS_MODULE=myapp.settings             # which settings file should Django use
DJANGO_WSGI_MODULE=myapp.wsgi                     # WSGI module name

echo "Starting $NAME as `whoami`"

# Activate the virtual environment
cd $DJANGODIR
source ../bin/activate
export DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE
export PYTHONPATH=$DJANGODIR:$PYTHONPATH

# Create the run directory if it doesn't exist
RUNDIR=$(dirname $SOCKFILE)
test -d $RUNDIR || mkdir -p $RUNDIR

# Start your Django Unicorn
# Programs meant to be run under supervisor should not daemonize themselves (do not use --daemon)
exec ../bin/gunicorn ${DJANGO_WSGI_MODULE}:application \
  --name $NAME \
  --workers $NUM_WORKERS \
  --user=$USER --group=$GROUP \
  --bind=unix:$SOCKFILE \
  --log-level=debug \
  --log-file=- \
  --timeout 120  # this is the line to add

Answered By: Pablo Acosta

Answer 5

I had the same problem. My solution: there was a request for the favicon which couldn’t be served because I didn’t have a favicon installed. It seems that Django (or Tailwind, which is installed in my case) creates this request by default, and it produced a 404 error. After installing a favicon the problem was solved: the workers are stable now and the 404 error is gone.

Answered By: clemens von muu