Celery auto reload on ANY changes
Question:
I could make celery reload itself automatically when there is changes on modules in CELERY_IMPORTS
in settings.py
.
I tried to give mother modules to detect changes even on child modules but it did not detect changes in child modules. That make me understand that detecting is not done recursively by celery. I searched it in the documentation but I did not meet any response for my problem.
It is really bothering me to add everything related celery part of my project to CELERY_IMPORTS
to detect changes.
Is there a way to tell celery that “auto reload yourself when there is any changes in anywhere of project”.
Thank You!
Answers:
You can manually include additional modules with -I|--include
. Combine this with GNU tools like find
and awk
and you’ll be able to find all .py
files and include them.
$ celery -A app worker --autoreload --include=$(find . -name "*.py" -type f | awk '{sub("./",""); gsub("/", "."); sub(".py",""); print}' ORS=',' | sed 's/.$//')
Lets explain it:
find . -name "*.py" -type f
find
searches recursively for all files containing .py
. The output looks something like this:
./app.py
./some_package/foopy
./some_package/bar.py
Then:
awk '{sub("./",""); gsub("/", "."); sub(".py",""); print}' ORS=','
This line takes output of find
as input and removes all occurences of ./
. Then it replaces all /
with a .
. The last sub()
removes replaces .py
with an empty string. ORS
replaces all newlines with ,
. This outputs:
app,some_package.foo,some_package.bar,
The last command, sed
removes the last ,
.
So the command that is being executed looks like:
$ celery -A app worker --autoreload --include=app,some_package.foo,some_package.bar
If you have a virtualenv
inside your source you can exclude it by adding -path .path_to_your_env -prune -o
:
$ celery -A app worker --autoreload --include=$(find . -path .path_to_your_env -prune -o -name "*.py" -type f | awk '{sub("./",""); gsub("/", "."); sub(".py",""); print}' ORS=',' | sed 's/.$//')
OrangeTux’s solution didn’t work out for me, so I wrote a little Python script to achieve more or less the same. It monitors file changes using inotify, and triggers a celery restart if it detects a IN_MODIFY
, IN_ATTRIB
, or IN_DELETE
.
#!/usr/bin/env python
"""Runs a celery worker, and reloads on a file change. Run as ./run_celery [directory]. If
directory is not given, default to cwd."""
import os
import sys
import signal
import time
import multiprocessing
import subprocess
import threading
import inotify.adapters
CELERY_CMD = tuple("celery -A amcat.amcatcelery worker -l info -Q amcat".split())
CHANGE_EVENTS = ("IN_MODIFY", "IN_ATTRIB", "IN_DELETE")
WATCH_EXTENSIONS = (".py",)
def watch_tree(stop, path, event):
"""
@type stop: multiprocessing.Event
@type event: multiprocessing.Event
"""
path = os.path.abspath(path)
for e in inotify.adapters.InotifyTree(path).event_gen():
if stop.is_set():
break
if e is not None:
_, attrs, path, filename = e
if filename is None:
continue
if any(filename.endswith(ename) for ename in WATCH_EXTENSIONS):
continue
if any(ename in attrs for ename in CHANGE_EVENTS):
event.set()
class Watcher(threading.Thread):
def __init__(self, path):
super(Watcher, self).__init__()
self.celery = subprocess.Popen(CELERY_CMD)
self.stop_event_wtree = multiprocessing.Event()
self.event_triggered_wtree = multiprocessing.Event()
self.wtree = multiprocessing.Process(target=watch_tree, args=(self.stop_event_wtree, path, self.event_triggered_wtree))
self.wtree.start()
self.running = True
def run(self):
while self.running:
if self.event_triggered_wtree.is_set():
self.event_triggered_wtree.clear()
self.restart_celery()
time.sleep(1)
def join(self, timeout=None):
self.running = False
self.stop_event_wtree.set()
self.celery.terminate()
self.wtree.join()
self.celery.wait()
super(Watcher, self).join(timeout=timeout)
def restart_celery(self):
self.celery.terminate()
self.celery.wait()
self.celery = subprocess.Popen(CELERY_CMD)
if __name__ == '__main__':
watcher = Watcher(sys.argv[1] if len(sys.argv) > 1 else ".")
watcher.start()
signal.signal(signal.SIGINT, lambda signal, frame: watcher.join())
signal.pause()
You should probably change CELERY_CMD
, or any other global variables.
Celery --autoreload
doesn’t work and it is deprecated.
Since you are using django, you can write a management command for that.
Django has autoreload utility which is used by runserver to restart WSGI server when code changes.
The same functionality can be used to reload celery workers. Create a seperate management command called celery. Write a function to kill existing worker and start a new worker. Now hook this function to autoreload as follows.
import shlex
import subprocess
from django.core.management.base import BaseCommand
from django.utils import autoreload
def restart_celery():
cmd = 'pkill celery'
subprocess.call(shlex.split(cmd))
cmd = 'celery worker -l info -A foo'
subprocess.call(shlex.split(cmd))
class Command(BaseCommand):
def handle(self, *args, **options):
print('Starting celery worker with autoreload...')
# For Django>=2.2
autoreload.run_with_reloader(restart_celery)
# For django<2.1
# autoreload.main(restart_celery)
Now you can run celery worker with python manage.py celery
which will autoreload when codebase changes.
This is only for development purposes and do not use it in production. Code taken from my other answer here.
This is the way I made it work in Django:
# worker_dev.py (put it next to manage.py)
from django.utils import autoreload
def run_celery():
from projectname import celery_app
celery_app.worker_main(["-Aprojectname", "-linfo", "-Psolo"])
print("Starting celery worker with autoreload...")
autoreload.run_with_reloader(run_celery)
Then run python worker_dev.py
. This has an advantage of working inside docker container.
You can use watchmedo
pip install watchdog
Start celery worker indirectly via watchmedo
watchmedo auto-restart --directory=./ --pattern=*.py --recursive -- celery worker --app=worker.app --concurrency=1 --loglevel=INFO
I used watchdog
watchdemo
utility, it works great but for some reason the PyCharm debugger was not able to debug the subprocess spawned by watchdemo
.
So if your project has werkzeug
as dependency, you can use the werkzeug._reloader.run_with_reloader
function to autoreload celery worker on code change. Plus it works with PyCharm debugger.
"""
Filename: celery_dev.py
"""
import sys
from werkzeug._reloader import run_with_reloader
# this is the celery app path in my application, change it according to your project
from web.app import celery_app
def run():
# create copy of "argv" and remove script name
argv = sys.argv.copy()
argv.pop(0)
# start the celery worker
celery_app.worker_main(argv)
if __name__ == '__main__':
run_with_reloader(run)
Sample PyCharm debug configuration.
NOTE:
This is a private werkzeug
API and is working as of Werkzeug==2.0.3
. It may stop working in future versions. Use at you own risk.
This is a huge adaptation from Suor’s code.
I made a custom Django command which can be called like this:
python manage.py runcelery
So, every time the code changes, celery’s main process is gracefully killed and then executed again.
Change the CELERY_COMMAND
variable as you wish.
# File: runcelery.py
import os
import signal
import subprocess
import time
import psutil
from django.core.management.base import BaseCommand
from django.utils import autoreload
DELAY_UNTIL_START = 5.0
CELERY_COMMAND = 'celery --config my_project.celeryconfig worker --loglevel=INFO'
class Command(BaseCommand):
help = ''
def kill_celery(self, parent_pid):
os.kill(parent_pid, signal.SIGTERM)
def run_celery(self):
time.sleep(DELAY_UNTIL_START)
subprocess.run(CELERY_COMMAND.split(' '))
def get_main_process(self):
for process in psutil.process_iter():
if process.ppid() == 0: # PID 0 has no parent
continue
parent = psutil.Process(process.ppid())
if process.name() == 'celery' and parent.name() == 'celery':
return parent
return
def reload_celery(self):
parent = self.get_main_process()
if parent is not None:
self.stdout.write('[*] Killing Celery process gracefully..')
self.kill_celery(parent.pid)
self.stdout.write('[*] Starting Celery...')
self.run_celery()
def handle(self, *args, **options):
autoreload.run_with_reloader(self.reload_celery)
There was an issue in @AlexTT answer, I don’t know if I should comment on his answer of put this as an answer.
You can use watchmedo
pip install watchdog
Start celery worker indirectly via watchmedo
watchmedo auto-restart --directory=./ --pattern=*.py --recursive -- celery -A <app> worker --concurrency=1 --loglevel=INFO
I could make celery reload itself automatically when there is changes on modules in CELERY_IMPORTS
in settings.py
.
I tried to give mother modules to detect changes even on child modules but it did not detect changes in child modules. That make me understand that detecting is not done recursively by celery. I searched it in the documentation but I did not meet any response for my problem.
It is really bothering me to add everything related celery part of my project to CELERY_IMPORTS
to detect changes.
Is there a way to tell celery that “auto reload yourself when there is any changes in anywhere of project”.
Thank You!
You can manually include additional modules with -I|--include
. Combine this with GNU tools like find
and awk
and you’ll be able to find all .py
files and include them.
$ celery -A app worker --autoreload --include=$(find . -name "*.py" -type f | awk '{sub("./",""); gsub("/", "."); sub(".py",""); print}' ORS=',' | sed 's/.$//')
Lets explain it:
find . -name "*.py" -type f
find
searches recursively for all files containing .py
. The output looks something like this:
./app.py
./some_package/foopy
./some_package/bar.py
Then:
awk '{sub("./",""); gsub("/", "."); sub(".py",""); print}' ORS=','
This line takes output of find
as input and removes all occurences of ./
. Then it replaces all /
with a .
. The last sub()
removes replaces .py
with an empty string. ORS
replaces all newlines with ,
. This outputs:
app,some_package.foo,some_package.bar,
The last command, sed
removes the last ,
.
So the command that is being executed looks like:
$ celery -A app worker --autoreload --include=app,some_package.foo,some_package.bar
If you have a virtualenv
inside your source you can exclude it by adding -path .path_to_your_env -prune -o
:
$ celery -A app worker --autoreload --include=$(find . -path .path_to_your_env -prune -o -name "*.py" -type f | awk '{sub("./",""); gsub("/", "."); sub(".py",""); print}' ORS=',' | sed 's/.$//')
OrangeTux’s solution didn’t work out for me, so I wrote a little Python script to achieve more or less the same. It monitors file changes using inotify, and triggers a celery restart if it detects a IN_MODIFY
, IN_ATTRIB
, or IN_DELETE
.
#!/usr/bin/env python
"""Runs a celery worker, and reloads on a file change. Run as ./run_celery [directory]. If
directory is not given, default to cwd."""
import os
import sys
import signal
import time
import multiprocessing
import subprocess
import threading
import inotify.adapters
CELERY_CMD = tuple("celery -A amcat.amcatcelery worker -l info -Q amcat".split())
CHANGE_EVENTS = ("IN_MODIFY", "IN_ATTRIB", "IN_DELETE")
WATCH_EXTENSIONS = (".py",)
def watch_tree(stop, path, event):
"""
@type stop: multiprocessing.Event
@type event: multiprocessing.Event
"""
path = os.path.abspath(path)
for e in inotify.adapters.InotifyTree(path).event_gen():
if stop.is_set():
break
if e is not None:
_, attrs, path, filename = e
if filename is None:
continue
if any(filename.endswith(ename) for ename in WATCH_EXTENSIONS):
continue
if any(ename in attrs for ename in CHANGE_EVENTS):
event.set()
class Watcher(threading.Thread):
def __init__(self, path):
super(Watcher, self).__init__()
self.celery = subprocess.Popen(CELERY_CMD)
self.stop_event_wtree = multiprocessing.Event()
self.event_triggered_wtree = multiprocessing.Event()
self.wtree = multiprocessing.Process(target=watch_tree, args=(self.stop_event_wtree, path, self.event_triggered_wtree))
self.wtree.start()
self.running = True
def run(self):
while self.running:
if self.event_triggered_wtree.is_set():
self.event_triggered_wtree.clear()
self.restart_celery()
time.sleep(1)
def join(self, timeout=None):
self.running = False
self.stop_event_wtree.set()
self.celery.terminate()
self.wtree.join()
self.celery.wait()
super(Watcher, self).join(timeout=timeout)
def restart_celery(self):
self.celery.terminate()
self.celery.wait()
self.celery = subprocess.Popen(CELERY_CMD)
if __name__ == '__main__':
watcher = Watcher(sys.argv[1] if len(sys.argv) > 1 else ".")
watcher.start()
signal.signal(signal.SIGINT, lambda signal, frame: watcher.join())
signal.pause()
You should probably change CELERY_CMD
, or any other global variables.
Celery --autoreload
doesn’t work and it is deprecated.
Since you are using django, you can write a management command for that.
Django has autoreload utility which is used by runserver to restart WSGI server when code changes.
The same functionality can be used to reload celery workers. Create a seperate management command called celery. Write a function to kill existing worker and start a new worker. Now hook this function to autoreload as follows.
import shlex
import subprocess
from django.core.management.base import BaseCommand
from django.utils import autoreload
def restart_celery():
cmd = 'pkill celery'
subprocess.call(shlex.split(cmd))
cmd = 'celery worker -l info -A foo'
subprocess.call(shlex.split(cmd))
class Command(BaseCommand):
def handle(self, *args, **options):
print('Starting celery worker with autoreload...')
# For Django>=2.2
autoreload.run_with_reloader(restart_celery)
# For django<2.1
# autoreload.main(restart_celery)
Now you can run celery worker with python manage.py celery
which will autoreload when codebase changes.
This is only for development purposes and do not use it in production. Code taken from my other answer here.
This is the way I made it work in Django:
# worker_dev.py (put it next to manage.py)
from django.utils import autoreload
def run_celery():
from projectname import celery_app
celery_app.worker_main(["-Aprojectname", "-linfo", "-Psolo"])
print("Starting celery worker with autoreload...")
autoreload.run_with_reloader(run_celery)
Then run python worker_dev.py
. This has an advantage of working inside docker container.
You can use watchmedo
pip install watchdog
Start celery worker indirectly via watchmedo
watchmedo auto-restart --directory=./ --pattern=*.py --recursive -- celery worker --app=worker.app --concurrency=1 --loglevel=INFO
I used watchdog
watchdemo
utility, it works great but for some reason the PyCharm debugger was not able to debug the subprocess spawned by watchdemo
.
So if your project has werkzeug
as dependency, you can use the werkzeug._reloader.run_with_reloader
function to autoreload celery worker on code change. Plus it works with PyCharm debugger.
"""
Filename: celery_dev.py
"""
import sys
from werkzeug._reloader import run_with_reloader
# this is the celery app path in my application, change it according to your project
from web.app import celery_app
def run():
# create copy of "argv" and remove script name
argv = sys.argv.copy()
argv.pop(0)
# start the celery worker
celery_app.worker_main(argv)
if __name__ == '__main__':
run_with_reloader(run)
Sample PyCharm debug configuration.
NOTE:
This is a private werkzeug
API and is working as of Werkzeug==2.0.3
. It may stop working in future versions. Use at you own risk.
This is a huge adaptation from Suor’s code.
I made a custom Django command which can be called like this:
python manage.py runcelery
So, every time the code changes, celery’s main process is gracefully killed and then executed again.
Change the CELERY_COMMAND
variable as you wish.
# File: runcelery.py
import os
import signal
import subprocess
import time
import psutil
from django.core.management.base import BaseCommand
from django.utils import autoreload
DELAY_UNTIL_START = 5.0
CELERY_COMMAND = 'celery --config my_project.celeryconfig worker --loglevel=INFO'
class Command(BaseCommand):
help = ''
def kill_celery(self, parent_pid):
os.kill(parent_pid, signal.SIGTERM)
def run_celery(self):
time.sleep(DELAY_UNTIL_START)
subprocess.run(CELERY_COMMAND.split(' '))
def get_main_process(self):
for process in psutil.process_iter():
if process.ppid() == 0: # PID 0 has no parent
continue
parent = psutil.Process(process.ppid())
if process.name() == 'celery' and parent.name() == 'celery':
return parent
return
def reload_celery(self):
parent = self.get_main_process()
if parent is not None:
self.stdout.write('[*] Killing Celery process gracefully..')
self.kill_celery(parent.pid)
self.stdout.write('[*] Starting Celery...')
self.run_celery()
def handle(self, *args, **options):
autoreload.run_with_reloader(self.reload_celery)
There was an issue in @AlexTT answer, I don’t know if I should comment on his answer of put this as an answer.
You can use watchmedo
pip install watchdog
Start celery worker indirectly via watchmedo
watchmedo auto-restart --directory=./ --pattern=*.py --recursive -- celery -A <app> worker --concurrency=1 --loglevel=INFO