Set TTL for Supervisor jobs
Question:
I’ve been using Supervisor for over a year now to run 40 jobs on a project.
Today, without any changes to the code or the server, two jobs got stuck, which caused problems for the services I provide to my customers.
These jobs are very light, written in Python, and they usually process the workload in under 2 minutes.
However, they were stuck for hours.
Inside the code, I can’t see anything that could’ve caused this.
Since I know 5 minutes would be more than enough to run the job, is there a way for me to set a TTL for these jobs?
Answers:
I wasn’t able to find a way to configure this in Supervisor, but I was able to work around it by wrapping all my code in a time_limit context manager, as suggested here.
Copying the code here:
import signal
from contextlib import contextmanager

class TimeoutException(Exception):
    pass

@contextmanager
def time_limit(seconds):
    # SIGALRM is Unix-only and must be set up from the main thread.
    def signal_handler(signum, frame):
        raise TimeoutException("Timed out!")
    signal.signal(signal.SIGALRM, signal_handler)
    signal.alarm(seconds)  # deliver SIGALRM after `seconds` seconds
    try:
        yield
    finally:
        signal.alarm(0)  # cancel the pending alarm

try:
    with time_limit(10):
        long_function_call()
except TimeoutException as e:
    print("Timed out!")
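For the 5-minute limit mentioned in the question, the same idea can be wrapped around the job's entry point so that a stuck run ends with a non-zero exit code. This is only a sketch under assumptions: run_job and its limit_seconds parameter are hypothetical names, not part of Supervisor or the original snippet, and restoring the previous SIGALRM handler is an addition of mine. Like the code above, it works only on Unix and only in the main thread.

```python
import signal
import sys
import time
from contextlib import contextmanager


class TimeoutException(Exception):
    pass


@contextmanager
def time_limit(seconds):
    def signal_handler(signum, frame):
        raise TimeoutException("Timed out!")

    # Remember the old handler so it can be restored afterwards.
    previous = signal.signal(signal.SIGALRM, signal_handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel the pending alarm
        signal.signal(signal.SIGALRM, previous)


def run_job(job, limit_seconds=300):
    """Hypothetical helper: run job() under a time limit.

    Returns 0 if the job finished in time, 1 if it was cut off.
    """
    try:
        with time_limit(limit_seconds):
            job()
    except TimeoutException:
        print("Job exceeded %d seconds" % limit_seconds, file=sys.stderr)
        return 1
    return 0
```

A job script could then end with sys.exit(run_job(main)), where main is whatever function does the work. Whether Supervisor starts the job again after such an exit depends on how that program's restart settings are configured.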