Celery is rerunning long running completed tasks over and over

Question:

I have a Python Celery-Redis queue processing uploads and downloads worth gigabytes of data at a time.

A few of the uploads take up to a few hours. However, once such a task finishes, I’m seeing bizarre Celery behaviour: the scheduler reruns the just-concluded task by sending it to the worker again (I’m running a single worker). This has now happened twice on the same task!

Can someone help me understand why this is happening and how I can prevent it?

The tasks are definitely finishing cleanly with no errors reported; they are just extremely long-running tasks.

Asked By: user2252999


Answers:

I recently ran into this issue, and eventually figured out that tasks were
running multiple times because of a combination of
task prefetching and tasks exceeding the
visibility timeout. Tasks are acknowledged right before they’re executed (unless you set CELERY_ACKS_LATE=True),
and by default 4 tasks are prefetched per process. The first task will be
acknowledged before execution, but if it takes over an hour to execute
(the default visibility timeout for the Redis transport), then the
other prefetched tasks, which are still unacknowledged, will be redelivered to another worker where they will
be executed an additional time (or in your case,
executed an additional time by the same worker).

You can solve this by increasing the visibility timeout to something longer than the longest possible runtime of your tasks:

BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 3600*10}  # 10 hours
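If you would rather not hard-code the number, one option is to derive the timeout from your longest expected task plus a safety margin. This is only a sketch: LONGEST_TASK_SECONDS, the 1.5 margin, and the helper visibility_timeout_for are all made-up names and values for illustration, not part of Celery.

```python
# Hypothetical figure: the longest upload/download you expect, in seconds.
LONGEST_TASK_SECONDS = 6 * 60 * 60  # assumed to be 6 hours


def visibility_timeout_for(longest_task_seconds, margin=1.5):
    """Return a timeout (in seconds) comfortably above the longest runtime.

    The margin is an arbitrary safety factor chosen for this example.
    """
    return int(longest_task_seconds * margin)


# Same setting as above, but computed instead of hard-coded.
BROKER_TRANSPORT_OPTIONS = {
    "visibility_timeout": visibility_timeout_for(LONGEST_TASK_SECONDS),
}
```

The point is simply that the timeout should track your real worst case; if your tasks get longer later, the setting has to grow with them.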

You could also set CELERYD_PREFETCH_MULTIPLIER=1 to disable prefetching so that long-running tasks don’t keep
other tasks from being acknowledged.
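For what it’s worth, Celery 4 and later use lowercase setting names, so the two changes together would look roughly like the sketch below. The dict name celery_settings is just a placeholder I picked, and the commented-out lines assume you already have a Celery app object; adjust to however your project loads its config.

```python
# Sketch of both settings using the lowercase names from Celery 4+.
# `celery_settings` is a placeholder name, not something Celery looks for.
celery_settings = {
    # Reserve only one task per worker process at a time, so nothing
    # sits unacknowledged behind a multi-hour upload.
    "worker_prefetch_multiplier": 1,
    # Keep unacknowledged messages invisible for 10 hours instead of the
    # 1 hour default of the Redis transport.
    "broker_transport_options": {"visibility_timeout": 3600 * 10},
}

# Assuming an existing app, the settings could be applied like this:
# from celery import Celery
# app = Celery("tasks", broker="redis://localhost:6379/0")  # assumed broker URL
# app.conf.update(celery_settings)
```

Don’t mix the old uppercase names and the new lowercase names in the same configuration; pick one style.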

Answered By: Jason V.