What is the use case for Django's on_commit?

Question:

I’m reading this documentation: https://docs.djangoproject.com/en/4.0/topics/db/transactions/#django.db.transaction.on_commit

It gives this as the use case for on_commit:

with transaction.atomic():  # Outer atomic, start a new transaction
    transaction.on_commit(foo)
    # Do things...

    with transaction.atomic():  # Inner atomic block, create a savepoint
        transaction.on_commit(bar)
        # Do more things...

# foo() and then bar() will be called when leaving the outermost block

But why not just write the code normally, without on_commit hooks? Like this:

with transaction.atomic():  # Outer atomic, start a new transaction
    # Do things...

    with transaction.atomic():  # Inner atomic block, create a savepoint
        # Do more things...

foo()
bar()

# foo() and then bar() will be called when leaving the outermost block

It’s easier to read, since it doesn’t require extra knowledge of the Django APIs and the statements appear in the order in which they are executed. It’s also easier to test, since you don’t need any special Django test classes.

So what is the use-case for the on_commit hook?

Asked By: softarn


Answers:

Django documentation:

Django provides the on_commit() function to register callback functions that should be executed after a transaction is successfully committed.

That is its main purpose. A transaction is a unit of work that you want to treat atomically: it either happens completely or not at all. The same applies to your code: if something goes wrong during the DB operations, some of the things you planned to do should not happen either.
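The docs example quoted in the question can be mimicked with a toy, pure-Python model of the bookkeeping behind on_commit. This is not Django’s implementation: `ToyConnection`, its `atomic()` context manager and its `on_commit()` method are made up for illustration. It follows the behaviour the Django docs describe: callbacks registered anywhere inside the outermost atomic block run in registration order once that block commits, callbacks registered inside a rolled-back savepoint are dropped, a rollback of the outer block drops them all, and a callback registered outside any transaction runs immediately.

```python
from contextlib import contextmanager


class ToyConnection:
    """Toy model of on_commit bookkeeping; NOT Django's real implementation."""

    def __init__(self):
        self.depth = 0      # number of atomic blocks currently open
        self.pending = []   # callbacks waiting for the outermost commit

    @contextmanager
    def atomic(self):
        self.depth += 1
        mark = len(self.pending)  # hooks registered before this block opened
        try:
            yield
        except Exception:
            # Rolling back this block (or savepoint) drops only its own hooks.
            del self.pending[mark:]
            raise
        finally:
            self.depth -= 1
        if self.depth == 0:
            # Leaving the outermost block successfully stands in for COMMIT.
            hooks, self.pending = self.pending, []
            for callback in hooks:
                callback()

    def on_commit(self, callback):
        if self.depth == 0:
            callback()  # no open transaction: run now, as Django does
        else:
            self.pending.append(callback)


calls = []
conn = ToyConnection()

with conn.atomic():                        # outer block: BEGIN
    conn.on_commit(lambda: calls.append("foo"))
    with conn.atomic():                    # inner block: SAVEPOINT
        conn.on_commit(lambda: calls.append("bar"))
    assert calls == []                     # nothing has run yet
print(calls)                               # ['foo', 'bar']

calls.clear()
with conn.atomic():
    conn.on_commit(lambda: calls.append("foo"))
    try:
        with conn.atomic():
            conn.on_commit(lambda: calls.append("bar"))
            raise ValueError("inner block fails")  # savepoint rolled back
    except ValueError:
        pass
print(calls)                               # ['foo']
```

The point of the model is that registration and execution are decoupled: you can register a callback at the exact spot where you decide it is needed, and the transaction machinery decides later whether it runs at all.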

Let’s consider a business-logic flow:

  1. A user sends their registration data to our endpoint, and we validate it, etc.
  2. We save the new user to our DB.
  3. We send them a "hello" email with a link to confirm their account.

If something goes wrong during step 2, we shouldn’t go to step 3.

You might think: well, I’ll get an exception and that code won’t run anyway, so why do we still need it?

Sometimes your code takes actions before the potentially dangerous DB operations, based on the assumption that the transaction will succeed. For example, you want to check first whether you can send an email to your user, because you know your third-party email provider often returns a 500. In that case, you want to respond to the user with a 500 and ask them to register later (a very bad idea, btw, but it’s just a synthetic example).

When your function (e.g. one decorated with @atomic) contains a lot of DB operations, you surely don’t want to keep track of every variable’s state just so you can use it after all the DB-related code has run. For example:

  • Validating the user’s order.
  • Checking in the DB whether it can be completed.
  • If it can, sending a request with the order’s details to a third-party CRM.
  • If it can’t, creating a support ticket in another third-party service.
  • Saving the user’s order to the DB and updating the user’s model.
  • Sending a messenger notification to the employee responsible for the order.
  • Saving to the DB the fact that the employee notification was sent successfully.

You can imagine what a mess we would have in this situation without on_commit, with one really big try/except around everything.
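A rough, framework-free sketch of how deferred callbacks keep such a flow tidy, assuming SQLite through Python’s standard-library sqlite3 module in place of Django’s ORM. The local `after_commit` list plays the role of on_commit: each side effect is registered right where the decision is made, while the relevant values are in scope, and the list only runs if the `with conn:` block commits. All names here (`process_order`, `send_to_crm`, `open_support_ticket`, `notify_employee`, the `orders` table) are hypothetical, and unlike Django’s on_commit this hand-rolled list does not handle nested transactions.

```python
import sqlite3


# Hypothetical third-party calls; here they just record what would have been sent.
def send_to_crm(log, order_id):
    log.append(("crm", order_id))


def open_support_ticket(log, order_id):
    log.append(("ticket", order_id))


def notify_employee(log, order_id):
    log.append(("notify", order_id))


def process_order(conn, order_id, can_complete, log, fail_midway=False):
    after_commit = []  # plays the role of transaction.on_commit registrations

    with conn:  # sqlite3: COMMIT on success, ROLLBACK if an exception escapes
        # Decide right here, while the relevant values are in scope,
        # but defer the actual call until after the commit.
        if can_complete:
            after_commit.append(lambda: send_to_crm(log, order_id))
        else:
            after_commit.append(lambda: open_support_ticket(log, order_id))
        conn.execute("INSERT INTO orders (id, completed) VALUES (?, ?)",
                     (order_id, can_complete))
        after_commit.append(lambda: notify_employee(log, order_id))
        if fail_midway:
            raise RuntimeError("simulated failure")  # forces a ROLLBACK

    # Only reached if the block committed; on rollback nothing below runs.
    for callback in after_commit:
        callback()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, completed INTEGER)")
log = []

process_order(conn, 1, can_complete=True, log=log)
try:
    process_order(conn, 2, can_complete=False, log=log, fail_midway=True)
except RuntimeError:
    pass

print(log)                                               # [('crm', 1), ('notify', 1)]
print(conn.execute("SELECT id FROM orders").fetchall())  # [(1,)]
```

The failed order leaves neither a row nor a stray CRM call or ticket behind, and no try/except had to track which side effects were "already decided".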

Answered By: Yevgeniy Kosmak

The example code given in the Django docs is transaction.on_commit(lambda: some_celery_task.delay('arg1')), and that’s probably because this comes up a lot with Celery tasks.

Imagine if you do the following within a transaction:

my_object = MyObject.objects.create()
some_celery_task.delay(my_object.pk)

Then in your celery task you try doing this:

@app.task
def some_celery_task(object_pk):
    my_object = MyObject.objects.get(pk=object_pk)

This may work most of the time, but you’ll randomly get errors where the task can’t find the object. It’s a race condition: whether it fails depends on how quickly the worker picks up the task. This happens because you created the MyObject record within a transaction, and it isn’t visible to other database connections until a COMMIT is run. Celery has no access to that open transaction, so the task needs to run after the COMMIT. There’s also the very real possibility that something later on causes a ROLLBACK, in which case the Celery task should never be called at all.
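The race is easy to reproduce without Celery. In this sketch, assuming SQLite through Python’s standard-library sqlite3 module, one connection plays the web request that creates the row inside a transaction, and a second, separate connection plays the Celery worker. The table name and the helper functions are made up for the example; other transactional databases such as PostgreSQL behave the same way at their default isolation level.

```python
import os
import sqlite3
import tempfile


def count_visible_to_worker(worker_conn):
    # A separate connection only sees rows that have been committed.
    return worker_conn.execute("SELECT COUNT(*) FROM myobject").fetchall()[0][0]


def race_demo(db_path):
    web = sqlite3.connect(db_path)                           # the request's connection
    worker = sqlite3.connect(db_path, isolation_level=None)  # the "Celery worker"
    try:
        web.execute("CREATE TABLE myobject (id INTEGER PRIMARY KEY)")
        web.commit()

        web.execute("INSERT INTO myobject DEFAULT VALUES")  # implicitly opens a transaction
        before = count_visible_to_worker(worker)            # the task runs "too early"

        web.commit()                                        # on_commit callbacks fire after this
        after = count_visible_to_worker(worker)
        return before, after
    finally:
        web.close()
        worker.close()


with tempfile.TemporaryDirectory() as tmp:
    print(race_demo(os.path.join(tmp, "demo.db")))  # (0, 1)
```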

So you need to do this instead:

my_object = MyObject.objects.create()
transaction.on_commit(lambda: some_celery_task.delay(my_object.pk))

Now, the celery task won’t be called until the MyObject has actually been saved to the database after the COMMIT was called.

I should note, though, that this is mainly a concern when you aren’t in autocommit mode (autocommit is actually Django’s default). In autocommit mode you can be certain that the commit has finished as part of a .create() or .save(). However, if your code could ever be called within a @transaction.atomic() block, it’s no longer in autocommit mode and you’re back to needing .on_commit(). Since on_commit() simply runs the callback immediately when no transaction is active, it’s best/safest to always use it.
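Django implements a nested atomic() block with a SAVEPOINT (as the docs example in the question notes), so the nesting problem can be reproduced with plain SQLite via the standard-library sqlite3 module. `create_then_notify` and `nesting_demo` are hypothetical names; the helper follows the question’s suggestion of calling the side effect right after its own atomic block. Called on its own, its RELEASE is the outermost level and acts as a COMMIT, so the second connection (standing in for a Celery worker) sees the row. Called inside a caller’s transaction, RELEASE commits nothing, so the side effect fires while the row is still invisible. on_commit() sidesteps this by waiting for the outermost COMMIT.

```python
import os
import sqlite3
import tempfile


def create_then_notify(conn, obj_id, notify):
    """Hypothetical helper written the way the question proposes."""
    conn.execute("SAVEPOINT create_obj")   # like an inner atomic() block
    conn.execute("INSERT INTO myobject (id) VALUES (?)", (obj_id,))
    conn.execute("RELEASE create_obj")     # commits ONLY if no outer transaction
    notify(obj_id)                         # side effect runs immediately


def nesting_demo(db_path):
    # isolation_level=None: no implicit BEGINs, we issue them ourselves.
    web = sqlite3.connect(db_path, isolation_level=None)
    worker = sqlite3.connect(db_path, isolation_level=None)
    seen = []

    def notify(obj_id):
        # What a worker on another connection would find at this moment.
        rows = worker.execute("SELECT 1 FROM myobject WHERE id = ?", (obj_id,)).fetchall()
        seen.append(bool(rows))

    try:
        web.execute("CREATE TABLE myobject (id INTEGER PRIMARY KEY)")
        create_then_notify(web, 1, notify)  # standalone: RELEASE acts as COMMIT
        web.execute("BEGIN")                # a caller's outer atomic() block
        create_then_notify(web, 2, notify)  # RELEASE commits nothing here
        web.execute("COMMIT")
        notify(2)                           # same check after the real COMMIT
        return seen
    finally:
        web.close()
        worker.close()


with tempfile.TemporaryDirectory() as tmp:
    print(nesting_demo(os.path.join(tmp, "demo.db")))  # [True, False, True]
```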

Answered By: Tim Tisdall