django migrations – workflow with multiple dev branches

Question:

I’m curious how other django developers manage multiple code branches (in git for instance) with migrations.

My problem is as follows:
– we have multiple feature branches in git, some of them with django migrations (some of them altering fields, or removing them altogether)
– when I switch branches (with git checkout some_other_branch) the database does not always reflect the new code, so I run into “random” errors, where a db table column no longer exists, etc.

Right now, I simply drop the db and recreate it, but that means I have to recreate a bunch of dummy data before I can get back to work. I can use fixtures, but that requires keeping track of what data goes where, which is a bit of a hassle.

Is there a good/clean way of dealing with this use-case? I’m thinking a post-checkout git hook script could run the necessary migrations, but I don’t even know if migration rollbacks are at all possible.

Asked By: Laurent S

Answers:

Migration rollbacks are possible and usually handled automatically by Django.

Considering the following model:

from django.db import models

class MyModel(models.Model):
    pass

If you run python manage.py makemigrations myapp, it will generate the initial migration script.
You can then run python manage.py migrate myapp 0001 to apply this initial migration.

If after that you add a field to your model:

class MyModel(models.Model):
    # CharField generally needs max_length; 100 is an arbitrary example value
    my_field = models.CharField(max_length=100)

If you then generate a new migration and apply it, you can still go back to the initial state. Just run
python manage.py migrate myapp 0001 and Django will migrate backward, removing the new field.

It’s trickier when you deal with data migrations, because you have to write both the forward and the backward code.
Starting from an empty migration created via python manage.py makemigrations myapp --empty,
you’ll end up with something like:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import models, migrations

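# Placeholder rows for illustration; assumes MyModel has the my_field column shown above.
INITIAL_VALUES = ['first', 'second']

def forward(apps, schema_editor):
    # Load some data, using the historical model rather than a direct import.
    MyModel = apps.get_model('myapp', 'MyModel')
    for value in INITIAL_VALUES:
        MyModel.objects.create(my_field=value)

def backward(apps, schema_editor):
    # Delete the rows that forward() loaded.
    MyModel = apps.get_model('myapp', 'MyModel')
    MyModel.objects.filter(my_field__in=INITIAL_VALUES).delete()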

class Migration(migrations.Migration):

    dependencies = [
        ('myapp', '0003_auto_20150918_1153'),
    ]

    operations = [ 
        migrations.RunPython(forward, backward),
    ]
    

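For pure data-loading migrations, you usually don’t need real backward logic. Note, though, that if you omit the
reverse function entirely, Django treats the RunPython operation as irreversible and migrate cannot go back past it;
migrations.RunPython.noop can serve as a do-nothing reverse step.
But when you alter the schema and update existing rows
(like converting all values in a column to slugs), you’ll generally have to write the backward step.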

In our team, we try to avoid working on the same models at the same time to avoid collisions.
If that is not possible, and two migrations with the same number (e.g. 0002) are created,
you can still rename one of them to change the order in which they will be applied (also remember to update
the dependencies attribute on the migration class to match your new order).
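
To spot such collisions early, something like the following could be run against the file listing of an app’s migrations directory. This is only a minimal sketch: find_number_collisions is a hypothetical helper, and it looks solely at the four-digit filename prefix, not at the dependency graph Django actually uses.

```python
import re
from collections import defaultdict


def find_number_collisions(filenames):
    """Group migration files by numeric prefix and return the duplicated ones."""
    by_number = defaultdict(list)
    for name in filenames:
        # Matches names such as 0002_add_field.py; skips __init__.py and others.
        match = re.match(r"(\d{4})_\w+\.py$", name)
        if match:
            by_number[match.group(1)].append(name)
    return {
        number: sorted(names)
        for number, names in by_number.items()
        if len(names) > 1
    }
```

Django itself reports this situation as conflicting migrations (multiple leaf nodes in the migration graph), and python manage.py makemigrations --merge can generate a merge migration as an alternative to renaming by hand.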

If you end up working on the same model fields at the same time in different features,
you’ll still be in trouble, but it may mean these features are related and should be handled
together in a single branch.

For the git-hooks part, it’s probably possible to write something. Assuming you are on branch mybranch
and want to check out another feature branch myfeature:

  1. Just before switching, you dump the list of currently applied migrations into
    a temporary file mybranch_database_state.txt
  2. Then, you apply the myfeature branch migrations, if any
  3. Then, when checking out mybranch again, you restore your previous database state
    by reading the dump file.
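
The dump-and-restore part of these steps could be sketched roughly as below. Everything here is an assumption made for illustration: the helper names are hypothetical, the state file is written as JSON, and the parser assumes that python manage.py showmigrations --plan prints lines of the form "[X]  myapp.0001_initial". The sketch only builds the migrate commands; it does not run them.

```python
import json
import re
from pathlib import Path

# Assumed line format of `python manage.py showmigrations --plan`:
#   [X]  myapp.0001_initial      (applied)
#   [ ]  myapp.0002_add_field    (not applied)
PLAN_LINE = re.compile(r"^\[(?P<mark>[X ])\]\s+(?P<app>\w+)\.(?P<name>\S+)")


def last_applied(plan_output):
    """Return {app_label: name of the last applied migration in plan order}."""
    state = {}
    for line in plan_output.splitlines():
        match = PLAN_LINE.match(line.strip())
        if match and match.group("mark") == "X":
            state[match.group("app")] = match.group("name")
    return state


def state_file(branch, directory="."):
    # Branch names may contain slashes, which are not valid in a file name.
    return Path(directory) / f"{branch.replace('/', '_')}_database_state.txt"


def save_state(branch, plan_output, directory="."):
    """Step 1: dump the applied state before leaving `branch`."""
    path = state_file(branch, directory)
    path.write_text(json.dumps(last_applied(plan_output), indent=2))
    return path


def restore_commands(branch, directory="."):
    """Step 3: build the migrate commands that restore the saved state."""
    state = json.loads(state_file(branch, directory).read_text())
    return [
        ["python", "manage.py", "migrate", app, name]
        for app, name in sorted(state.items())
    ]
```

One gap worth noting: an app that had no applied migrations when the state was saved does not appear in the file, so rolling it back would need an explicit python manage.py migrate app zero.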

However, it seems a bit hackish to me, and it would probably be really difficult to handle all scenarios properly:
rebasing, merging, cherry-picking, etc.

Handling migration conflicts when they occur seems easier to me.

Answered By: Agate

I don’t have a good solution to this, but I feel the pain.

A post-checkout hook will be too late. If you are on branch A and you check out branch B, and B has fewer migrations than A, the rollback information is only in A and needs to be run before checkout.

I hit this problem when jumping between several commits trying to locate the origin of a bug. Our database (even in development trim) is huge, so dropping and recreating isn’t practical.

I’m imagining a wrapper for git-checkout that:

  1. Notes the newest migration for each of your INSTALLED_APPS
  2. Looks in the requested branch and notes the newest migrations there
  3. For each app where the migrations in #1 are farther ahead than in #2, migrate back to the highest migration in #2
  4. Check out the new branch
  5. For each app where migrations in #2 were ahead of #1, migrate forward

A simple matter of programming!
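
The decision logic of steps 3 and 5 might look something like this. It is a sketch under stated assumptions, not a finished wrapper: plan_switch is a hypothetical function, it expects the per-app migration names of both branches to be collected elsewhere (for example from the working tree and from git), and it assumes each app’s migrations form a simple linear sequence. It also compares the full lists rather than only the newest migration, since two branches can each have their own 0002.

```python
def plan_switch(source, target):
    """Work out what to migrate before and after a checkout.

    source, target: {app_label: [migration names, oldest first]}
    Returns (rollbacks, forwards), two lists of (app_label, migration) pairs
    meant to be passed to `manage.py migrate` before and after the checkout.
    """
    rollbacks, forwards = [], []
    for app in sorted(set(source) | set(target)):
        src = source.get(app, [])
        dst = target.get(app, [])

        # Count how many leading migrations both branches share.
        common = 0
        for src_name, dst_name in zip(src, dst):
            if src_name != dst_name:
                break
            common += 1

        if len(src) > common:
            # The source branch has migrations the target lacks: go back to
            # the last shared one, or to "zero" if nothing is shared.
            rollbacks.append((app, src[common - 1] if common else "zero"))
        if len(dst) > common:
            # The target branch has migrations of its own: go to its newest.
            forwards.append((app, dst[-1]))
    return rollbacks, forwards
```

The rollbacks have to run while the source branch’s migration files are still on disk, which is why they come before the checkout.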

Answered By: Paul Bissex

For simple changes I rely on migration rollback, as discussed by Agate.

However, if I know a feature branch is going to involve highly invasive database changes, or if it will involve a lot of data migration, I like to create a clone of the local (or remote dev) database as soon as I start the new branch. This may not always be convenient, but especially for local development using sqlite it is just a matter of copying a file (which is not under source control).

The first commit on the new branch then updates my Django settings (local/dev) to use the cloned database. This way, when I switch branches, the correct database is selected automatically. No need to worry about rolling back schema changes, missing data, etc. No complicated stuff.

After the feature branch has been fully merged, the cloned database can be removed.
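
A variation on this idea, rather than the exact setup described above, is to derive the sqlite file name from the current git branch in a local settings module, so no settings commit is needed per branch. The sketch below assumes git is on the PATH and that a db_<branch>.sqlite3 naming scheme is acceptable; current_branch and sqlite_path_for are hypothetical helper names.

```python
import re
import subprocess
from pathlib import Path


def current_branch(default="default"):
    """Return the checked-out branch name, or `default` if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return default
    return result.stdout.strip() or default


def sqlite_path_for(branch, base_dir="."):
    """Map a branch name to a database file, replacing unsafe characters."""
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", branch)
    return str(Path(base_dir) / f"db_{safe}.sqlite3")


# Local/dev settings only: one sqlite file per branch.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": sqlite_path_for(current_branch()),
    }
}
```

The clone still has to be created by copying the original database file to the new name when the branch is started; with a detached HEAD, git reports the name HEAD, so all such checkouts would share one file.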

Answered By: djvg

So far I have found two GitHub projects (django-south-compass and django_nomad) that try to solve the issue of migrating between dev branches, and there are a couple of answers on Stack Overflow.

Citing an article on Medium, most of the solutions boil down to one of the following concepts:

  1. Dropping all the tables and reapplying migrations in the target branch from scratch. When the tables are created from scratch, all the data will be lost and needs to be recreated as well. This can be handled with fixtures and data migrations but managing them, in turn, will become a nightmare, not to mention that it will take some time (…)
  2. Having a separate database for each branch and changing the settings file to the target branch’s settings every time the branch is switched, using tools like sed. This can be done with a post-checkout hook. Maintaining one large database for each branch would be very storage-intensive. Also, checking out individual commit IDs might potentially produce the same errors.
  3. Finding the differences in migrations between the source and target branch, and applying the differences. We can do so with a post-checkout script but there is a small issue. This post explains the issue in detail. To summarize the issue, post-checkout is run after all the files in the target branch are checked out, which includes migration files. If the target branch doesn’t contain all the migrations in the source branch when we run python manage.py migrate app1, Django won’t be able to find the missing migrations which are needed to apply reverse migrations. We have to temporarily check out the migration files from the source branch, run python manage.py migrate and check out the migration files from the target branch. django-south-compass does something very similar but is available only for up to Python 2.6.
  4. Using a management command (which uses the Python git module), find all the migration operation differences between the source branch and the merge-base of the source branch and target branch and notify the user of these changes. If these changes don’t interfere with the reason for the branch change, the user can go ahead and change the branch. Otherwise, using another management command, un-apply all migrations down to the merge base, switch branch, and apply the migrations in the target branch. There will be a small data loss which, if the two branches haven’t diverged a lot, is manageable. django_nomad does some of this work.
  5. Keep track of applied and unapplied migrations in files and use this data to populate the tables when switching branches.

Answered By: Lukasz Czerwinski