Django Selective Dumpdata

Question:

Is it possible to selectively filter which records Django’s dumpdata management command outputs? I have a few models, each with millions of rows, and I only want to dump the records in one model that match specific criteria, along with all foreign-key-linked records referencing any of those records.

Consider this use-case. Say I had a production database where my User model has millions of records. I have several other models (Log, Transaction, Purchase, Bookmarks, etc) all referencing the User model. I want to do development on my Django app, and I want to test using realistic data. However, my production database is so enormous, I can’t realistically take a snapshot of the entire thing and load it locally. So ideally, I’d want to use dumpdata to dump 50 random User records, and all related records to JSON, and use that to populate a development database.

Is there an easy way to accomplish this?

Asked By: Cerin

Answers:

I think django-fixture-magic might be worth a look.

You’ll find some additional background info in Scrubbing your Django database.

Answered By: arie

This isn’t a simple answer to my question, but I found some interesting docs on Django’s built-in natural keys feature, which would allow representing serialized records without the primary key. Unfortunately, it doesn’t look like this is fully integrated into dumpdata, and there’s an old outstanding ticket to fully rely on natural keys.

It also seems the serializers.serialize() function allows serialization of an arbitrary list of specific model instances.

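Presumably, if I implemented a natural_key() method on all my models, and then called serializers.serialize("json", Users.objects.filter(criteria), use_natural_foreign_keys=True, use_natural_primary_keys=True), it should come close to accomplishing what I want. Note that serialize() takes the format name as its first argument and an iterable of model instances (such as a queryset) as its second. I might have to write a function to crawl all the FK references, and include those in the list of objects passed to serialize().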
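
The FK crawl mentioned above could look roughly like the sketch below. It is only one way to do it, under stated assumptions: the helper names rows_pointing_at and collect are made up for this example, and only reverse ForeignKey/OneToOneField relations are followed (rows that point at the starting records), using the documented Model._meta.get_fields() API.

```python
from collections import deque


def rows_pointing_at(obj):
    """Yield rows whose ForeignKey or OneToOneField references obj.

    obj is expected to be a Django model instance. Reverse relations are
    the auto-created, non-concrete entries returned by _meta.get_fields().
    """
    for rel in obj._meta.get_fields():
        if (rel.one_to_many or rel.one_to_one) and rel.auto_created and not rel.concrete:
            # rel.field is the forward ForeignKey declared on the other model.
            yield from rel.related_model._base_manager.filter(**{rel.field.name: obj})


def collect(roots, neighbors=rows_pointing_at):
    """Return roots plus everything reachable through neighbors, without duplicates."""
    seen, ordered = set(), []
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        key = (type(obj), obj.pk)
        if key in seen:
            continue  # already collected via another path
        seen.add(key)
        ordered.append(obj)
        queue.extend(neighbors(obj))
    return ordered
```

In a Django shell this might be used as serializers.serialize("json", collect(User.objects.order_by("?")[:50]), indent=2). Two caveats: order_by("?") can be slow on very large tables, and forward foreign keys from the collected rows to records outside the sample are not followed, so loading the fixture may still hit missing references.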

Answered By: Cerin

This snippet might be helpful for you (it follows relationships and serializes them):

http://djangosnippets.org/snippets/918/

You could also use that management command and override the default managers of whichever models you want, so that they return custom querysets.

Answered By: Phil Avery

This is a very old question, but I recently wrote a custom management command to do just that. It looks very similar to the existing dumpdata command except that it takes some extra arguments to define how I want to filter the querysets and it overrides the get_objects function to perform the actual filtering:

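from itertools import chain

def get_objects(dump_attributes, dump_values):
    # `options` is the dict of parsed command-line options that handle() receives;
    # this assumes get_objects() is nested inside handle(), as in Django's own dumpdata.
    qs_1 = ModelClass1.objects.filter(**options["filter_options_for_model_class_1"])
    qs_2 = ModelClass2.objects.filter(**options["filter_options_for_model_class_2"])
    # ...repeat for as many different model classes as you want to dump...
    yield from chain(qs_1, qs_2)  # pass every queryset built above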

Answered By: trubliphone

I had the same problem, but I didn’t want to add another package, the snippet still didn’t let me filter my data, and I only wanted a temporary solution.

So I thought to myself: why not override the default manager, apply my filter there, take the dump, and then revert my code? This is of course hacky and dangerous, but in my case it made sense.

Yes, I had to edit code with vim on the live server, but you don’t need to reload the server: a command run through manage.py starts a new process that loads the code currently on disk, so from the end user’s perspective the server remained untouched.

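from django.db import models
from django.db.models import Manager

class DahlBookManager(Manager):
    def get_queryset(self):
        # Temporary filter: dumpdata (without --all) reads rows through the default manager.
        return super().get_queryset().filter(is_edited=False)

class FriendshipQuestion(models.Model):
    objects = DahlBookManager()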

After that, running the dumpdata command did exactly what I needed, which in my case was returning all the unedited questions.

Then I ran git checkout mymodelfile.py to revert the file to the original.

This is by no means a good solution, but it will get somebody either fired or unstuck.

Answered By: Amir Heshmati

You can use dumpdata to dump a specific app and/or model (the docs linked at the end of this answer are for Django 3.2). For example, for an app named customer:

python manage.py dumpdata customer

or, to dump a model named shoppingcart within the customer app:

python manage.py dumpdata customer.shoppingcart

There are many options with dumpdata, including writing to several output file formats and handling custom managers on models. For example:

python manage.py dumpdata customer --all --indent 4 --output my_fixtures.json

The options:

  • --all: uses Django’s base manager, so records are dumped even if a custom default manager on the model would filter them out
  • --indent: number of spaces to indent the output by
  • --output: write the output to a file instead of stdout. The default format is JSON.
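
One more dumpdata option is relevant to the original question: --pks restricts the dump to a comma-separated list of primary keys, and it only works when a single model is being dumped. A small sketch of building such an invocation from Python follows; the helper name dumpdata_args is made up for this example.

```python
def dumpdata_args(model_label, pks, output, indent=2):
    """Build the argument list for `manage.py dumpdata` limited to the given primary keys."""
    if not pks:
        raise ValueError("need at least one primary key")
    return [
        "dumpdata",
        model_label,
        "--pks", ",".join(str(pk) for pk in pks),  # comma-separated, no spaces
        "--indent", str(indent),
        "--output", output,
    ]
```

The list can be handed to django.core.management.call_command(*args) inside a project, or appended to [sys.executable, "manage.py"] and run with subprocess.run(). The primary keys could come from something like User.objects.order_by("?").values_list("pk", flat=True)[:50]. Note that --pks does not pull in related records from other models.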

See the docs at:
https://docs.djangoproject.com/en/3.2/ref/django-admin/#dumpdata

Answered By: Jesuisme