With Python, can I keep a persistent dictionary and modify it?

Question:

So, I want to store a dictionary in a persistent file. Is there a way to use regular dictionary methods to add, print, or delete entries from the dictionary in that file?

It seems that I would be able to use cPickle to store the dictionary and load it, but I’m not sure where to take it from there.

Asked By: snk


Answers:

Unpickle from file when program loads, modify as a normal dictionary in memory while program is running, pickle to file when program exits? Not sure exactly what more you’re asking for here.
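A minimal sketch of that pattern, using the standard-library pickle module (the filename is arbitrary):

```python
import os
import pickle

DB_FILE = "state.pickle"  # arbitrary filename for this sketch

# Load the dictionary at startup, or start empty if the file doesn't exist yet
if os.path.exists(DB_FILE):
    with open(DB_FILE, "rb") as fh:
        data = pickle.load(fh)
else:
    data = {}

# Modify it like any normal in-memory dict
data["visits"] = data.get("visits", 0) + 1

# Write it back before exiting
with open(DB_FILE, "wb") as fh:
    pickle.dump(data, fh)
```

In a real program you would typically do the final dump in a `finally` block or an `atexit` handler so a crash mid-run doesn't skip the save.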

Answered By: Amber

If your keys (not necessarily the values) are strings, the shelve standard library module does what you want pretty seamlessly.
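For example (the filename "mydata" is arbitrary; shelve adds its own extension on some platforms):

```python
import shelve

# shelve gives a dict-like object whose string-keyed entries persist to disk
with shelve.open("mydata") as db:
    db["alice"] = {"age": 30, "tags": [1, 2, 3]}  # values can be any picklable object
    db["bob"] = 42

# Reopen later; the entries are still there
with shelve.open("mydata") as db:
    print(db["bob"])  # 42
    del db["alice"]   # regular dict-style deletion works too
```

One caveat: mutating a nested value in place (e.g. `db["alice"]["age"] = 31`) is not written back unless you open the shelf with `writeback=True` or reassign the whole value.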

Answered By: Alex Martelli

Assuming the keys and values have working implementations of repr, one solution is to save the string representation of the dictionary (repr(dict)) to a file. You can load it back using the eval function (eval(inputstring)). There are two main disadvantages to this technique:

1) It will not work with types that have an unusable implementation of repr (or it may even seem to work, but fail silently). You’ll need to pay at least some attention to what is going on.

2) Your file-load mechanism is basically straight-out executing Python code. Not great for security unless you fully control the input.

It has one advantage: it is absurdly easy to do.
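A sketch of the round trip (the filename is arbitrary):

```python
# Persist via repr(); restore via eval(). Only do this with files you fully trust,
# since eval() executes the file's contents as Python code.
data = {"plumbers": 3, "programmers": 81}

with open("data.repr", "w") as fh:   # arbitrary filename
    fh.write(repr(data))

with open("data.repr") as fh:
    restored = eval(fh.read())

assert restored == data
```

For literals like this, `ast.literal_eval` is a drop-in replacement for `eval` that avoids the code-execution risk.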

Answered By: Brian

Pickling has one disadvantage: it can be expensive if your dictionary is large and has to be read and written frequently. pickle dumps the whole structure to disk, and unpickle loads the whole thing back.

If you only have to handle small dicts, pickle is fine. If you are going to work with something more complex, go for Berkeley DB. It is basically made to store key:value pairs.
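Berkeley DB itself needs a third-party binding, but the same keyed-access idea can be sketched with the standard-library dbm module, which reads and writes individual entries without serializing the whole dictionary:

```python
import dbm

# Each key is read/written individually on disk -- no whole-file pickle/unpickle.
# "c" means create the file if it doesn't exist; the filename is arbitrary.
with dbm.open("kvstore", "c") as db:
    db["greeting"] = "hello"   # keys and values are stored as bytes
    db["count"] = "42"

with dbm.open("kvstore", "r") as db:
    print(db["greeting"].decode())  # hello
```

Note that dbm returns bytes, so non-string values must be serialized (e.g. with json or pickle) before storing.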

Answered By: Stefano Borini

My favorite method (which does not use standard python dictionary functions): Read/write YAML files using PyYaml. See this answer for details, summarized here:

Create a YAML file, “employment.yml”:

new jersey:
  mercer county:
    plumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36

Then read it in Python:

import yaml

with open("employment.yml") as file_handle:
    my_dictionary = yaml.safe_load(file_handle)

and now my_dictionary has all the values. If you need to do this on the fly, create a string containing the YAML and parse it with yaml.safe_load.

Answered By: Pete

If using only strings as keys (as allowed by the shelve module) is not enough, the FileDict might be a good way to solve this problem.

Answered By: Michael Mauderer

Use JSON

Similar to Pete’s answer, I like using JSON because it maps very well to python data structures and is very readable:

Persisting data is trivial:

>>> import json
>>> db = {'hello': 123, 'foo': [1,2,3,4,5,6], 'bar': {'a': 0, 'b':9}}
>>> fh = open("db.json", 'w')
>>> json.dump(db, fh)
>>> fh.close()

and loading it is about the same:

>>> import json
>>> fh = open("db.json", 'r')
>>> db = json.load(fh)
>>> fh.close()
>>> db
{'hello': 123, 'bar': {'a': 0, 'b': 9}, 'foo': [1, 2, 3, 4, 5, 6]}
>>> del db['foo'][3]
>>> db['foo']
[1, 2, 3, 5, 6]

In addition, JSON loading doesn’t suffer from the same security issues that shelve and pickle do, although IIRC it is slower than pickle.

If you want to write on every operation:

If you want to save on every operation, you can subclass the Python dict object:

import os
import json

class DictPersistJSON(dict):
    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self._load()
        self.update(*args, **kwargs)

    def _load(self):
        if (os.path.isfile(self.filename)
                and os.path.getsize(self.filename) > 0):
            with open(self.filename, 'r') as fh:
                self.update(json.load(fh))

    def _dump(self):
        with open(self.filename, 'w') as fh:
            json.dump(self, fh)

    def __getitem__(self, key):
        return dict.__getitem__(self, key)

    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
        self._dump()

    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
        self._dump()

Which you can use like this:

db = DictPersistJSON("db.json")
db["foo"] = "bar" # Will trigger a write

Which is woefully inefficient, but can get you off the ground quickly.

Answered By: brice

Have you considered using dbm?

import dbm
import pandas as pd
import numpy as np
db = dbm.open('mydbm.db', 'n')

#create some data
df1 = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(101,200, size=(10, 3)), columns=list('EFG'))

# serialize the data and put it in the db dictionary
db['df1']=df1.to_json()
db['df2']=df2.to_json()


# in some other process (dbm returns bytes, so decode before parsing):
import io
db = dbm.open('mydbm.db', 'r')
df1a = pd.read_json(io.StringIO(db['df1'].decode()))
df2a = pd.read_json(io.StringIO(db['df2'].decode()))

This tends to work even without a db.close()

Answered By: Mohrez

I made a module for this: https://github.com/tintin10q/persistentdict. I hope it is helpful.

from persistentdict import persistentdict

with persistentdict("test") as d:
    d["test"] = "test"
    d["test2"] = "test2"

with persistentdict("test") as d:
    print(d["test"]) # test

It is just a context manager that gives you a dict, which it fills with the contents of the file on entry and writes back to the file on exit. The specific file with the code is here:

https://github.com/tintin10q/persistentdict/blob/production/persistentdict.py

Answered By: Quinten C