What is the difference between pickle and shelve?

Question:

I am learning about object serialization for the first time. I tried reading and googling for the differences between the pickle and shelve modules, but I am not sure I understand them. When should I use which one?
Pickle can turn every Python object into a stream of bytes which can be persisted to a file. Then why do we need the shelve module? Isn't pickle faster?

Asked By: zubinmehta


Answers:

pickle is for serializing some object (or objects) as a single bytestream in a file.

shelve builds on top of pickle and implements a serialization dictionary where objects are pickled, but associated with a key (some string), so you can load your shelved data file and access your pickled objects via keys. This is more convenient when you are serializing many objects.
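To illustrate the "many objects" point, here is a minimal sketch, assuming Python 3. The file names and the `names` and `scores` variables are made up for illustration. With pickle alone, several objects written to one file have to be read back in the order they were written; with shelve, each one can be fetched by its key.

```python
import os
import pickle
import shelve
import tempfile

# Scratch directory so the example does not leave files in the current directory.
workdir = tempfile.mkdtemp()

names = ['ann', 'bob']
scores = {'ann': 90, 'bob': 72}

# pickle: both objects go into one byte stream and must be
# loaded back in the same order they were dumped.
pickle_path = os.path.join(workdir, 'many.p')
with open(pickle_path, 'wb') as pfile:
    pickle.dump(names, pfile)
    pickle.dump(scores, pfile)

with open(pickle_path, 'rb') as pfile:
    first = pickle.load(pfile)   # the list
    second = pickle.load(pfile)  # the dict

# shelve: each object is stored under its own key and can be
# fetched on its own, without loading the others.
shelf_path = os.path.join(workdir, 'many-shelf')
with shelve.open(shelf_path, 'c') as shelf:
    shelf['names'] = names
    shelf['scores'] = scores

with shelve.open(shelf_path, 'r') as shelf:
    only_scores = shelf['scores']

print(first, second, only_scores)
```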

Here is an example of usage for each of the two (it should work in recent versions of Python 2.7 and Python 3.x).

pickle Example

import pickle

integers = [1, 2, 3, 4, 5]

with open('pickle-example.p', 'wb') as pfile:
    pickle.dump(integers, pfile)

This will dump the integers list to a binary file called pickle-example.p.

Now try reading the pickled file back.

import pickle

with open('pickle-example.p', 'rb') as pfile:
    integers = pickle.load(pfile)
    print(integers)

The above should output [1, 2, 3, 4, 5].

shelve Example

import shelve

integers = [1, 2, 3, 4, 5]

# If you're using Python 2.7, import contextlib and use
# the line:
# with contextlib.closing(shelve.open('shelf-example', 'c')) as shelf:
with shelve.open('shelf-example', 'c') as shelf:
    shelf['ints'] = integers

Notice how you add objects to the shelf via dictionary-like access.

Read the object back in with code like the following:

import shelve

# If you're using Python 2.7, import contextlib and use
# the line:
# with contextlib.closing(shelve.open('shelf-example', 'r')) as shelf:
with shelve.open('shelf-example', 'r') as shelf:
    for key in shelf.keys():
        print(repr(key), repr(shelf[key]))

On Python 3, the output will be 'ints' [1, 2, 3, 4, 5].
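One thing to watch with the dictionary-like access: a value read from a shelf is an unpickled copy, so mutating it in place does not change what is stored. The sketch below assumes Python 3 and uses a made-up file name. It shows the two usual ways around this: assigning the modified object back, or opening the shelf with `writeback=True`.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'writeback-example')

with shelve.open(path, 'c') as shelf:
    shelf['ints'] = [1, 2, 3]

    # This appends to a temporary unpickled copy; the shelf is unchanged.
    shelf['ints'].append(4)
    without_writeback = shelf['ints']

    # Read, modify, and assign back to persist the change.
    ints = shelf['ints']
    ints.append(4)
    shelf['ints'] = ints

with shelve.open(path, 'r') as shelf:
    after_reassign = shelf['ints']

# With writeback=True, accessed entries are cached and written
# back when the shelf is synced or closed.
with shelve.open(path, 'c', writeback=True) as shelf:
    shelf['ints'].append(5)

with shelve.open(path, 'r') as shelf:
    with_writeback = shelf['ints']

print(without_writeback, after_reassign, with_writeback)
```

`writeback=True` is convenient but keeps every accessed entry in memory until the shelf is closed, so it is probably best reserved for small shelves.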

Answered By: wkl

According to the pickle documentation:

Serialization is a more primitive notion than persistence; although pickle reads and writes file objects, it does not handle the issue of naming persistent objects, nor the (even more complicated) issue of concurrent access to persistent objects. The pickle module can transform a complex object into a byte stream and it can transform the byte stream into an object with the same internal structure. Perhaps the most obvious thing to do with these byte streams is to write them onto a file, but it is also conceivable to send them across a network or store them in a database. The shelve module provides a simple interface to pickle and unpickle objects on DBM-style database files.
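The split between serialization and persistence can be sketched by hand. This is a rough illustration of the idea, not how shelve is actually implemented, and it assumes Python 3. The file name and the `record` variable are hypothetical. `pickle.dumps` produces the bytes, and a DBM file from the standard `dbm` module gives them a name (the key) and a place to live.

```python
import dbm
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'manual-shelf')
record = {'name': 'ann', 'tags': ['a', 'b']}

# Serialization only: object -> bytes -> equal object, no file involved.
payload = pickle.dumps(record)
restored = pickle.loads(payload)

# Persistence by hand: store the pickled bytes under a key in a DBM file.
with dbm.open(path, 'c') as db:
    db['record'] = payload

# Later: look the bytes up by key and unpickle them.
with dbm.open(path, 'r') as db:
    from_db = pickle.loads(db['record'])

print(restored == record, from_db == record)
```

shelve wraps these two steps behind a dictionary-like interface, so you do not call `pickle.dumps` and `pickle.loads` yourself.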

Answered By: as – if

PROs & CONs

Since nobody has really mentioned any:


DBM:

  • PROs:
  1. Simple to use: DBM is a basic key-value store and requires minimal setup.
  2. Fast: DBM provides fast access to data, especially when compared to other disk-based storage options.
  3. Can handle large amounts of data: DBM is able to handle very large datasets, provided you have enough disk space.
  • CONs:
  1. Limited functionality: DBM is a simple key-value store and does not provide advanced functionality such as transactions or multi-process concurrency.
  2. Not well suited for complex data structures: DBM is best suited for storing simple key-value pairs, and may not be the best choice for complex data structures that require multiple values per key or more advanced querying capabilities.

Shelve:

  • PROs:
  1. Richer values: Shelve can store any picklable Python object (a list, a dict, a class instance) as a value, whereas DBM stores only strings or bytes. It does not add transactions or querying on top of DBM.
  2. Easy to use: Shelve is a more user-friendly API than DBM, as it provides a dictionary-like interface for data storage and retrieval.
  • CONs:
  1. Slower than DBM: Shelve has a higher overhead compared to DBM, because every value is pickled on write and unpickled on read, and may not be suitable for large datasets or applications with high performance requirements.
  2. May not scale well: Shelve may not handle very large datasets or high concurrency levels well; the shelve module does not support concurrent read/write access to a shelf.
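The main practical difference in the lists above can be seen in a few lines. This is a minimal sketch assuming Python 3, with made-up file names: plain `dbm` hands back raw bytes, so the caller has to flatten and parse structured data, while shelve returns the original object.

```python
import dbm
import os
import shelve
import tempfile

workdir = tempfile.mkdtemp()

# Plain DBM: the caller flattens the list to a string,
# and the value comes back as bytes.
with dbm.open(os.path.join(workdir, 'plain-dbm'), 'c') as db:
    db['ints'] = '1,2,3'
    raw = db['ints']

# The caller is also responsible for parsing the bytes again.
parsed = [int(part) for part in raw.decode().split(',')]

# Shelve: the list is pickled on write and unpickled on read.
with shelve.open(os.path.join(workdir, 'shelf'), 'c') as shelf:
    shelf['ints'] = [1, 2, 3]
    value = shelf['ints']

print(raw, parsed, value)
```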

Answered By: ciurlaro