How to store and load a Python dictionary with HDF5

Question:

I’m having issues loading (I think storing is working – a file is being created and contains data) a dictionary (string key and array/list value) from a HDF5 file. I’m receiving the following error:

ValueError: malformed node or string: < HDF5 dataset “dataset_1”: shape (), type “|O” >

My code is:

import h5py

def store_table(self, filename):
    table = dict()
    table['test'] = list(np.zeros(7,dtype=int))

    with h5py.File(filename, "w") as file:
        file.create_dataset('dataset_1', data=str(table))
        file.close()


def load_table(self, filename):
    file = h5py.File(filename, "r")
    data = file.get('dataset_1')
    print(ast.literal_eval(data))

I’ve read online using the ast method literal_eval should work but it doesn’t appear to help… How do I ‘unpack’ the HDF5 so it’s a dictionary again?

Any ideas would be appreciated.

Asked By: mason7663

||

Answers:

If I understand what you are trying to do, this should work:

import numpy as np
import ast
import h5py


def store_table(filename):
    table = dict()
    table['test'] = list(np.zeros(7,dtype=int))

    with h5py.File(filename, "w") as file:
        file.create_dataset('dataset_1', data=str(table))


def load_table(filename):
    file = h5py.File(filename, "r")
    data = file.get('dataset_1')[...].tolist()
    file.close();
    return ast.literal_eval(data)

filename = "file.h5"
store_table(filename)
data = load_table(filename)
print(data)

Answered By: MROB

It’s not clear to me what you really want to accomplish. (I suspect your dictionaries have more than seven zeros. Otherwise, HDF5 is overkill to store your data.) If you have a lot of very large dictionaries, it would be better to covert the data to a NumPy array then either 1) create and load the dataset with data= or 2) create the dataset with an appropriate dtype then populate. You can create datasets with mixed datatypes, which is not addressed in the previous solution. If those situations don’t apply, you might want to save the dictionary as attributes. Attributes can be associated to a group, a dataset, or the file object itself. Which is best depends on your requirements.

I wrote a short example to show how to load dictionary key/value pairs as attribute names/value pairs tagged to a group. For this example, I assumed the dictionary has a name key with the group name for association. The process is almost identical for a dataset or file object (just change the object reference).

import h5py

def load_dict_to_attr(h5f, thisdict) :

   if 'name' not in thisdict:
       print('Dictionary missing name key. Skipping function.')
       return

   dname = thisdict.get('name') 
   if dname in h5f:
       print('Group:' + dname + ' exists. Skipping function.')
       return
   else: 
       grp = h5f.create_group(dname)

       for key, val in thisdict.items():
           grp.attrs[key] = val

###########################################

def get_grp_attrs(name, node) :

    grp_dict = {}
    for k in node.attrs.keys():
        grp_dict[k]= node.attrs[k]

    print (grp_dict)

###########################################

car1 = dict( name='my_car', brand='Ford', model='Mustang', year=1964,
             engine='V6',  disp=260,  units='cu.in' )
car2 = dict( name='your_car', brand='Chevy', model='Camaro', year=1969,
             engine='I6',  disp=250,  units='cu.in' )
car3 = dict( name='dads_car', brand='Mercedes', model='350SL', year=1972,
             engine='V8',  disp=4520, units='cc' )
car4 = dict( name='moms_car', brand='Plymouth', model='Voyager', year=1989,
             engine='V6',  disp=289,  units='cu.in' )

a_truck = dict(             brand='Dodge', model='RAM', year=1984,
               engine='V8', disp=359, units='cu.in' )

garage = dict(my_car=car1, 
              your_car=car2,
              dads_car=car3,
              moms_car=car4,
              a_truck=a_truck )

with h5py.File('SO_61226773.h5','w') as h5w:

    for car in garage:
        print ('nLoading dictionary:', car)
        load_dict_to_attr(h5w, garage.get(car))

with h5py.File('SO_61226773.h5','r') as h5r:

    print ('nReading dictionaries from Group attributes:')
    h5r.visititems (get_grp_attrs)
Answered By: kcw78

My preferred solution is just to convert them to ascii and then store this binary data.

import h5py
import json
import itertools
#generate a test dictionary
testDict={
    "one":1,
    "two":2,
    "three":3,
    "otherStuff":[{"A":"A"}]
}

testFile=h5py.File("test.h5","w")
#create a test data set containing the binary representation of my dictionary data
testFile.create_dataset(name="dictionary",shape=(len([i.encode("ascii","ignore") for i in json.dumps(testDict)]),1),dtype="S10",data=[i.encode("ascii","ignore") for i in json.dumps(testDict)])


testFile.close()

testFile=h5py.File("test.h5","r")
#load the test data back
dictionary=testFile["dictionary"][:].tolist()
dictionary=list(itertools.chain(*dictionary))
dictionary=json.loads(b''.join(dictionary))

The two key parts are:

testFile.create_dataset(name="dictionary",shape=(len([i.encode("ascii","ignore") for i in json.dumps(testDict)]),1),dtype="S10",data=[i.encode("ascii","ignore") for i in json.dumps(testDict)])

Where

data=[i.encode("ascii","ignore") for i in json.dumps(testDict)])

Converts the dictionary to a list of ascii charecters (The string shape may also be calculated from this)

Decoding back from the hdf5 container is a little simpler:

dictionary=testFile["dictionary"][:].tolist()
dictionary=list(itertools.chain(*dictionary))
dictionary=json.loads(b''.join(dictionary))

All that this is doing is loading the string from the hdf5 container and converting it to a list of bytes. Then I coerce this into a bytes object which I can convert back to a dictionary with json.loads

If you are ok with the extra library usage (json, ittertools) I think this offers a somewhat more pythonic solution (which in my case wasnt a problem since I was using them anyway).

Answered By: Eric Eckert
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.