How to read HDF5 files in Python

Question

I am trying to read data from hdf5 file in Python. I can read the hdf5 file using h5py, but I cannot figure out how to access data within the file.

My code

import h5py    
import numpy as np    
f1 = h5py.File(file_name,'r+')

This works and the file is read. But how can I access data inside the file object f1?

Asked By: Sameer Damir

||

Source

Answer 1

you can use Pandas.

import pandas as pd
pd.read_hdf(filename,key)

Answered By: Danny

Answer 2

What you need to do is create a dataset. If you take a look at the quickstart guide, it shows you that you need to use the file object in order to create a dataset. So, f.create_dataset and then you can read the data. This is explained in the docs.

Answered By: Games Brainiac

Answer 3

Read HDF5

import h5py
filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # Print all root level object names (aka keys) 
    # these can be group or dataset names 
    print("Keys: %s" % f.keys())
    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]

    # get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key])) 

    # If a_group_key is a group name, 
    # this gets the object names in the group and returns as a list
    data = list(f[a_group_key])

    # If a_group_key is a dataset name, 
    # this gets the dataset values and returns as a list
    data = list(f[a_group_key])
    # preferred methods to get dataset values:
    ds_obj = f[a_group_key]      # returns as a h5py dataset object
    ds_arr = f[a_group_key][()]  # returns as a numpy array

Write HDF5

import h5py

# Create random data
import numpy as np
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)

See h5py docs for more information.

Alternatives

JSON: Nice for writing human-readable data; VERY commonly used (read & write)
CSV: Super simple format (read & write)
pickle: A Python serialization format (read & write)
MessagePack (Python package): More compact representation (read & write)
HDF5 (Python package): Nice for matrices (read & write)
XML: exists too *sigh* (read & write)

For your application, the following might be important:

Support by other programming languages
Reading / writing performance
Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python

Answered By: Martin Thoma

Answer 4

Reading the file

import h5py

f = h5py.File(file_name, mode)

Studying the structure of the file by printing what HDF5 groups are present

for key in f.keys():
    print(key) #Names of the root level object names in HDF5 file - can be groups or datasets.
    print(type(f[key])) # get the object type: usually group or dataset

Extracting the data

#Get the HDF5 group; key needs to be a group name from above
group = f[key]

#Checkout what keys are inside that group.
for key in group.keys():
    print(key)

# This assumes group[some_key_inside_the_group] is a dataset, 
# and returns a np.array:
data = group[some_key_inside_the_group][()]
#Do whatever you want with data

#After you are done
f.close()

Answered By: Daksh

Answer 5

Use below code to data read and convert into numpy array

import h5py
f1 = h5py.File('data_1.h5', 'r')
list(f1.keys())
X1 = f1['x']
y1=f1['y']
df1= np.array(X1.value)
dfy1= np.array(y1.value)
print (df1.shape)
print (dfy1.shape)

Preferred method to read dataset values into a numpy array:

import h5py
# use Python file context manager:
with h5py.File('data_1.h5', 'r') as f1:
    print(list(f1.keys()))  # print list of root level objects
    # following assumes 'x' and 'y' are dataset objects
    ds_x1 = f1['x']  # returns h5py dataset object for 'x'
    ds_y1 = f1['y']  # returns h5py dataset object for 'y'
    arr_x1 = f1['x'][()]  # returns np.array for 'x'
    arr_y1 = f1['y'][()]  # returns np.array for 'y'
    arr_x1 = ds_x1[()]  # uses dataset object to get np.array for 'x'
    arr_y1 = ds_y1[()]  # uses dataset object to get np.array for 'y'
    print (arr_x1.shape)
    print (arr_y1.shape)

Answered By: ashish bansal

Answer 6

To read the content of .hdf5 file as an array, you can do something as follow

> import numpy as np 
> myarray = np.fromfile('file.hdf5', dtype=float)
> print(myarray)

Answered By: Raza

Answer 7

Here’s a simple function I just wrote which reads a .hdf5 file generated by the save_weights function in keras and returns a dict with layer names and weights:

def read_hdf5(path):

    weights = {}

    keys = []
    with h5py.File(path, 'r') as f: # open file
        f.visit(keys.append) # append all keys to list
        for key in keys:
            if ':' in key: # contains data if ':' in key
                print(f[key].name)
                weights[f[key].name] = f[key].value
    return weights

https://gist.github.com/Attila94/fb917e03b04035f3737cc8860d9e9f9b.

Haven’t tested it thoroughly but does the job for me.

Answered By: Attila

Answer 8

from keras.models import load_model 

h= load_model('FILE_NAME.h5')

Answered By: Judice

Answer 9

Using bits of answers from this question and the latest doc, I was able to extract my numerical arrays using

import h5py
with h5py.File(filename, 'r') as h5f:
    h5x = h5f[list(h5f.keys())[0]]['x'][()]

Where 'x' is simply the X coordinate in my case.

Answered By: Patol75

Answer 10

If you have named datasets in the hdf file then you can use the following code to read and convert these datasets in numpy arrays:

import h5py
file = h5py.File('filename.h5', 'r')

xdata = file.get('xdata')
xdata= np.array(xdata)

If your file is in a different directory you can add the path in front of'filename.h5'.

Answered By: Machzx

Answer 11

use this it works fine for me


    weights = {}

    keys = []
    with h5py.File("path.h5", 'r') as f: 
        f.visit(keys.append) 
        for key in keys:
            if ':' in key: 
                print(f[key].name)     
                weights[f[key].name] = f[key][()]
    return weights

print(read_hdf5())

if you are using the h5py<=’2.9.0′
then you can use


    weights = {}

    keys = []
    with h5py.File("path.h5", 'r') as f: 
        f.visit(keys.append) 
        for key in keys:
            if ':' in key: 
                print(f[key].name)     
                weights[f[key].name] = f[key].value
    return weights

print(read_hdf5())

Answered By: Zaeem Asghar

How to read HDF5 files in Python

Question:

My code

Answers:

Read HDF5

Write HDF5

Alternatives