Pandas to HDF5?

Question:

How do I convert a large table from Pandas/NumPy to HDF5 (.h5) format while keeping the same structure? I used the code below, but the resulting .h5 file shows the data in a messy layout:

data.to_hdf('data.h5',format = 'table', key='data')

I attached an image of my data: [image]
Or is there another data type you would recommend?
This is the structure I got: [image]

Asked By: Ron


Answers:

Setting format='table' writes the data as a PyTables Table. When you do this, all the data goes into the ‘table’ dataset in the group named by key=. However, columns that share a data type are grouped together into a single ‘values_block_#’ column/field (one for all ints, one for all floats, etc.). To write each column as its own field, you also need data_columns=True. That parameter defines the columns to be created as indexed data columns (set it to True to use all columns).
The example below demonstrates the difference between the options. It creates 3 different files using data from your example. If you still don’t like the layout with data_columns=True, you can use the h5py or tables (PyTables) package to define the HDF5 schema and write the data however you like.

  1. file_1.h5 – uses default format (‘fixed’)
  2. file_2.h5 – uses ‘table’ format (only)
  3. file_3.h5 – uses ‘table’ format with data_columns=True

Code below:

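import pandas as pd  # to_hdf also requires the PyTables package ("tables")

# Parentheses let the list concatenation span two lines.
ids = ([f'subj8_series8_{i}' for i in range(5)] +
       [f'subj8_series8_12409{i}' for i in range(5)])
Fp1 = [12, 157, 34, -98, 28,
       -160, -30, 64, 134, 159]
Fp2 = [60, 181, 111, 25, 120,
       192, 261, 322, 383, 407]

df = pd.DataFrame({'id': ids, 'Fp1': Fp1, 'Fp2': Fp2})

# 1. default 'fixed' format
df.to_hdf('file_1.h5', key='data')
# 2. 'table' format: same-dtype columns are grouped into values_block_# fields
df.to_hdf('file_2.h5', key='data', format='table')
# 3. 'table' format with every column written as its own indexed data column
df.to_hdf('file_3.h5', key='data', format='table', data_columns=True)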
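
To check what data_columns=True gives you, here is a minimal sketch that writes a small table, reads it back, and filters rows on disk. It assumes PyTables is installed. The temporary path and the shortened 5-row DataFrame are only for illustration, and reading the field names through get_storer(...).table is just one way to peek at the layout, not the only one.

```python
import os
import tempfile

import pandas as pd  # to_hdf/read_hdf also require the PyTables package ("tables")

# Shortened, illustrative version of the data from the question.
df = pd.DataFrame({
    'id': [f'subj8_series8_{i}' for i in range(5)],
    'Fp1': [12, 157, 34, -98, 28],
    'Fp2': [60, 181, 111, 25, 120],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file_3.h5')  # hypothetical scratch location
    df.to_hdf(path, key='data', format='table', data_columns=True)

    # Round trip: read the whole table back into a DataFrame.
    restored = pd.read_hdf(path, key='data')

    # With data columns, a 'where' condition selects rows while reading.
    positive = pd.read_hdf(path, key='data', where='Fp1 > 0')

    # Field names of the underlying PyTables Table.
    with pd.HDFStore(path, mode='r') as store:
        field_names = list(store.get_storer('data').table.colnames)

print(restored)
print(positive)
print(field_names)
```

With data_columns=True the field names should include 'Fp1' and 'Fp2' directly rather than a ‘values_block_#’ field, which is also what makes the where= filter possible.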
Answered By: kcw78