hdf5

Convert large csv to hdf5

Convert large csv to hdf5 Question: I have a 100M line csv file (actually many separate csv files) totaling 84GB. I need to convert it to an HDF5 file with a single float dataset. I used h5py in testing without any problems, but now I can’t do the final dataset without running out of memory. …

Total answers: 3
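A minimal sketch of the usual fix, assuming the CSVs hold numeric values destined for one flat 1-D dataset (file names and chunk size here are illustrative): create a resizable dataset and append one chunk at a time, so memory use stays bounded by the chunk size.

import h5py
import pandas as pd

csv_files = ["part1.csv", "part2.csv"]  # hypothetical input files

with h5py.File("combined.h5", "w") as f:
    # Resizable 1-D float dataset that grows as chunks arrive.
    dset = f.create_dataset("data", shape=(0,), maxshape=(None,),
                            dtype="float64", chunks=True)
    for path in csv_files:
        # Read each CSV in bounded pieces instead of all at once.
        for chunk in pd.read_csv(path, header=None, chunksize=1_000_000):
            values = chunk.to_numpy(dtype="float64").ravel()
            dset.resize(dset.shape[0] + values.size, axis=0)
            dset[-values.size:] = values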

HDF5 file created with h5py can't be opened by h5py

HDF5 file created with h5py can't be opened by h5py Question: I created an HDF5 file apparently without any problems, under Ubuntu 12.04 (32-bit version), using Anaconda as Python distribution and writing in IPython notebooks. The underlying data are all numpy arrays. For example, import numpy as np import h5py f = h5py.File('myfile.hdf5', 'w') group = …

Total answers: 3
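A minimal sketch of a defensive pattern worth trying here, assuming the root cause is a file left open by the notebook kernel: a with block guarantees the file is flushed and closed, after which it reopens cleanly.

import numpy as np
import h5py

# Writing inside a context manager guarantees the file is flushed and
# closed even if the kernel dies mid-script.
with h5py.File("myfile.hdf5", "w") as f:
    group = f.create_group("mygroup")
    group.create_dataset("values", data=np.arange(10, dtype="float64"))

# The file is closed at this point and can be reopened safely.
with h5py.File("myfile.hdf5", "r") as f:
    print(f["mygroup/values"][:])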

Incremental writes to hdf5 with h5py

Incremental writes to hdf5 with h5py Question: I have got a question about how best to write to hdf5 files with python / h5py. I have data like:

-----------------------------------------
| timepoint | voltage1 | voltage2 | …
-----------------------------------------
| 178       | 10       | 12       | …
-----------------------------------------
| 179       | 12       | 11       | …
-----------------------------------------

…

Total answers: 2
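One common approach, sketched here with illustrative names: describe each row with a NumPy compound dtype and grow a chunked, resizable dataset as batches of rows arrive.

import numpy as np
import h5py

row_dtype = np.dtype([("timepoint", "i8"),
                      ("voltage1", "f8"),
                      ("voltage2", "f8")])

with h5py.File("measurements.h5", "w") as f:
    # Chunked + resizable so rows can be appended indefinitely.
    dset = f.create_dataset("readings", shape=(0,), maxshape=(None,),
                            dtype=row_dtype, chunks=True)

    def append_rows(rows):
        # rows: structured array matching row_dtype
        dset.resize(dset.shape[0] + rows.shape[0], axis=0)
        dset[-rows.shape[0]:] = rows

    batch = np.array([(178, 10.0, 12.0), (179, 12.0, 11.0)], dtype=row_dtype)
    append_rows(batch)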

Storing a list of strings to an HDF5 Dataset from Python

Storing a list of strings to an HDF5 Dataset from Python Question: I am trying to store a variable-length list of strings to an HDF5 Dataset. The code for this is import h5py h5File=h5py.File('xxx.h5', 'w') strList=['asas', 'asas', 'asas'] h5File.create_dataset('xxx', (len(strList), 1), 'S10', strList) h5File.flush() h5File.close() I am getting an error stating that “TypeError: No conversion path for dtype: dtype('<U3')” where …

Total answers: 3
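A minimal sketch of one fix: NumPy's unicode dtype ('<U3') has no HDF5 conversion path, so hand h5py a variable-length string dtype instead (h5py.string_dtype() on h5py >= 2.10; older versions spell it h5py.special_dtype(vlen=str)).

import h5py

strList = ["asas", "asas", "asas"]

with h5py.File("xxx.h5", "w") as h5File:
    # Variable-length UTF-8 strings; NumPy '<U…' arrays cannot be
    # written directly.
    dt = h5py.string_dtype()  # assumption: h5py >= 2.10
    h5File.create_dataset("xxx", data=strList, dtype=dt)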

Improve pandas (PyTables?) HDF5 table write performance

Improve pandas (PyTables?) HDF5 table write performance Question: I’ve been using pandas for research now for about two months to great effect. With large numbers of medium-sized trace event datasets, pandas + PyTables (the HDF5 interface) does a tremendous job of allowing me to process heterogeneous data using all the Python tools I know and …

Total answers: 2
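A few write-side knobs that often help with table-format stores, sketched with illustrative data: skip index creation during the writes, declare only the columns you will query as data_columns, pass an expectedrows hint, and build the index once at the end.

import numpy as np
import pandas as pd

df = pd.DataFrame({"ts": np.arange(1_000_000),
                   "value": np.random.random(1_000_000)})

with pd.HDFStore("traces.h5", complevel=1, complib="blosc") as store:
    store.append("events", df,
                 index=False,                 # skip per-append index builds
                 data_columns=["ts"],         # only columns queried on disk
                 expectedrows=10_000_000)     # sizing hint for PyTables
    # Build the on-disk index once, after bulk loading.
    store.create_table_index("events", columns=["ts"], kind="full")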

Combining hdf5 files

Combining hdf5 files Question: I have a number of hdf5 files, each of which have a single dataset. The datasets are too large to hold in RAM. I would like to combine these files into a single file containing all datasets separately (i.e. not to concatenate the datasets into a single dataset). One way to …

Total answers: 6
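A minimal sketch using h5py's Group.copy, which copies each dataset inside the HDF5 library rather than through Python arrays; the file names and the name-prefixing scheme are illustrative.

import h5py

sources = ["a.h5", "b.h5", "c.h5"]  # hypothetical input files

with h5py.File("combined.h5", "w") as out:
    for path in sources:
        with h5py.File(path, "r") as src:
            for key in src:
                # Prefix with the source file name so the datasets stay
                # separate and cannot collide.
                src.copy(key, out, name=f"{path}/{key}")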

HDF5 taking more space than CSV?

HDF5 taking more space than CSV? Question: Consider the following example: Prepare the data: import string import random import numpy as np import pandas as pd matrix = np.random.random((100, 3000)) my_cols = [random.choice(string.ascii_uppercase) for x in range(matrix.shape[1])] mydf = pd.DataFrame(matrix, columns=my_cols) mydf['something'] = 'hello_world' Set the highest compression possible for HDF5: store = pd.HDFStore('myfile.h5', complevel=9, complib='bzip2') store['mydf'] = mydf store.close() …

Total answers: 1
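Two things usually explain the result, and a sketch like the one below makes them easier to probe (column names simplified so they are unique): random floats are close to incompressible at any complevel, and the repeated-string column is stored more compactly when the table format is given an explicit item size. The min_itemsize value is an assumption matching len('hello_world').

import os
import numpy as np
import pandas as pd

matrix = np.random.random((100, 3000))
mydf = pd.DataFrame(matrix, columns=[str(i) for i in range(3000)])
mydf["something"] = "hello_world"

with pd.HDFStore("myfile.h5", mode="w", complevel=9, complib="bzip2") as store:
    # Table format lets us cap the string column width explicitly.
    store.put("mydf", mydf, format="table",
              min_itemsize={"something": 11})  # assumption: longest string

print(os.path.getsize("myfile.h5"))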

HDF5 – concurrency, compression & I/O performance

HDF5 – concurrency, compression & I/O performance Question: I have the following questions about HDF5 performance and concurrency: Does HDF5 support concurrent write access? Concurrency considerations aside, how does HDF5 perform in terms of I/O (do compression rates affect performance)? Since I use HDF5 with Python, how does its performance compare to Sqlite? …

Total answers: 2
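On the concurrency point, a minimal sketch of HDF5's single-writer/multiple-reader (SWMR) mode as exposed by h5py; fully concurrent writers need parallel HDF5 built against MPI instead. File and dataset names are illustrative.

import numpy as np
import h5py

# Writer side
with h5py.File("live.h5", "w", libver="latest") as f:
    dset = f.create_dataset("samples", shape=(0,), maxshape=(None,),
                            dtype="f8", chunks=True)
    f.swmr_mode = True  # readers may now open the file while we append
    dset.resize((100,))
    dset[:] = np.random.random(100)
    dset.flush()

# Reader side (typically a separate process)
with h5py.File("live.h5", "r", libver="latest", swmr=True) as f:
    print(f["samples"].shape)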

"Large data" workflows using pandas

"Large data" workflows using pandas Question: I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for it’s out-of-core support. However, SAS is horrible as a piece of software for numerous other reasons. One day I hope …

Total answers: 16
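A minimal sketch of one common out-of-core pattern with pandas + PyTables, using hypothetical file and column names: append CSV chunks into a table-format store, then query back only the rows and columns needed.

import pandas as pd

with pd.HDFStore("warehouse.h5", complevel=1, complib="blosc") as store:
    # Stream the raw file in bounded chunks; each append goes straight
    # to disk.
    for chunk in pd.read_csv("big_input.csv", chunksize=500_000):
        store.append("records", chunk, data_columns=["customer_id"])

    # Out-of-core query: only matching rows are read back.
    subset = store.select("records",
                          where="customer_id == 42",
                          columns=["customer_id", "amount"])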