Python equivalent of MATLAB's dataset array
Question:
I’m trying to convert some code from MATLAB to Python. Is there a Python equivalent to MATLAB’s dataset array?
http://www.mathworks.com/help/stats/dataset-arrays.html
Answers:
A Python dictionary can hold values that are strings, numbers, or even other dictionaries, like so:
>>> d = {"name":"foo", "age":22, "props": {"value":2.1}}
>>> d['props']['value']
2.1
I’m assuming this is what you are looking to port over based on this quote from the site you linked to:
Statistics Toolbox™ has dataset arrays for storing variables with
heterogeneous data types. For example, you can combine numeric data,
logical data, cell arrays of strings, and categorical arrays in one
dataset array variable.
Take a look at NumPy, a third-party library widely used for scientific computing with Python. There’s also a page covering NumPy for MATLAB users.
I think that you are looking for numpy.array.
You should look into the pandas library, whose DataFrame is modeled after R’s data frame.
Not to mention, this is way better than MATLAB’s dataset.
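To make the pandas suggestion concrete, here is a minimal sketch of a DataFrame holding the kinds of mixed columns the MATLAB quote describes (numeric, logical, strings, categorical). The column names and values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: one column per variable, each with its own type.
df = pd.DataFrame({
    "name": ["foo", "bar", "baz"],             # strings
    "age": [22, 35, 41],                       # integers
    "score": [2.1, 3.4, 1.8],                  # floats
    "member": [True, False, True],             # logical values
    "group": pd.Categorical(["a", "b", "a"]),  # categorical data
})

ages = df["age"]                 # select a column by name
adults = df[df["age"] > 30]      # filter rows with a boolean mask
mean_score = df["score"].mean()  # numeric summary of one column
```

Columns are accessed by name, much like the variables of a dataset array, and each column keeps its own dtype.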
If you want to perform numerical operations on the data set, NumPy would be the way to go.
You can specify arbitrary record types by combining basic NumPy dtypes, and access the records by their field names, similar to Python’s built-in dictionary access.
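import numpy
# numpy.str_ without a length yields a zero-width field, so use a fixed width ('U16' = 16 characters)
myDtype = numpy.dtype([('name', 'U16'), ('age', numpy.int32), ('score', numpy.float64)])
myData = numpy.empty(10, dtype=myDtype)  # 10 uninitialized records; fill them before relying on the values
print(myData['age'])  # prints all ages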
You can even save and re-load these data using the tofile method and the fromfile function in NumPy and continue using the named fields:
with open('myfile.txt', 'wb') as f:
    myData.tofile(f)  # writes raw bytes; the dtype is not stored in the file
with open('myfile.txt', 'rb') as f:
    loadedData = numpy.fromfile(f, dtype=myDtype)  # the same dtype must be supplied again
print(loadedData['age'])
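Because tofile writes only raw bytes, the reader has to know the dtype in advance. As an alternative (not what the answer above uses), numpy.save and numpy.load store the dtype in the file itself. A minimal sketch, with made-up field values and a temporary file path chosen only for illustration:

```python
import os
import tempfile

import numpy as np

# Same record layout as above, with a fixed-width string field.
record_dtype = np.dtype([("name", "U16"), ("age", np.int32), ("score", np.float64)])

# Hypothetical records, filled field by field.
records = np.zeros(3, dtype=record_dtype)
records["name"] = ["foo", "bar", "baz"]
records["age"] = [22, 35, 41]
records["score"] = [2.1, 3.4, 1.8]

# The .npy format keeps shape and dtype, so no dtype is passed when loading.
path = os.path.join(tempfile.mkdtemp(), "records.npy")
np.save(path, records)
loaded = np.load(path)
```

The loaded array still exposes the named fields, for example loaded["age"].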