Read .mat files in Python
Question:
Is it possible to read binary MATLAB .mat files in Python?
I’ve seen that SciPy supposedly supports reading .mat files, but I haven’t been able to make it work. I installed SciPy version 0.7.0, and I can’t find the loadmat() method.
Answers:
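An import is required: loadmat lives in the scipy.io subpackage, which (at least in older SciPy versions) import scipy alone does not load, so import scipy.io explicitly: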
import scipy.io
mat = scipy.io.loadmat('file.mat')
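To see what loadmat returns without needing a MATLAB-generated file, here is a minimal round trip that first writes a throwaway file with scipy.io.savemat (the file and variable names are made up for illustration). It shows the extra metadata keys loadmat adds, and that arrays come back at least 2-D unless squeeze_me=True is passed.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Write a small Level 5 MAT-file with two example variables (names are made up)
path = os.path.join(tempfile.mkdtemp(), "example.mat")
scipy.io.savemat(path, {"vec": np.array([1.0, 2.0, 3.0]), "scalar": 4.0})

mat = scipy.io.loadmat(path)
# Besides the variables, loadmat adds __header__, __version__ and __globals__
print(sorted(mat.keys()))
# MATLAB has no 1-D arrays, so the vector comes back as a 1x3 row
print(mat["vec"].shape)

# squeeze_me=True removes the singleton dimensions again
squeezed = scipy.io.loadmat(path, squeeze_me=True)
print(squeezed["vec"].shape, squeezed["scalar"])
```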
Neither scipy.io.savemat nor scipy.io.loadmat works with MAT-files saved in the version 7.3 format. The good news is that v7.3 MAT-files are HDF5 files, so they can be read with a number of tools, including h5py, whose datasets convert directly to NumPy arrays.
For Python, you will need the h5py package, which requires the HDF5 library on your system.
import numpy as np
import h5py
f = h5py.File('somefile.mat','r')
data = f.get('data/variable1')
data = np.array(data) # For converting to a NumPy array
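Since scipy.io.loadmat raises NotImplementedError when it is handed a v7.3 file, one way to support both formats is to try SciPy first and fall back to h5py. This is only a sketch: load_any_mat is a made-up helper name, and the h5py branch picks up top-level numeric datasets only (structs and cell arrays are stored as HDF5 groups and object references and are skipped). The .T undoes the transposition caused by MATLAB storing arrays in column-major order.

```python
import numpy as np
import scipy.io


def load_any_mat(path):
    """Hypothetical helper: load a MAT-file with SciPy, falling back to h5py for v7.3."""
    try:
        return scipy.io.loadmat(path)
    except NotImplementedError:
        # SciPy refuses v7.3 (HDF5-based) files; read them with h5py instead
        import h5py  # imported lazily so h5py is only required for v7.3 files

        variables = {}
        with h5py.File(path, "r") as f:
            for name, obj in f.items():
                # Only plain datasets are converted; groups (structs, the
                # "#refs#" store used by cell arrays) are skipped in this sketch
                if isinstance(obj, h5py.Dataset):
                    # MATLAB writes column-major data, so transpose to get MATLAB's shape
                    variables[name] = np.array(obj).T
        return variables
```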
There is also the MATLAB Engine for Python from MathWorks. If you have MATLAB, it might be worth considering; I haven’t tried it myself, but it does much more than just read MATLAB files. However, I don’t know whether you are allowed to distribute it to other users. That is probably not a problem if they have MATLAB; otherwise, the SciPy/NumPy route may be the better choice.
Also, if you want to implement the basics yourself, MathWorks provides detailed documentation of the file format (search for matfile_format.pdf or its title, MAT-File Format). It’s not as complicated as I had expected, but it is obviously not the easiest way to go, and the effort depends on how many features of the .mat files you want to support.
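As a first step toward parsing the format yourself, the 128-byte header of a Level 5 MAT-file (v6/v7) can be decoded with the struct module: 116 bytes of descriptive text, an 8-byte subsystem data offset, a 2-byte version and a 2-byte endian indicator. A minimal sketch; read_mat_header is an illustrative name, not part of any library.

```python
import struct


def read_mat_header(path):
    """Decode the 128-byte header of a Level 5 MAT-file (illustrative helper)."""
    with open(path, "rb") as fh:
        header = fh.read(128)
    if len(header) < 128:
        raise ValueError("file too short to be a Level 5 MAT-file")

    # Bytes 0-115: human-readable text, padded with spaces or NUL bytes
    text = header[:116].rstrip(b" \x00").decode("ascii", errors="replace")
    # Bytes 116-123 (subsystem data offset) are ignored in this sketch

    # Bytes 126-127: 'M' and 'I' written as one 16-bit value; reading them
    # back as b"IM" means the file is little-endian, b"MI" means big-endian
    endian_indicator = header[126:128]
    if endian_indicator == b"IM":
        byte_order = "<"
    elif endian_indicator == b"MI":
        byte_order = ">"
    else:
        raise ValueError("missing endian indicator; not a Level 5 MAT-file")

    # Bytes 124-125: version field, decoded with the detected byte order
    (version,) = struct.unpack(byte_order + "H", header[124:126])
    return {"text": text, "version": version, "byte_order": byte_order}
```

The version field is 0x0100 for Level 5 files and 0x0200 for v7.3 files, which is how a reader can decide whether to hand the file to an HDF5 library instead.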
I’ve written a “small” (about 700 lines) Python script which can read some basic .mat files. I’m neither a Python expert nor a beginner, and it took me about two days to write it (using the MathWorks documentation mentioned above). I learned a lot and it was quite fun (most of the time). As I wrote the script at work, I’m afraid I cannot publish it, but I can give some advice here:
- First, read the documentation.
- Use a hex editor (such as HxD) and look into a reference .mat file you want to parse. Try to figure out the meaning of each byte by saving the bytes to a .txt file and annotating each line.
- Use classes to store each data element (such as miCOMPRESSED, miMATRIX, mxDOUBLE, or miINT32).
- The .mat file structure is well suited to storing the data elements in a tree: each node is an instance of one class and has child nodes.
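Following the advice to store data elements in a tree, here is a minimal sketch that walks the data elements after the 128-byte header. Each element starts with an 8-byte tag (4-byte data type, 4-byte byte count); miMATRIX has type code 14 and miCOMPRESSED type code 15, and a compressed element is a zlib stream wrapping further elements. The DataElement class and function names are illustrative, and the sketch does not descend into miMATRIX sub-elements, which may use the 4-byte "small data element" tag format.

```python
import struct
import zlib
from dataclasses import dataclass, field

MI_MATRIX = 14      # data type code of an miMATRIX element
MI_COMPRESSED = 15  # data type code of an miCOMPRESSED element


@dataclass
class DataElement:
    """One node of the element tree: its type code, raw payload and children."""
    data_type: int
    payload: bytes
    children: list = field(default_factory=list)


def parse_elements(buf, byte_order):
    """Split a buffer into consecutive data elements (8-byte tag + payload)."""
    elements = []
    pos = 0
    while pos + 8 <= len(buf):
        data_type, num_bytes = struct.unpack(byte_order + "II", buf[pos:pos + 8])
        payload = buf[pos + 8:pos + 8 + num_bytes]
        node = DataElement(data_type, payload)
        if data_type == MI_COMPRESSED:
            # A compressed element is a zlib stream holding further elements
            node.children = parse_elements(zlib.decompress(payload), byte_order)
        elements.append(node)
        pos += 8 + num_bytes
        if data_type != MI_COMPRESSED:
            # Uncompressed elements are padded to an 8-byte boundary
            pos += (-num_bytes) % 8
    return elements


def read_mat_tree(path):
    """Return the top-level data elements of a Level 5 MAT-file."""
    with open(path, "rb") as fh:
        data = fh.read()
    byte_order = "<" if data[126:128] == b"IM" else ">"
    return parse_elements(data[128:], byte_order)
```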
If you have MATLAB R2014b or newer installed, you can use the MATLAB Engine for Python:
import matlab.engine
eng = matlab.engine.start_matlab()
content = eng.load("example.mat", nargout=1)  # the file's variables, returned as a dict
eng.quit()  # shut down the MATLAB session when done
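First save the .mat file in MATLAB with the -v7 flag, so that it is not written in the HDF5-based v7.3 format, which loadmat cannot read:
save('test.mat', '-v7')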
After that, in Python, use the usual loadmat function:
import scipy.io as sio
test = sio.loadmat('test.mat')
Reading the file
import scipy.io
mat = scipy.io.loadmat(file_name)
Inspecting the type of MAT variable
print(type(mat))
#OUTPUT  <class 'dict'>
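The keys of the dictionary are the names of the MATLAB variables, and the values are the objects assigned to those variables (mostly NumPy arrays). loadmat also adds the metadata keys __header__, __version__ and __globals__.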
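To separate the real variables from that metadata, or to list what a file contains without loading it, a minimal sketch using scipy.io.whosmat and a dict comprehension (the example file and its variable names are made up):

```python
import os
import tempfile

import numpy as np
import scipy.io

# Example file with two variables (names are made up for illustration)
path = os.path.join(tempfile.mkdtemp(), "vars.mat")
scipy.io.savemat(path, {"a": np.zeros((2, 3)), "label": "hello"})

# whosmat lists (name, shape, MATLAB class) without loading the data
for name, shape, matlab_class in scipy.io.whosmat(path):
    print(name, shape, matlab_class)

mat = scipy.io.loadmat(path)
# Keep only the real variables, dropping __header__, __version__ and __globals__
variables = {k: v for k, v in mat.items() if not k.startswith("__")}
print(sorted(variables))
```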
There is a nice package called mat4py, which can easily be installed using
pip install mat4py
It is straightforward to use (from the website):
Load data from a MAT-file
The function loadmat loads all variables stored in the MAT-file into a simple Python data structure, using only Python’s dict and list objects. Numeric and cell arrays are converted to row-ordered nested lists. Arrays are squeezed to eliminate arrays with only one element. The resulting data structure is composed of simple types that are compatible with the JSON format.
Example: Load a MAT-file into a Python data structure:
from mat4py import loadmat
data = loadmat('datafile.mat')
The variable data is a dict with the variables and values contained in the MAT-file.
Save a Python data structure to a MAT-file
Python data can be saved to a MAT-file with the function savemat. Data has to be structured in the same way as for loadmat, i.e. it should be composed of simple data types, like dict, list, str, int, and float.
Example: Save a Python data structure to a MAT-file:
from mat4py import savemat
savemat('datafile.mat', data)
The parameter data must be a dict with the variables.
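If you would rather stay with SciPy, a structure similar in spirit to mat4py’s output (plain dicts, lists and scalars that json can serialize) can be approximated by loading with simplify_cells=True (SciPy 1.5 or later) and converting arrays with tolist(). This is a swapped-in sketch, not mat4py’s own algorithm; to_plain is a made-up helper that only handles numeric, string, struct and cell data.

```python
import json
import os
import tempfile

import numpy as np
import scipy.io


def to_plain(obj):
    """Recursively convert loadmat(..., simplify_cells=True) output to JSON-friendly types."""
    if isinstance(obj, dict):
        # Skip loadmat's metadata keys (__header__ holds bytes, which JSON can't encode)
        return {k: to_plain(v) for k, v in obj.items() if not k.startswith("__")}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    if isinstance(obj, np.ndarray):
        # Object arrays (e.g. cell arrays) need element-wise conversion
        return to_plain(obj.tolist()) if obj.dtype == object else obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    return obj


# Example: a dict passed to savemat is stored as a MATLAB struct (names are made up)
path = os.path.join(tempfile.mkdtemp(), "plain.mat")
scipy.io.savemat(path, {"s": {"name": "probe", "values": np.array([1.0, 2.0])}, "n": 3})
data = to_plain(scipy.io.loadmat(path, simplify_cells=True))
print(json.dumps(data, sort_keys=True))
```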
from os.path import dirname, join as pjoin
import scipy.io as sio
data_dir = pjoin(dirname(sio.__file__), 'matlab', 'tests', 'data')
mat_fname = pjoin(data_dir, 'testdouble_7.4_GLNX86.mat')
mat_contents = sio.loadmat(mat_fname)
The code above loads one of the example files that ship with SciPy’s own test data; to read your own .mat file, set mat_fname to its path.
To read a MAT-file into a pandas DataFrame with mixed data types:
import pandas as pd
import scipy.io as sio
mat = sio.loadmat('file.mat')  # load the MAT-file
mdata = mat['myVar']  # struct variable in the MAT-file
ndata = {n: mdata[n][0, 0] for n in mdata.dtype.names}  # field name -> field contents
Columns = [n for n, v in ndata.items() if v.size == 1]
d = dict((c, ndata[c][0]) for c in Columns)
df = pd.DataFrame.from_dict(d)
print(df)  # or display(df) in Jupyter
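If the struct’s fields are equal-length vectors, a shorter route is loadmat’s simplify_cells=True option (SciPy 1.5 or later), which turns the struct into a plain dict of arrays. A minimal sketch, assuming such a struct; the file is generated here with savemat and the field names are made up.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import scipy.io

# Build an example file: a dict passed to savemat is stored as a MATLAB struct
path = os.path.join(tempfile.mkdtemp(), "table.mat")
scipy.io.savemat(path, {"myVar": {"time": np.array([0.0, 0.5, 1.0]),
                                  "count": np.array([3, 1, 4])}})

# simplify_cells=True turns structs into dicts and squeezes singleton dimensions
mdata = scipy.io.loadmat(path, simplify_cells=True)["myVar"]

# Each field (assumed to have the same length) becomes one DataFrame column
df = pd.DataFrame({name: np.ravel(values) for name, values in mdata.items()})
print(df)
```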
There is a great library for this task called pymatreader.
Just do as follows:

Install the package:
pip install pymatreader

Import the relevant function of this package:
from pymatreader import read_mat

Use the function to read the MATLAB struct:
data = read_mat('matlab_struct.mat')

Use data.keys() to locate where the data is actually stored. The keys will usually look like:
dict_keys(['__header__', '__version__', '__globals__', 'data_opp'])
Here data_opp is the key that actually stores the data; its name will of course differ between files.

Last step: create your DataFrame:
import pandas as pd
my_df = pd.DataFrame(data['data_opp'])
That’s it 🙂
You can also use the hdf5storage library; see its official documentation for details on which MATLAB versions are supported.
import hdf5storage
label_file = "./LabelTrain.mat"
out = hdf5storage.loadmat(label_file)
print(type(out)) # <class 'dict'>
Apart from scipy.io.loadmat for v4 (Level 1.0), v6 and v7 to v7.2 MAT-files, and h5py.File for v7.3 MAT-files, there is another type of MAT-file that stores data as text instead of binary. It is usually created by Octave and can’t even be read by MATLAB. Neither scipy.io.loadmat nor h5py.File can load these files (tested with SciPy 1.5.3 and h5py 3.1.0); the only solution I found is numpy.loadtxt.
import numpy as np
mat = np.loadtxt('xxx.mat')
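np.loadtxt on the whole file works for a single numeric matrix because it skips the "#" header lines, but when the file holds several variables their rows would all be stacked together. A minimal sketch that splits the file at each "# name:" line, assuming the variables are numeric matrices or scalars only (strings, cells and structs are not handled); read_octave_text is a made-up name.

```python
import io

import numpy as np


def read_octave_text(path):
    """Read numeric variables from an Octave text-format file into a dict (sketch)."""
    variables = {}
    name = None
    rows = []

    def flush():
        # Parse the collected numeric lines of the current variable, if any
        if name is not None and rows:
            variables[name] = np.loadtxt(io.StringIO("\n".join(rows)), ndmin=2)

    with open(path) as fh:
        for line in fh:
            stripped = line.strip()
            if stripped.startswith("# name:"):
                # A new variable starts: store the previous one first
                flush()
                name = stripped.split(":", 1)[1].strip()
                rows = []
            elif stripped and not stripped.startswith("#"):
                # Non-comment, non-empty lines hold the numeric data
                rows.append(stripped)
    flush()
    return variables
```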
SciPy works well for loading .mat files. The result is a dict, so you can use its get() method to retrieve a variable, which is already a NumPy array (get() returns None if the name is missing).
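import matplotlib.pyplot as plt
import scipy.io

mat = scipy.io.loadmat('point05m_matrix.mat')
x = mat.get("matrix")  # NumPy array, or None if "matrix" is not in the file
print(type(x))
print(len(x))
plt.imshow(x, extent=[0, 60, 0, 55], aspect='auto')
plt.show()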
After struggling with this problem myself and trying other libraries (mat4py is a good one as well, but it has a few limitations), I built this library ("matdata2py"), which can handle most variable types and, most importantly for me, the "string" type. The .mat file needs to be saved in the v7.3 format. I hope this is useful for the community.
Installation:
pip install matdata2py
How to use this lib:
import matdata2py as mtp
To load the Matlab data file:
Variables_output = mtp.loadmatfile(file_Name, StructsExportLikeMatlab = True, ExportVar2PyEnv = False)
print(Variables_output.keys()) # with ExportVar2PyEnv = False, the variables are elements of the Variables_output dictionary
With ExportVar2PyEnv = True, each variable is instead exported as a separate Python variable with the same name as in the MAT-file.
Flag descriptions
StructsExportLikeMatlab = True/False: structures are exported in dictionary format (False) or in dot-based format similar to MATLAB (True)
ExportVar2PyEnv = True/False: export all variables in a single dictionary (False) or as separate individual variables into the Python environment (True)