Reading a NumPy array with mixed data types from a buffer without copying
Question:
I have data encoded as a binary string, with a mix of data types.
For example (the real data is much larger):
data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'
I am reading this into a NumPy array:
buffer = np.frombuffer(np.array(data), dtype='B')
which gives
array([108, 58, 0, 0, 192, 255, 124, 58, 103, 142, 109, 191, 125,
58, 206, 85, 113, 191], dtype=uint8)
I need to interpret this as records of (np.uint16, np.float), so that the above array becomes
[(14956,NaN),(14972,-0.9280),(14973,-0.9427)]
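To double-check the record layout described here, a 2-byte unsigned integer followed by a 4-byte float with no padding (6 bytes per record), the sample can be decoded with the standard-library struct module. This is a verification sketch only, not a zero-copy solution; it assumes the data is little-endian and uses the sample bytes with their backslash escapes restored.

```python
import struct

# Sample bytes from the question, with the backslash escapes restored (18 bytes).
data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'

# '<Hf': little-endian, standard sizes, no padding.
# H = uint16 (2 bytes), f = float32 (4 bytes), so 6 bytes per record.
fmt = '<Hf'
print(struct.calcsize(fmt))  # 6

# iter_unpack walks the buffer one 6-byte record at a time.
records = list(struct.iter_unpack(fmt, data))
print(records)
```

The struct module returns Python tuples, so it copies every value; it is only useful here to confirm the expected (14956, nan), (14972, -0.928...), (14973, -0.943...) records.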
I can use view for a single data type, e.g.
buffer.view(dtype=np.uint16)
gives
array([14956, 0, 65472, 14972, 36455, 49005, 14973, 21966, 49009], dtype=uint16)
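The single-dtype view does contain the integer field, just interleaved with the two halves of each float. As a sketch (assuming little-endian data and the sample bytes with escapes restored; `as_u2` and `ints` are illustrative names), a step-3 slice of that view extracts the integers without copying. There is no equivalent trick for the float field: its values sit 6 bytes apart, which does not line up with the 4-byte grid of a float32 view.

```python
import numpy as np

data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'
buffer = np.frombuffer(data, dtype=np.uint8)

# Reinterpret all 18 bytes as 9 little-endian uint16 values (no copy).
as_u2 = buffer.view('<u2')

# Each 6-byte record spans three uint16 slots:
# [integer field, low half of float, high half of float].
# Basic slicing with a step returns a view, not a copy.
ints = as_u2[::3]
print(ints)                            # [14956 14972 14973]
print(np.shares_memory(ints, buffer))  # True
```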
However, I don't think I can use a combination of data types with view like this. I have tried reshaping and slicing instead:
buffer = buffer.reshape((3,-1))
firstData = buffer[:,:2]
which gives
array([[108, 58],
       [124, 58],
       [125, 58]], dtype=uint8)
but viewing this slice as uint16 fails:
firstData.view(dtype=np.uint16)
ValueError: new type not compatible with array.
As hinted in the documentation, this can be resolved by copying the slice first:
firstData = firstData.copy()
firstData.view(dtype=np.uint16)
array([[14956],
[14972],
[14973]], dtype=uint16)
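The view fails because the slice is not contiguous: consecutive rows of firstData are 6 bytes apart, and only the first 2 bytes of each row belong to the slice. The sketch below (assuming little-endian data; `first` and `packed` are illustrative names) shows the strides and why .copy() helps: it packs the 2-byte rows together in newly allocated memory, which is exactly the copy the question wants to avoid.

```python
import numpy as np

data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'
buffer = np.frombuffer(data, dtype=np.uint8).reshape((3, -1))

first = buffer[:, :2]
# Rows are 6 bytes apart and elements 1 byte apart: a gappy, non-contiguous view.
print(first.strides, first.flags['C_CONTIGUOUS'])  # (6, 1) False

# .copy() returns a C-contiguous array in new memory, detached from `data`.
packed = first.copy()
print(packed.flags['C_CONTIGUOUS'], np.shares_memory(packed, buffer))  # True False

# Viewing the packed copy as little-endian uint16 now works.
print(packed.view('<u2').ravel())  # [14956 14972 14973]
```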
Is there a fast way to do this without copying the array?
Answers:
Use a structured data type with two fields:
In [89]: data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'
In [90]: dt = np.dtype([('a', np.uint16), ('b', np.float32)])
In [91]: x = np.frombuffer(data, dtype=dt)
In [92]: x
Out[92]:
array([(14956, nan), (14972, -0.92795414), (14973, -0.94271553)],
dtype=[('a', '<u2'), ('b', '<f4')])
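A self-contained version of the session above, as a sketch assuming little-endian data. It spells out the byte order with '<u2' and '<f4' so the result does not depend on the host machine, checks that each record is 6 bytes, and confirms that np.frombuffer reuses the original bytes rather than copying them.

```python
import numpy as np

data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'

# Explicit little-endian type codes; fields are packed (no alignment padding by default).
dt = np.dtype([('a', '<u2'), ('b', '<f4')])
print(dt.itemsize)  # 6

x = np.frombuffer(data, dtype=dt)
print(x.shape)  # (3,)

# Both arrays wrap the same bytes object, so they share memory: no copy was made.
print(np.shares_memory(x, np.frombuffer(data, dtype=np.uint8)))  # True

# A bytes object is immutable, so the wrapping array is read-only.
print(x.flags['WRITEABLE'])  # False
```

If the array needs to be writeable, wrapping a bytearray instead of a bytes object gives a writeable result, still without copying.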
x is a one-dimensional structured array; each item in x is a structure with fields a and b:
In [93]: x[0]
Out[93]: (14956, nan)
In [94]: x['a']
Out[94]: array([14956, 14972, 14973], dtype=uint16)
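Indexing a field by name does not copy either. In this sketch (assuming little-endian data), x['a'] and x['b'] are strided views into the original buffer that step 6 bytes (one record) per element, and the dtype records each field's byte offset within the record.

```python
import numpy as np

data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'
x = np.frombuffer(data, dtype=[('a', '<u2'), ('b', '<f4')])

a = x['a']
b = x['b']

# Each field view steps over whole 6-byte records.
print(a.strides, b.strides)  # (6,) (6,)
print(np.shares_memory(a, x), np.shares_memory(b, x))  # True True
print(b)

# dtype.fields maps each name to (field dtype, byte offset within the record).
print(x.dtype.fields['a'][1], x.dtype.fields['b'][1])  # 0 2
```

If a downstream routine needs a plain contiguous array, np.ascontiguousarray(x['b']) will produce one, but that step does copy the data.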
Note that I used np.float32 for the floating point field, because each float in the data occupies 4 bytes. np.float was an alias for the Python builtin float (a 64-bit double); it was deprecated in NumPy 1.20 and removed in NumPy 1.24.
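Each float in the sample occupies 4 bytes, which is why float32 is the matching type. As a sketch: with Python's float (which NumPy treats as float64, 8 bytes), each record would be 10 bytes, the 18-byte sample cannot be split into whole records, and np.frombuffer raises a ValueError. The names `wrong`, `right` and `raised` are illustrative only.

```python
import numpy as np

data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'

# Python float -> float64 (8 bytes), so a record would be 2 + 8 = 10 bytes.
wrong = np.dtype([('a', '<u2'), ('b', float)])
print(wrong.itemsize)  # 10

raised = False
try:
    np.frombuffer(data, dtype=wrong)  # 18 bytes is not a multiple of 10
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# float32 (4 bytes) matches the data: 2 + 4 = 6 bytes per record.
right = np.dtype([('a', '<u2'), ('b', '<f4')])
print(right.itemsize)  # 6
```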