Reading in numpy array from buffer with different data types without copying array

Question:

I have data encoded as a binary string, with a mix of data types.

As an example (real data is much larger),

data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'

I am reading this into a NumPy array:

buffer = np.frombuffer(np.array(data), dtype='B')

which gives

array([108,  58,   0,   0, 192, 255, 124,  58, 103, 142, 109, 191, 125,
    58, 206,  85, 113, 191], dtype=uint8)

I need to reinterpret this as (np.uint16, np.float) pairs, so that the above array becomes

[(14956,NaN),(14972,-0.9280),(14973,-0.9427)]

I can use view for a single data type, e.g.
buffer.view(dtype=np.uint16) gives

array([14956,     0, 65472, 14972, 36455, 49005, 14973, 21966, 49009], dtype=uint16)

However, I don't think I can use a combination of data types with view like this. I have tried reshaping and slicing the array:

buffer = buffer.reshape((3,-1))
firstData = buffer[:,:2]
firstData = array([[108,  58],
                   [124,  58],
                   [125,  58]], dtype=uint8)
firstData.view(dtype = np.uint16)
ValueError: new type not compatible with array.

As hinted in the documentation, this can be resolved by copying

firstData = firstData.copy()
firstData.view(dtype=np.uint16)
array([[14956],
       [14972],
       [14973]], dtype=uint16)

Is there a fast way to do this without copying the array?

Asked By: user157545


Answers:

Use a structured data type with two fields:

In [89]: data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'

In [90]: dt = np.dtype([('a', np.uint16), ('b', np.float32)])

In [91]: x = np.frombuffer(data, dtype=dt)

In [92]: x
Out[92]: 
array([(14956,         nan), (14972, -0.92795414), (14973, -0.94271553)], 
      dtype=[('a', '<u2'), ('b', '<f4')])

x is a one-dimensional structured array; each item in x is a structure with fields a and b:

In [93]: x[0]
Out[93]: (14956,  nan)

In [94]: x['a']
Out[94]: array([14956, 14972, 14973], dtype=uint16)

Note that I used np.float32 for the floating point field, because each record in the data is 6 bytes: 2 for the uint16 and 4 for the float. np.float was an alias for the Python builtin float, which is a 64-bit type. It was deprecated in NumPy 1.20 and removed in NumPy 1.24.
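To check that no copy is made, the flags and strides of the result can be inspected. The following is only a minimal sketch: it uses the explicit little-endian codes '<u2' and '<f4' (an assumption, so the result does not depend on the host byte order) and relies on np.dtype packing the fields without padding by default.

```python
import numpy as np

data = b'l:\x00\x00\xc0\xff|:g\x8em\xbf}:\xceUq\xbf'

# Explicit little-endian field types; with the default align=False the
# fields are packed back to back, so one record is 2 + 4 = 6 bytes.
dt = np.dtype([('a', '<u2'), ('b', '<f4')])

x = np.frombuffer(data, dtype=dt)

print(dt.itemsize)         # 6
print(x.flags.owndata)     # False: x borrows the memory of `data`
print(x.flags.writeable)   # False: a bytes object is immutable

# Selecting a field gives a strided view into the same memory,
# stepping 6 bytes from one record to the next.
a = x['a']
print(a.strides)                # (6,)
print(np.shares_memory(x, a))   # True
```

Because data is an immutable bytes object, x is read-only. If the values need to be modified in place, a writable buffer such as a bytearray could be passed to np.frombuffer instead.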

Answered By: Warren Weckesser