How to return a view of several columns in numpy structured array

Question:

I can see several columns (fields) at once in a numpy structured array by indexing with a list of the field names, for example

import numpy as np

a = np.array([(1.5, 2.5, (1.0,2.0)), (3.,4.,(4.,5.)), (1.,3.,(2.,6.))],
        dtype=[('x',float), ('y',float), ('value',float,(2,2))])

print a[['x','y']]
#[(1.5, 2.5) (3.0, 4.0) (1.0, 3.0)]

print a[['x','y']].dtype
#[('x', '<f4') ('y', '<f4')])

But the problem is that it seems to be a copy rather than a view:

b = a[['x','y']]
b[0] = (9.,9.)

print b
#[(9.0, 9.0) (3.0, 4.0) (1.0, 3.0)]

print a[['x','y']]
#[(1.5, 2.5) (3.0, 4.0) (1.0, 3.0)]

If I only select one column, it’s a view:

c = x['y']
c[0] = 99.

print c
#[ 99.  4.   3. ]

print a['y']
#[ 99.  4.   3. ]

Is there any way I can get the view behavior for more than one column at once?

I have two workarounds, one is to just loop through the columns, the other is to create a hierarchical dtype, so that the one column actually returns a structured array with the two (or more) fields that I want. Unfortunately, zip also returns a copy, so I can’t do:

x = a['x']; y = a['y']
z = zip(x,y)
z[0] = (9.,9.)
Asked By: askewchan

||

Answers:

I don’t think there is an easy way to achieve what you want. In general, you cannot take an arbitrary view into an array. Try the following:

>>> a
array([(1.5, 2.5, [[1.0, 2.0], [1.0, 2.0]]),
       (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
       (1.0, 3.0, [[2.0, 6.0], [2.0, 6.0]])], 
      dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])
>>> a.view(float)
array([ 1.5,  2.5,  1. ,  2. ,  1. ,  2. ,  3. ,  4. ,  4. ,  5. ,  4. ,
        5. ,  1. ,  3. ,  2. ,  6. ,  2. ,  6. ])

The float view of your record array shows you how the actual data is stored in memory. A view into this data has to be expressible as a combination of a shape, strides and offset into the above data. So if you wanted, for instance, a view of 'x' and 'y' only, you could do the following:

>>> from numpy.lib.stride_tricks import as_strided
>>> b = as_strided(a.view(float), shape=a.shape + (2,),
                   strides=a.strides + a.view(float).strides)
>>> b
array([[ 1.5,  2.5],
       [ 3. ,  4. ],
       [ 1. ,  3. ]])

The as_strided does the same as the perhaps easier to understand:

>>> bb = a.view(float).reshape(a.shape + (-1,))[:, :2]
>>> bb
array([[ 1.5,  2.5],
       [ 3. ,  4. ],
       [ 1. ,  3. ]])

Either of this is a view into a:

>>> b[0,0] =0
>>> a
array([(0.0, 2.5, [[0.0, 2.0], [1.0, 2.0]]),
       (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
       (1.0, 3.0, [[2.0, 6.0], [2.0, 6.0]])], 
      dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])
>>> bb[2, 1] = 0
>>> a
array([(0.0, 2.5, [[0.0, 2.0], [1.0, 2.0]]),
       (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
       (1.0, 0.0, [[2.0, 6.0], [2.0, 6.0]])], 
      dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])

It would be nice if either of this could be converted into a record array, but numpy refuses to do so, the reason not being all that clear to me:

>>> b.view([('x',float), ('y',float)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: new type not compatible with array.

Of course what works (sort of) for 'x' and 'y' would not work, for instance, for 'x' and 'value', so in general the answer is: it cannot be done.

Answered By: Jaime

You can create a dtype object contains only the fields that you want, and use numpy.ndarray() to create a view of original array:

import numpy as np
strc = np.zeros(3, dtype=[('x', int), ('y', float), ('z', int), ('t', "i8")])

def fields_view(arr, fields):
    dtype2 = np.dtype({name:arr.dtype.fields[name] for name in fields})
    return np.ndarray(arr.shape, dtype2, arr, 0, arr.strides)

v1 = fields_view(strc, ["x", "z"])
v1[0] = 10, 100

v2 = fields_view(strc, ["y", "z"])
v2[1:] = [(3.14, 7)]

v3 = fields_view(strc, ["x", "t"])

v3[1:] = [(1000, 2**16)]

print(strc)

here is the output:

[(10, 0.0, 100, 0L) (1000, 3.14, 7, 65536L) (1000, 3.14, 7, 65536L)]
Answered By: HYRY

Building on @HYRY’s answer, you could also use ndarray‘s method getfield:

def fields_view(array, fields):
    return array.getfield(numpy.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))
Answered By: ChristopherC

As of Numpy version 1.16, the code you propose will return a view. See ‘NumPy 1.16.0 Release Notes->Future Changes->multi-field views return a view instead of a copy’ on this page:

https://numpy.org/doc/stable/release/1.16.0-notes.html#multi-field-views-return-a-view-instead-of-a-copy

Answered By: Anders Lindstrom

In my case ‘several columns’ happens to be equal to two columns of the same data type, where I can use the following function to make a view:

def make_view(arr, fields, dtype):
    offsets = [arr.dtype.fields[f][1] for f in fields]
    offset = min(offsets)
    stride = max(offsets)
    return np.ndarray((len(arr), 2), buffer=arr, offset=offset, strides=(arr.strides[0], stride-offset), dtype=dtype)

I think this boils down the the same thing @Jamie said, it cannot be done in general, but for two columns of the same dtype it can. The result of this function is not a dict but a good old fashioned numpy array.

Answered By: xyzzyqed
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.