Column stacking nested numpy structured array, help getting dims right

Question

I’m trying to create a nested record array, but I am having trouble with the dimensions. I tried following the example in "How to set dtype for nested numpy ndarray?", but I am misunderstanding something. Below is an MRE. The arrays are generated in a script, not imported from CSV.

import numpy as np

arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
arrs = (arr1, arr2, arr3, arr4, arr5)

for i in arrs:
    print(i.shape, i)

For which the print statement returns:

(4,) [4 5 4 5]
(4,) [ 0  0 -1 -1]
(4,) [0.51 0.89 0.59 0.94]
(4, 3) [[0.52 0.41 0.68]
 [0.8  0.71 1.12]
 [0.62 0.46 0.78]
 [1.1  0.77 1.19]]
(4, 3) [[0.6 0.2 0.2]
 [0.6 0.2 0.2]
 [0.6 0.2 0.2]
 [0.6 0.2 0.2]]

However, the ans line throws an error:

dtypes = [
        ("state", "f8"),
        ("variability", "f8"),
        ("target", "f8"),
        ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
        ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
]
ans = np.column_stack(arrs).view(dtype=dtypes)

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

Problem 1: How do I get the desired array output?
print(np.column_stack(arrs)) returns

[[ 4.    0.    0.51  0.52  0.41  0.68  0.6   0.2   0.2 ]
 [ 5.    0.    0.89  0.8   0.71  1.12  0.6   0.2   0.2 ]
 [ 4.   -1.    0.59  0.62  0.46  0.78  0.6   0.2   0.2 ]
 [ 5.   -1.    0.94  1.1   0.77  1.19  0.6   0.2   0.2 ]]

But the desired output looks like this:

[[4 0 0.51 (0.52, 0.41, 0.68) (0.6, 0.2, 0.2)]
 [5 0 0.89 (0.8, 0.71, 1.12) (0.6, 0.2, 0.2)]
 [4 -1 0.59 (0.62, 0.46, 0.78) (0.6, 0.2, 0.2)]
 [5 -1 0.94 (1.1, 0.77, 1.19) (0.6, 0.2, 0.2)]]

Problem 2: How do I include the dtype.names?

print(rec_array.dtype.names) should return:
('state', 'variability', 'target', 'measured', 'var')

and print(rec_array['measured'].dtype.names) should return:
('mean', 'low', 'hi')

and similarly for the names of the other nested array.

Asked By: a11

Answer 1

With your dtype:

In [2]: dtypes = [
   ...:         ("state", "f8"),
   ...:         ("variability", "f8"),
   ...:         ("target", "f8"),
   ...:         ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
   ...:         ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
   ...: ]

A 2 element zeros array looks like:

In [3]: arr = np.zeros(2,dtypes)    
In [4]: arr
Out[4]: 
array([(0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)]),
       (0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)])],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])

Using recfunctions I can map that to an unstructured array:

In [5]: import numpy.lib.recfunctions as rf    
In [6]: uarr = rf.structured_to_unstructured(arr)    
In [7]: uarr
Out[7]: 
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])    
In [8]: uarr.shape
Out[8]: (2, 27)

That says that your dtypes describes 27 values per record (3 scalars plus two (4,)-shaped fields of 3 floats each), not the 9 that you seem to expect (from your column stack).
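The size mismatch behind the ValueError can be checked directly. This is a minimal sketch using the dtype from the question; `stacked` is a zero-filled stand-in (an assumption for illustration) with the same shape and element type as `np.column_stack(arrs)`:

```python
import numpy as np

dtypes = [
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
]
dt = np.dtype(dtypes)

# 3 scalar f8 fields + 2 fields that each hold 4 sub-records of 3 f8 values
n_values = 3 + 2 * (4 * 3)
print(n_values)     # 27
print(dt.itemsize)  # 216 bytes per record (27 * 8)

# Stand-in for np.column_stack(arrs): 4 rows of 9 float64 values
stacked = np.zeros((4, 9))
row_bytes = stacked.shape[-1] * stacked.itemsize
print(row_bytes)    # 72 bytes per row

# view() needs the record size to divide the byte length of the last axis
print(row_bytes % dt.itemsize == 0)  # False
```

Each row of the stacked array carries 72 bytes, while one record of this dtype needs 216, which is the mismatch the error message describes.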

Making a new (2,27) array, I can create a structured array:

In [9]: uarr = np.arange(2*27).reshape(2,27)
In [18]: rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))
Out[18]: 
array([( 0.,  1.,  2., [( 3.,  4.,  5.), ( 6.,  7.,  8.), ( 9., 10., 11.), (12., 13., 14.)], [(15., 16., 17.), (18., 19., 20.), (21., 22., 23.), (24., 25., 26.)]),
       (27., 28., 29., [(30., 31., 32.), (33., 34., 35.), (36., 37., 38.), (39., 40., 41.)], [(42., 43., 44.), (45., 46., 47.), (48., 49., 50.), (51., 52., 53.)])],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])

view still fails here. It does work in some simple cases, though it can require adjusting dimensions; I have not explored its limitations:

In [19]: uarr.view(np.dtype(dtypes))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [19], in <cell line: 1>()
----> 1 uarr.view(np.dtype(dtypes))

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

edit

Removing the (4,) from dtypes, so that each nested field holds a single record per row:

In [35]: dtypes = [
    ...:         ("state", "f8"),
    ...:         ("variability", "f8"),
    ...:         ("target", "f8"),
    ...:         ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ...:         ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
    ...: ]

In [36]: arr = np.zeros(2,dtypes)

In [37]: arr
Out[37]: 
array([(0., 0., 0., (0., 0., 0.), (0., 0., 0.)),
       (0., 0., 0., (0., 0., 0.), (0., 0., 0.))],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])

In [38]: uarr = np.arange(18).reshape(2,9)

In [39]: arr1 = rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))

In [40]: arr1
Out[40]: 
array([(0.,  1.,  2., ( 3.,  4.,  5.), ( 6.,  7.,  8.)),
       (9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])

In [43]: arr1['measured']
Out[43]: 
array([( 3.,  4.,  5.), (12., 13., 14.)],
      dtype=[('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')])

In [44]: arr1['measured']['mean']
Out[44]: array([ 3., 12.])
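Nested field access of this kind also works for assignment. As an alternative that is not part of the answer, here is a minimal sketch that allocates the structured array first and fills it field by field. The name `rec` and the `i8` types for the first two fields are assumptions for illustration; filling fields separately avoids the upcast to float that `column_stack` applies to the integer inputs.

```python
import numpy as np

# Input arrays from the question
arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T

# Same layout as the dtype without (4,), but with integer types
# for the first two fields (an assumption, not from the answer)
dtypes = [
    ("state", "i8"),
    ("variability", "i8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
]

# One record per row of the input data
rec = np.zeros(arr1.shape[0], dtype=dtypes)
rec["state"] = arr1
rec["variability"] = arr2
rec["target"] = arr3

# rec["measured"] is a view, so assigning to its sub-fields writes into rec
for j, name in enumerate(rec["measured"].dtype.names):
    rec["measured"][name] = arr4[:, j]
for j, name in enumerate(rec["var"].dtype.names):
    rec["var"][name] = arr5[:, j]

print(rec)
```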

The same result can be had via a CSV file and genfromtxt:

In [45]: np.savetxt('foo', uarr)

In [46]: more foo
0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00
9.000000000000000000e+00 1.000000000000000000e+01 1.100000000000000000e+01 1.200000000000000000e+01 1.300000000000000000e+01 1.400000000000000000e+01 1.500000000000000000e+01 1.600000000000000000e+01 1.700000000000000000e+01

In [47]: data = np.genfromtxt('foo', dtype=dtypes)

In [48]: data
Out[48]: 
array([(0.,  1.,  2., ( 3.,  4.,  5.), ( 6.,  7.,  8.)),
       (9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])

view still does not work.

With your data:

In [50]: arr1 = np.array([4, 5, 4, 5])
    ...: arr2 = np.array([0, 0, -1, -1])
    ...: arr3 = np.array([0.51, 0.89, 0.59, 0.94])
    ...: arr4 = np.array(
    ...:     [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
    ...: ).T
    ...: arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
    ...: arrs = (arr1, arr2, arr3, arr4, arr5)

In [51]: ans = np.column_stack(arrs)

In [52]: ans
Out[52]: 
array([[ 4.  ,  0.  ,  0.51,  0.52,  0.41,  0.68,  0.6 ,  0.2 ,  0.2 ],
       [ 5.  ,  0.  ,  0.89,  0.8 ,  0.71,  1.12,  0.6 ,  0.2 ,  0.2 ],
       [ 4.  , -1.  ,  0.59,  0.62,  0.46,  0.78,  0.6 ,  0.2 ,  0.2 ],
       [ 5.  , -1.  ,  0.94,  1.1 ,  0.77,  1.19,  0.6 ,  0.2 ,  0.2 ]])

In [53]: arr2 = rf.unstructured_to_structured(ans, dtype=np.dtype(dtypes))

In [54]: arr2
Out[54]: 
array([(4.,  0., 0.51, (0.52, 0.41, 0.68), (0.6, 0.2, 0.2)),
       (5.,  0., 0.89, (0.8 , 0.71, 1.12), (0.6, 0.2, 0.2)),
       (4., -1., 0.59, (0.62, 0.46, 0.78), (0.6, 0.2, 0.2)),
       (5., -1., 0.94, (1.1 , 0.77, 1.19), (0.6, 0.2, 0.2))],
      dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
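Collected into one script, the steps from this session can be sketched as follows. It uses the dtype without the (4,) shapes and takes the name `rec_array` from the question; note that `column_stack` converts the integer inputs to float, so all leaf fields are `f8`.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Input arrays from the question
arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
arrs = (arr1, arr2, arr3, arr4, arr5)

# Nested dtype with 9 leaf values per record, matching the 9 stacked columns
dtypes = [
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
]

# Stack to a (4, 9) float array, then map its columns onto the nested fields
rec_array = rf.unstructured_to_structured(
    np.column_stack(arrs), dtype=np.dtype(dtypes)
)

print(rec_array.shape)                    # (4,)
print(rec_array.dtype.names)              # ('state', 'variability', 'target', 'measured', 'var')
print(rec_array["measured"].dtype.names)  # ('mean', 'low', 'hi')
print(rec_array["var"].dtype.names)       # ('mid', 'low', 'hi')
```

This covers both problems from the question: the nested layout comes from the dtype, and the field names are available through `dtype.names` at each level.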

Answered By: hpaulj
