How to get numpy to interpret object array slice as single array?

Question:

When you ask NumPy to make an array out of a collection that includes arbitrary objects, it creates an array of "object" dtype. You can slice across those objects, but since NumPy knows nothing about the objects themselves, you cannot index into an object in the same indexing operation (even if that particular object is actually a NumPy array).

However, if you slice into the object array to select the parts of the object array that are actually numpy arrays, it seems that numpy won’t collapse that slice into a single numpy array, even with another call to np.array(). Here is a little example of what I mean:

>>> aa = np.array([np.random.randn(3, 4), {'something': 'blah'}], dtype=object)
>>> aa.shape
(2,)
>>> np.array(aa[0:1])
array([array([[ 1.78237043, -0.61082005,  0.92160137,  0.58961677],
              [ 1.54183639, -0.43097464,  1.36213935, -1.2695875 ],
              [ 0.01431181, -0.62073519,  0.56267489, -0.46113538]])],
      dtype=object)
>>> np.array(aa[0:1]).shape # I want this to be (1, 3, 4)
(1,)

Is there any way to do this without a double copy (e.g. not like this: np.array(aa[0:1].tolist()))? Does an object array even allow you to do this without such a copy?

Asked By: Multihunter

Answers:

You can use np.stack to combine the arrays held in an object-dtype slice into a normal ndarray:

>>> aa = np.array([np.random.randn(3, 4), {'something': 'blah'}], dtype=object)
>>> aa
array([array([[-6.36267204e-01,  8.95707498e-02,  1.09275216e+00,
               -3.70594544e-01],
              [ 8.32865823e-01, -6.53876690e-01,  1.21000457e+00,
                1.22046398e+00],
              [-5.30262118e-01,  1.17934947e-04,  4.45156002e-01,
               -6.61549444e-02]])                                ,
       {'something': 'blah'}], dtype=object)
>>> np.stack(aa[0:1])
array([[[-6.36267204e-01,  8.95707498e-02,  1.09275216e+00,
         -3.70594544e-01],
        [ 8.32865823e-01, -6.53876690e-01,  1.21000457e+00,
          1.22046398e+00],
        [-5.30262118e-01,  1.17934947e-04,  4.45156002e-01,
         -6.61549444e-02]]])
>>> np.stack(aa[0:1]).shape
(1, 3, 4)

This also works with multiple ndarrays in your object array, as long as they all have the same shape (np.stack raises a ValueError otherwise).
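As a minimal sketch of the multi-array case (the variable names `first`, `second`, and `bb` are made up for illustration, and the object array is filled element by element so that NumPy does not try to interpret the contents itself):

```python
import numpy as np

# Fill the object array one element at a time; names are illustrative only.
first = np.random.randn(3, 4)
second = np.random.randn(3, 4)
aa = np.empty(3, dtype=object)
aa[0] = first
aa[1] = second
aa[2] = {'something': 'blah'}

# Stack only the slice that holds ndarrays.
stacked = np.stack(aa[0:2])
print(stacked.shape)   # (2, 3, 4)
print(stacked.dtype)   # float64

# Arrays with different shapes cannot be stacked.
bb = np.empty(2, dtype=object)
bb[0] = np.zeros((3, 4))
bb[1] = np.zeros((2, 4))
mismatch_error = None
try:
    np.stack(bb)
except ValueError as exc:
    mismatch_error = exc
    print("stack failed:", exc)
```

The result is an ordinary float64 array, so normal multi-dimensional indexing such as `stacked[:, 0, :]` works on it.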

Internally, this just treats the object-array as a sequence and iterates over it. I’m not sure if it has a significant performance benefit over your solution with np.array(aa[0:1].tolist()).
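If it helps, here is a rough way to check how the two approaches relate. This is only a sketch with illustrative names (`inner`, `via_stack`, `via_tolist`, `as_view`), and it does not measure speed. It compares the results and uses np.shares_memory to see whether the output still refers to the original buffer:

```python
import numpy as np

inner = np.random.randn(3, 4)
aa = np.empty(2, dtype=object)
aa[0] = inner
aa[1] = {'something': 'blah'}

via_stack = np.stack(aa[0:1])
via_tolist = np.array(aa[0:1].tolist())

# Both approaches produce the same (1, 3, 4) array.
print(via_stack.shape, via_tolist.shape)
print(np.array_equal(via_stack, via_tolist))

# Both results are copies: neither shares memory with the original array.
print(np.shares_memory(via_stack, inner))
print(np.shares_memory(via_tolist, inner))

# For a single element, adding an axis to the stored array gives a view instead.
as_view = aa[0][np.newaxis]
print(as_view.shape, np.shares_memory(as_view, inner))
```

The view trick only applies when a single element is selected. Several separately allocated arrays cannot be presented as one contiguous ndarray without copying them.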

Answered By: johannesack