Reshape a 3d array to a 2d array with leading points

Question:

I want to reshape this array (Python)

[[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
 [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
 [[0, 1, 2], [3, 4, 5], [6, 7, 8]]]

To this:

[
[0,0,0],
[1,1,1],
[2,2,2],
[3,3,3],
[4,4,4],
[5,5,5],
[6,6,6],
[7,7,7],
[8,8,8],
]

And then back again.

I couldn’t figure out how to do it with np.reshape.

It’s a series of height maps, and I want to interpolate each point with the corresponding point in the next map to create a smooth transition between them.

Asked By: Simon

Answers:

import numpy as np
a = np.array(
    [[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
     [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
     [[0, 1, 2], [3, 4, 5], [6, 7, 8]]]
)
b = np.vstack(np.moveaxis(a, 0, 2))

Reverse operation:

a2 = np.moveaxis(np.vsplit(b, 3), 2, 0)

I think the easiest way to understand how this works is to look at the examples for vstack and then figure out how we need to modify array a so that vstack can produce the desired output.

In this case,

>>> np.moveaxis(a, 0, 2)
array([[[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],

       [[3, 3, 3],
        [4, 4, 4],
        [5, 5, 5]],

       [[6, 6, 6],
        [7, 7, 7],
        [8, 8, 8]]])

prepares the array a in such a way that now vstack can simply "stack" (glue? concatenate?) the 3 "sub-arrays" on top of each other, producing the desired 2D array.
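To check that the two one-liners above really are inverses of each other, it helps to run them on an array whose values are all different. This is only a sketch: the np.arange data is made up for the check and is not the asker's height maps.

```python
import numpy as np

# Three hypothetical 3x3 maps with distinct values (0..26), so that a
# wrong axis order would show up in the result.
a = np.arange(27).reshape(3, 3, 3)

# Forward: one row per grid point, one column per map.
b = np.vstack(np.moveaxis(a, 0, 2))

# Row 4 is grid point (1, 1), taken from every map.
assert b.shape == (9, 3)
assert np.array_equal(b[4], a[:, 1, 1])

# Reverse: split into 3 blocks of rows, then move the map axis to the front.
a2 = np.moveaxis(np.vsplit(b, 3), 2, 0)
assert np.array_equal(a2, a)
```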


EDIT: Second solution and Timings

This solution is an order of magnitude faster than any previous solution:

import numpy as np
a = np.array(
    [[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
     [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
     [[0, 1, 2], [3, 4, 5], [6, 7, 8]]]
)
b = a.reshape(3, -1).T

and for reverse:

a2 = b.T.reshape(3, 3, -1)
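Here is a small sketch of the same idea without the hard-coded 3, assuming a is C-contiguous (as an array built with np.array or np.arange is). The test data and the name n are mine. It also checks that both directions share memory with a, which is probably why this version is so cheap.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)   # made-up data with distinct values
n = a.shape[0]                       # number of maps

b = a.reshape(n, -1).T               # shape (9, 3): points x maps
a2 = b.T.reshape(a.shape)            # back to (3, 3, 3)

# Same result as the vstack/moveaxis version, and a lossless round trip.
assert np.array_equal(b, np.vstack(np.moveaxis(a, 0, 2)))
assert np.array_equal(a2, a)

# For a C-contiguous input, both b and a2 are views on a's data.
assert np.shares_memory(a, b)
assert np.shares_memory(a, a2)
```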

Some timings:

  • This solution:

      In [3]: %timeit a.reshape(3, -1).T
      277 ns ± 7.89 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
    
  • My previous solution (vstack and moveaxis):

      In [4]: %timeit np.vstack(np.moveaxis(a, 0, 2))
      9.56 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    
  • @chrslg’s solution (reshape and moveaxis):

      In [5]: %timeit np.moveaxis(a, 0,-1).reshape(-1,3)
      3.83 µs ± 147 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    
  • @chrslg’s 2nd solution (stride tricks):

      In [6]: %timeit np.lib.stride_tricks.as_strided(a, shape=(len(a)*len(a[0]), 3), strides=(a.strides[2], a.strides[0]))
      4.38 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
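If you want to reproduce the comparison outside IPython, something like the following sketch with the standard timeit module should work. The dictionary labels and loop counts are arbitrary choices of mine, and the numbers will differ from machine to machine. It also checks that all four expressions return the same array before timing them.

```python
import timeit

import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([[[0, 1, 2], [3, 4, 5], [6, 7, 8]]] * 3)

# The four expressions timed above, wrapped as zero-argument callables.
candidates = {
    "reshape + T": lambda: a.reshape(3, -1).T,
    "vstack + moveaxis": lambda: np.vstack(np.moveaxis(a, 0, 2)),
    "moveaxis + reshape": lambda: np.moveaxis(a, 0, -1).reshape(-1, 3),
    "as_strided": lambda: as_strided(
        a, shape=(9, 3), strides=(a.strides[2], a.strides[0])
    ),
}

expected = candidates["reshape + T"]()
results = {}
for name, fn in candidates.items():
    # Make sure every candidate produces the same 9x3 array.
    assert np.array_equal(fn(), expected)
    # Best of 5 runs of 10,000 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10_000, repeat=5)) / 10_000
    results[name] = best
    print(f"{name:20s} {best * 1e6:8.3f} µs per call")
```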
    
Answered By: AGN Gazer

With the last correction to the question, it seems that what you want is something like

np.moveaxis(a, 0,-1).reshape(-1,3)

Result

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4],
       [5, 5, 5],
       [6, 6, 6],
       [7, 7, 7],
       [8, 8, 8]])

You probably know how to use reshape. It reinterprets the data as an array with as many rows as needed and 3 columns. The reason reshape alone won’t do exactly what you want is that it would need the 0s to be consecutive in memory, then the 1s, then the 2s, … which they are not.
But moveaxis solves that: those 0s, 1s, 2s, … are consecutive when you iterate along axis 0 of your input array. So all you have to do is move axis 0 to the end, so that iterating over the last axis does exactly that (visiting the 0s, then the 1s, then the 2s, …).

Note that moveaxis is very fast, because it does not really build a new array. It is just a different view of the existing array, with the strides rearranged so that the visiting order appears changed.
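You can see the "view" part for yourself with a small sketch like this one (the data is made up). Writing through the moved array changes the original, and np.shares_memory reports whether two arrays overlap in memory. The last line prints the same check for the reshaped result, so you can see what happens on your own arrays.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)   # made-up data with distinct values

moved = np.moveaxis(a, 0, -1)        # no copy: same buffer, permuted strides
flat = moved.reshape(-1, 3)

# moved[j, k, i] is the same memory cell as a[i, j, k].
moved[0, 0, 1] = 99
assert a[1, 0, 0] == 99
assert np.shares_memory(a, moved)

# Compare the strides of the three arrays.
print(a.strides, moved.strides, flat.strides)
# Whether the reshape needed a copy depends on the layout; this reports it.
print(np.shares_memory(a, flat))
```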

Since you also asked for the other way, here it is. It is just the same 2 operations, inverted and applied in reverse order: undo the reshape, then undo the moveaxis.

res=np.moveaxis(a, 0,-1).reshape(-1,3) # Just to start from here
np.moveaxis(res.reshape(-1,3,3), -1, 0)

Result

array([[[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])

as expected
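If your height maps are not 3x3, or there are not exactly 3 of them, the same two steps can be wrapped as below. This is a sketch, not part of the answer above: the names to_points and to_maps are made up, and it assumes the maps are stacked along axis 0 as in the question.

```python
import numpy as np

def to_points(maps):
    """(n, h, w) stack of maps -> (h*w, n): one row per grid point."""
    n = maps.shape[0]
    return np.moveaxis(maps, 0, -1).reshape(-1, n)

def to_maps(points, h, w):
    """(h*w, n) -> (n, h, w): inverse of to_points."""
    n = points.shape[1]
    return np.moveaxis(points.reshape(h, w, n), -1, 0)

# 2 hypothetical maps of 3x4, with distinct values.
maps = np.arange(2 * 3 * 4).reshape(2, 3, 4)
pts = to_points(maps)

assert pts.shape == (12, 2)
assert np.array_equal(pts[5], maps[:, 1, 1])   # row 5 is grid point (1, 1)
assert np.array_equal(to_maps(pts, 3, 4), maps)
```

Note that the inverse needs h and w passed in, since they cannot be recovered from the (h*w, n) shape alone.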

Answered By: chrslg

Another answer is to rely on stride_tricks. (It is rare that I post 2 answers to the same question, but this really is a different answer, and it is not obvious which one is best, so I think both deserve their own post.) This is a bit like what moveaxis already does, but not reshape.

A numpy array is just a block of data in memory, which is traversed with a given memory offset for each axis. That offset is called the stride.

For example

a=np.array([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
 [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
 [[0, 1, 2], [3, 4, 5], [6, 7, 8]]])

is, internally, a flat block of memory holding the integers 0,1,2,3,4,5,6,7,8,0,1,2,…
To iterate over them, numpy uses strides

a.strides
# (72, 24, 8)

Meaning that a[i+1,j,k] is 72 bytes after a[i,j,k], that a[i,j+1,k] is 24 bytes after a[i,j,k] and a[i,j,k+1] is 8 bytes after a[i,j,k].

Or, put another way, a[i,j,k] is at byte offset 72*i+24*j+8*k from the start of the data.
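The offset formula can be checked directly. The sketch below uses made-up data with distinct values and forces dtype=np.int64 so that the strides are (72, 24, 8) on every platform. It then compares a[i, j, k] with the element found at the computed byte offset in the flattened data.

```python
import numpy as np

a = np.arange(27, dtype=np.int64).reshape(3, 3, 3)
assert a.strides == (72, 24, 8)

s0, s1, s2 = a.strides
flat = a.ravel()   # the same 27 integers, in memory order

for i, j, k in [(0, 0, 0), (1, 2, 0), (2, 1, 2)]:
    offset = s0 * i + s1 * j + s2 * k          # offset in bytes
    # Dividing by the item size turns the byte offset into a flat index.
    assert flat[offset // a.itemsize] == a[i, j, k]
```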

Usually the data are simply contiguous, so the stride is 8 for the last axis (when the data are 64-bit integers), 8*3 for the axis before it, because each element along the 2nd axis holds 3 of those 8-byte integers, and 8*3*3 for the first.

But you can have arrays with different strides. That is what happens with moveaxis

np.moveaxis(a, 0, 2).strides
# (24, 8, 72)

That is in fact all that moveaxis does: it just changes the strides so that np.moveaxis(a,0,2)[i,j,k] is at memory offset 24*i+8*j+72*k, in other words, where a[k,i,j] is.
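Again, this is easy to verify on made-up data. In this sketch dtype=np.int64 is forced so the strides match the numbers quoted above, and m is just a local name for the moved view.

```python
import numpy as np

a = np.arange(27, dtype=np.int64).reshape(3, 3, 3)
m = np.moveaxis(a, 0, 2)

# Same data, with the strides permuted.
assert a.strides == (72, 24, 8)
assert m.strides == (24, 8, 72)

# m[i, j, k] refers to the same element as a[k, i, j].
for i in range(3):
    for j in range(3):
        for k in range(3):
            assert m[i, j, k] == a[k, i, j]
```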

NumPy provides a lower-level function, np.lib.stride_tricks.as_strided, to manipulate those strides however we want (not just permute them, as moveaxis does).

For example, the equivalent of that np.moveaxis call is

np.lib.stride_tricks.as_strided(a, strides=(24,8,72))

We can force that result to be a 2d array, like this

res = np.lib.stride_tricks.as_strided(a, shape=(len(a)*len(a[0]), 3), strides=(a.strides[2], a.strides[0]))

One limitation for that: it works only if a is a contiguous array (that is, if a is not already a result of some strides manipulation).

One advantage: it is not a new array. No new memory is used here. res is just the same as a, viewed differently.

In our case result is

>>> res
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4],
       [5, 5, 5],
       [6, 6, 6],
       [7, 7, 7],
       [8, 8, 8]])

But here, you don’t need to go back and forth. Those two views of the data designate the same array.

So for example, if you change

res[0,1]=12
a[1,2,2]=15

Both operations impact both arrays

>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[12,  1,  2],
        [ 3,  4,  5],
        [ 6,  7, 15]],

       [[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]]])

As you can see, a[1,2,2] is now 15, as it should be. But a[1,0,0] (what used to be the 2nd 0) is also now 12.

Likewise

>>> res
array([[ 0, 12,  0],
       [ 1,  1,  1],
       [ 2,  2,  2],
       [ 3,  3,  3],
       [ 4,  4,  4],
       [ 5,  5,  5],
       [ 6,  6,  6],
       [ 7,  7,  7],
       [ 8, 15,  8]])

res[0,1] is now 12, as expected. But res[8,1] is also now 15.

So I don’t know if this is useful for you, but I suspect it might be, since you wanted to be able to go back and forth between both formats. With this, there is no need to: you can have both at the same time, without conversion and without building new arrays.

It can also be faster than moveaxis/reshape, though on an array this small the call overhead dominates, so it is worth timing on your own data.

So, tl;dr:

If having two views of the same array is ok for you, and if a is a contiguous array (not something that you obtain by other strides manipulation), then

res = np.lib.stride_tricks.as_strided(a, shape=(len(a)*len(a[0]), 3), strides=(a.strides[2], a.strides[0]))

might be an even better solution for you than moveaxis/reshape
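If you go this route, it may be worth wrapping the call so that the contiguity assumption is checked rather than hoped for. The sketch below is one way to do that. The helper name points_view is made up. It uses shape (h*w, n), which coincides with the expression above for the 3x3x3 example but also covers non-cubic stacks. It returns a read-only view by default, which is a cautious choice on my part because as_strided can corrupt memory if the shape or strides are wrong. Pass writeable=True if you want writes to propagate as described above.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def points_view(maps, writeable=False):
    """(n, h, w) C-contiguous stack -> (h*w, n) view, without copying."""
    if not maps.flags.c_contiguous:
        raise ValueError("points_view expects a C-contiguous array")
    n, h, w = maps.shape
    return as_strided(
        maps,
        shape=(h * w, n),
        # Step one element within a map, then one whole map.
        strides=(maps.strides[2], maps.strides[0]),
        writeable=writeable,
    )

# 2 hypothetical maps of 3x4, with distinct values.
maps = np.arange(2 * 3 * 4).reshape(2, 3, 4)
view = points_view(maps)

assert view.shape == (12, 2)
assert np.array_equal(view, np.moveaxis(maps, 0, -1).reshape(-1, 2))
assert np.shares_memory(view, maps)
```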

Answered By: chrslg