# How to flatten only some dimensions of a numpy array

## Question:

Is there a quick way to “sub-flatten” or flatten only some of the first dimensions in a numpy array?

For example, given a numpy array of shape `(50,100,25)`, the resulting shape would be `(5000,25)`.

## Answers:

Take a look at `numpy.reshape`.

``````
>>> import numpy
>>> arr = numpy.zeros((50,100,25))
>>> arr.shape
# (50, 100, 25)

>>> new_arr = arr.reshape(5000,25)
>>> new_arr.shape
# (5000, 25)

# One shape dimension can be -1.
# In this case, the value is inferred from
# the length of the array and remaining dimensions.
>>> another_arr = arr.reshape(-1, arr.shape[-1])
>>> another_arr.shape
# (5000, 25)
``````

A slight generalization of Alexander’s answer: `np.reshape` can take -1 as an argument, meaning “total array size divided by the product of all other listed dimensions”.

For example, to flatten all but the last dimension:

``````
>>> import numpy
>>> arr = numpy.zeros((50,100,25))
>>> new_arr = arr.reshape(-1, arr.shape[-1])
>>> new_arr.shape
# (5000, 25)
``````
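The rule for `-1` can be checked directly. The following is a minimal sketch (the variable names are only illustrative) showing that the inferred length equals the total size divided by the remaining dimensions, that `reshape` on a contiguous array returns a view rather than a copy, and that a shape which does not divide the total size is rejected:

```python
import numpy as np

arr = np.zeros((50, 100, 25))
flat = arr.reshape(-1, arr.shape[-1])

# -1 is replaced by: total size / product of the other listed dimensions.
inferred = arr.size // arr.shape[-1]
print(flat.shape, inferred)

# For a contiguous array, reshape returns a view on the same memory.
shares = np.shares_memory(arr, flat)

# If the total size is not divisible by the listed dimensions,
# reshape raises ValueError instead of guessing.
try:
    arr.reshape(-1, 7)
    divisible = True
except ValueError:
    divisible = False
```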

A slight generalization of Peter’s answer: you can unpack a slice of the original array’s shape if you want to go beyond three-dimensional arrays.

For example, to flatten all but the last two dimensions:

``````
import numpy

arr = numpy.zeros((3, 4, 5, 6))
new_arr = arr.reshape(-1, *arr.shape[-2:])
new_arr.shape
# (12, 5, 6)
``````

EDIT: A slight generalization of my earlier answer: you can, of course, also specify a range at the beginning of the reshape:

``````
import numpy

arr = numpy.zeros((3, 4, 5, 6, 7, 8))
new_arr = arr.reshape(*arr.shape[:2], -1, *arr.shape[-2:])
new_arr.shape
# (3, 4, 30, 7, 8)
``````
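The slicing pattern above can be wrapped in a small function. This is only a sketch: `flatten_axes` is a hypothetical helper name, not part of NumPy, and it assumes a non-empty array and `0 <= start <= stop <= arr.ndim`.

```python
import numpy as np

def flatten_axes(arr, start, stop):
    """Merge axes start..stop-1 of arr into one axis (hypothetical helper)."""
    shape = arr.shape
    # Keep the axes before `start` and from `stop` onward; -1 absorbs the rest.
    return arr.reshape(*shape[:start], -1, *shape[stop:])

# Merge the two middle axes of a 6-D array.
six_d = np.zeros((3, 4, 5, 6, 7, 8))
middle = flatten_axes(six_d, 2, 4)

# Merge the first two axes, as in the original question.
x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
leading = flatten_axes(x, 0, 2)
```

With this layout, row `i * 3 + j` of `leading` holds the same values as `x[i, j]`, because `reshape` keeps the elements in row-major order.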

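An alternative approach is to use `numpy.resize()`. Unlike `reshape`, it always returns a copy, and it silently repeats or truncates the data when the requested size differs from the original, so it only agrees with `reshape` when the total size is unchanged: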

``````
In : import numpy as np
In : shp = (50,100,25)
In : arr = np.random.random_sample(shp)
In : resized_arr = np.resize(arr, (np.prod(shp[:2]), shp[-1]))
In : resized_arr.shape
Out: (5000, 25)

# sanity check with other solutions
In : reshaped_arr = np.reshape(arr, (-1, shp[-1]))
In : np.allclose(resized_arr, reshaped_arr)
Out: True
``````
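One difference between the two functions is worth seeing in a small case. The sketch below (variable names are illustrative) requests a shape whose size does not match the data: `np.resize` fills the result by repeating the input, while `np.reshape` raises an error.

```python
import numpy as np

a = np.arange(6)

# np.resize returns a new array and repeats the data to fill the extra slots.
resized = np.resize(a, (2, 4))
print(resized)
# [[0 1 2 3]
#  [4 5 0 1]]

# np.reshape refuses a shape whose size does not match the data.
try:
    np.reshape(a, (2, 4))
    raised = False
except ValueError:
    raised = True

# np.resize never shares memory with its input, even when the size matches.
is_copy = not np.shares_memory(a, np.resize(a, (2, 3)))
```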

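`numpy.vstack` also fits this situation: given a 3-D array, it stacks the sub-arrays along the first axis, which merges the first two dimensions.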

``````
import numpy as np
arr = np.ones((50,100,25))
np.vstack(arr).shape
# (5000, 25)
``````
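As a quick check, here is a minimal sketch comparing `np.vstack` with `reshape` on a small array. It assumes a 3-D input. The values come out identical, but `vstack` builds a new array while `reshape` returns a view.

```python
import numpy as np

arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)

stacked = np.vstack(arr)                     # concatenates arr[0], arr[1] along axis 0
reshaped = arr.reshape(-1, arr.shape[-1])    # merges the first two axes

same_values = np.array_equal(stacked, reshaped)
stacked_is_copy = not np.shares_memory(arr, stacked)
reshaped_is_view = np.shares_memory(arr, reshaped)
```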

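I prefer to use `stack`, `vstack` or `hstack` over `reshape` when the grouping of the data matters. `reshape` never reorders elements: it reads them in row-major (C) order and simply regroups them into the requested shape, so the rows of the result need not line up with the rows of the original sub-arrays. This can be problematic if you are, e.g., going to take column averages.

Here’s an illustration of what I mean. Suppose we have the following array: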

``````
>>> arr.shape
(2, 3, 4)
>>> arr
array([[[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]],

       [[7, 7, 7, 7],
        [7, 7, 7, 7],
        [7, 7, 7, 7]]])
``````

We apply both methods to get an array of shape `(3,8)`:

``````
>>> arr.reshape((3,8)).shape
(3, 8)
>>> np.hstack(arr).shape
(3, 8)
``````

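However, the two results are laid out differently. `hstack` places the sub-arrays side by side, so every column of the result is a column of one of the original sub-arrays, and its column sums match those computed from the original array. A plain `reshape((3,8))` instead cuts the flattened data into rows of eight, mixing rows from different sub-arrays, so its column sums no longer correspond to anything in the original.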

``````
>>> arr.reshape((3,8))
array([[1, 2, 3, 4, 1, 2, 3, 4],
       [1, 2, 3, 4, 7, 7, 7, 7],
       [7, 7, 7, 7, 7, 7, 7, 7]])
>>> np.hstack(arr)
array([[1, 2, 3, 4, 7, 7, 7, 7],
       [1, 2, 3, 4, 7, 7, 7, 7],
       [1, 2, 3, 4, 7, 7, 7, 7]])
``````
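The `hstack` layout can also be obtained with `reshape` if the axes are reordered first. The sketch below rebuilds the example array (its construction is an assumption, since the original does not show how `arr` was created) and checks that moving the first axis next to the last one before reshaping reproduces `np.hstack(arr)`, column sums included.

```python
import numpy as np

# Rebuild the (2, 3, 4) example array from the illustration above.
arr = np.array([[[1, 2, 3, 4]] * 3,
                [[7, 7, 7, 7]] * 3])

side_by_side = np.hstack(arr)

# Swap the first two axes, then merge the last two: row i becomes
# arr[0, i] followed by arr[1, i], which is the hstack layout.
via_transpose = arr.transpose(1, 0, 2).reshape(arr.shape[1], -1)

# Column sums of the hstack result match sums over the rows of each sub-array.
hstack_sums = side_by_side.sum(axis=0)
original_sums = arr.sum(axis=1).ravel()

# A plain reshape gives different column sums.
plain_sums = arr.reshape(3, 8).sum(axis=0)
```

Note that `reshape` after a transpose generally has to copy the data, so this form is no cheaper than `hstack`. It mainly shows which reordering `hstack` performs.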