Averaging over every n elements of a numpy array
Question:
I have a numpy array. I want to create a new array which is the average over every consecutive triplet of elements, so the new array will be a third the size of the original.
As an example:
np.array([1,2,3,1,2,3,1,2,3])
should return the array:
np.array([2,2,2])
Can anyone suggest an efficient way of doing this? I’m drawing blanks.
Answers:
If your array arr has a length divisible by 3:
np.mean(arr.reshape(-1, 3), axis=1)
Reshaping to a higher-dimensional array and then performing some form of reduce operation along one of the additional dimensions is a staple of NumPy programming.
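As a minimal sketch of the reshape-then-reduce idea for an arbitrary block length n: the helper name block_mean_1d is made up for illustration, and dropping an incomplete trailing block is an assumption about the desired behaviour, not something the answer above prescribes.

```python
import numpy as np

def block_mean_1d(arr, n):
    """Average each consecutive block of n elements (hypothetical helper)."""
    arr = np.asarray(arr)
    # Assumption: silently drop the incomplete trailing block, if any.
    usable = (arr.size // n) * n
    # Each row of the reshaped array is one block; reduce along the rows.
    return arr[:usable].reshape(-1, n).mean(axis=1)

result = block_mean_1d([1, 2, 3, 1, 2, 3, 1, 2, 3], 3)
# array([2., 2., 2.])

trimmed = block_mean_1d([1, 2, 3, 4, 5, 6, 7], 3)
# array([2., 5.])  (the trailing 7 is dropped)
```

The slice before the reshape is what lets this accept lengths that are not a multiple of n; without it, reshape raises a ValueError.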
For googlers looking for a simple generalisation to arrays with multiple dimensions: see the function block_reduce in the scikit-image package.
It has a very simple interface for downsampling arrays by applying a function such as numpy.mean, but other functions (maximum, median, …) work too. Different axes can be downsampled by different factors by supplying a tuple of block sizes. Here’s an example with a 2D array, downsampling only axis 1 by a factor of 5 using the mean:
import numpy as np
from skimage.measure import block_reduce
arr = np.stack((np.arange(1,20), np.arange(20,39)))
# array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
# [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]])
arr_reduced = block_reduce(arr, block_size=(1,5), func=np.mean, cval=np.mean(arr))
# array([[ 3. , 8. , 13. , 17.8],
# [22. , 27. , 32. , 33. ]])
As discussed in the comments on the other answer: if the array's length along the reduced dimension is not divisible by the block size, the padding values are given by the argument cval (0 by default).
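If a fixed padding value is not what you want, one possible alternative (swapped in here for illustration; it is plain NumPy, not part of block_reduce) is to pad with NaN and reduce with np.nanmean, so the last block is averaged over its real elements only. The helper name block_nanmean_1d is hypothetical.

```python
import numpy as np

def block_nanmean_1d(arr, n):
    """Block mean that ignores padding in the last block (hypothetical helper)."""
    # Cast to float so the array can hold NaN.
    arr = np.asarray(arr, dtype=float)
    # Number of elements needed to reach the next multiple of n.
    pad = (-arr.size) % n
    padded = np.pad(arr, (0, pad), mode="constant", constant_values=np.nan)
    # nanmean skips the NaN padding when averaging each row.
    return np.nanmean(padded.reshape(-1, n), axis=1)

out = block_nanmean_1d(np.arange(1, 20), 5)
# array([ 3. ,  8. , 13. , 17.5])
```

Here the last block holds only 16, 17, 18 and 19, so its mean is 17.5 rather than a value pulled towards the padding.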
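To apply the accepted answer to a 2D array, averaging each column/feature over blocks of downsample_ratio rows (the number of rows must be divisible by downsample_ratio):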
arr.reshape(-1, downsample_ratio, arr.shape[1]).mean(axis = 1)
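A small worked example of that one-liner may help; the array contents and the value of downsample_ratio below are made up for illustration.

```python
import numpy as np

downsample_ratio = 2                 # illustrative value
arr = np.arange(12).reshape(6, 2)    # 6 samples (rows), 2 features (columns)

# Shape (6, 2) -> (3, 2, 2): axis 1 now indexes the rows inside each block.
reduced = arr.reshape(-1, downsample_ratio, arr.shape[1]).mean(axis=1)
# array([[ 1.,  2.],
#        [ 5.,  6.],
#        [ 9., 10.]])
```

Each output row is the column-wise mean of two consecutive input rows, so the columns are never mixed.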