Sum elements before and replace element with sum

Question:

I have this NumPy array:

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])

I need the result array like this:

[[0, 0, 0, 0, 7, 0, 0, 0, 9, 0, 3]
 [0, 0, 3, 0, 0, 0, 0, 0, 12, 0, 0]
 [0, 0, 0, 0, 25, 0, 1, 0, 0, 0, 0]]

What happens:

We go along each row. If an element is 0, we move on to the next element. If it is not 0, we sum the elements until a 0 is met, then replace that 0 with the resulting sum (and replace the original non-zero numbers with 0).

I already know how to do this with loops, but that is too slow for a large number of rows, so I need a time-efficient solution using NumPy methods.

Asked By: qtsar

Answers:

We can solve this using two for loops. In every row we keep a current_sum: if the number is zero, we assign current_sum to it and reset current_sum; if the number is not zero, we add it to current_sum and set it to 0.

Edit: Sorry, at first I didn’t realize you wanted an efficient solution. We can use numba to accelerate the for loops; it is really simple and powerful.
Here is the code:

import numpy as np
import numba
arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])

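@numba.jit(nopython=True)
def mySum(array):
    # Note: modifies `array` in place and returns the same array
    for i in range(array.shape[0]):
        current_sum = 0  # running sum of the current non-zero run
        for j in range(array.shape[1]):
            if array[i,j] == 0:
                # A zero receives the accumulated sum, then the sum resets
                array[i,j] = current_sum
                current_sum = 0
            else:
                # A non-zero is added to the sum and cleared
                current_sum += array[i,j]
                array[i,j] = 0
    return array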

print(mySum(arr))

The function is slow on the first run because numba has to inspect the input types and compile the function to machine code, but after that it is really fast. I hope it is fast enough for your case.

Answered By: Polatkan Polat

First, we want to find the locations where the array has a zero immediately after a non-zero element.

rr, cc = np.where((arr[:, 1:] == 0) & (arr[:, :-1] != 0))

Now, we can use np.add.reduceat to add elements. Unfortunately, reduceat needs a list of 1-d indices, so we’re going to have to play with shapes a little. Calculating the equivalent indices of rr, cc in a flattened array is easy:

reduce_indices = rr * arr.shape[1] + cc + 1
# array([ 4,  8, 10, 13, 19, 26, 28])
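If reduceat is unfamiliar, a small 1-d sketch may help before going further. The `values` and `starts` arrays here are made up for illustration and are not part of the question's data:

```python
import numpy as np

# Made-up 1-d data, only to illustrate how reduceat slices its input
values = np.array([1, 2, 3, 4, 5, 6])
starts = np.array([0, 2, 5])

# Output element i is the sum of values[starts[i]:starts[i + 1]];
# the last element sums from starts[-1] to the end of the array.
sums = np.add.reduceat(values, starts)
print(sums.tolist())  # [3, 12, 6]
```

So each index marks where a new sum begins, which is why the zero positions found above can serve as segment boundaries.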

We want to reduce from the start of every row, so we’ll create a row_starts to mix in with the indices calculated above:

row_starts = np.arange(arr.shape[0]) * arr.shape[1]
# array([ 0, 11, 22])

reduce_indices = np.hstack((row_starts, reduce_indices))
reduce_indices.sort()
# array([ 0,  4,  8, 10, 11, 13, 19, 22, 26, 28])

Now, call np.add.reduceat on the flattened input array, reducing at reduce_indices:

totals = np.add.reduceat(arr.flatten(), reduce_indices)
# array([ 7,  9,  3,  0,  3, 12,  0, 25,  1,  1])

Now that we have the totals, we need to assign them to an array of zeros. Note that element 0 of totals needs to go to the position given by element 1 of reduce_indices, and so on, while the last element of totals is discarded:

result_f = np.zeros((arr.size,))
result_f[reduce_indices[1:]] = totals[:-1]
result = result_f.reshape(arr.shape)

Now, one last step remains. If the last element in a row is nonzero, reduceat computes a nonzero value for the first element of the next row, as you mentioned in the comments. An easy solution is to overwrite the first column with zeros, which is safe because the first element of a row has nothing before it to sum.

result[:, 0] = 0

which gives the expected result:

array([[ 0.,  0.,  0.,  0.,  7.,  0.,  0.,  0.,  9.,  0.,  3.],
       [ 0.,  0.,  3.,  0.,  0.,  0.,  0.,  0., 12.,  0.,  0.],
       [ 0.,  0.,  0.,  0., 25.,  0.,  1.,  0.,  0.,  0.,  0.]])
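For convenience, the steps above could be collected into one function. This is only a sketch: the name `sum_runs_into_zeros` is made up, and it builds the result with the input's dtype instead of floats, which is my assumption about what is wanted rather than part of the steps above.

```python
import numpy as np

def sum_runs_into_zeros(arr):
    """Hypothetical helper collecting the reduceat steps into one call."""
    n_rows, n_cols = arr.shape
    # Zeros that come directly after a non-zero element
    rr, cc = np.where((arr[:, 1:] == 0) & (arr[:, :-1] != 0))
    reduce_indices = rr * n_cols + cc + 1
    # Add the start of every row so sums never span two rows
    row_starts = np.arange(n_rows) * n_cols
    reduce_indices = np.sort(np.hstack((row_starts, reduce_indices)))
    totals = np.add.reduceat(arr.ravel(), reduce_indices)
    # Assumption: keep the input dtype instead of the float default
    result_f = np.zeros(arr.size, dtype=arr.dtype)
    result_f[reduce_indices[1:]] = totals[:-1]
    result = result_f.reshape(arr.shape)
    # Discard sums that spilled over from the end of the previous row
    result[:, 0] = 0
    return result

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])
result = sum_runs_into_zeros(arr)
print(result)
```

Unlike the in-place loop approach, this leaves the input array untouched and returns a new array.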

Answered By: Pranav Hosangadi

This may end up longer than the loop version, but let me demonstrate with a single row:

import numpy as np

a = np.array([0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0])
zero_index = np.where(a == 0)[0]
# Split at each zero, sum each slice, and drop the last one
replace_arr = np.array(list(map(sum, np.split(a, zero_index))))[:-1]
output = np.zeros(a.size)
# Put the sums at the positions of the zeros
np.put_along_axis(output, zero_index, replace_arr, axis=0)
output
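
To cover the full 2-d array from the question, one option is to wrap the single-row steps in a function and apply it to every row. This is a rough sketch: `sum_runs_row` is a made-up name, I use plain index assignment in place of np.put_along_axis, and np.apply_along_axis still loops over rows in Python, so it is unlikely to be the fastest choice for very many rows.

```python
import numpy as np

def sum_runs_row(a):
    """Hypothetical helper wrapping the single-row steps."""
    zero_index = np.where(a == 0)[0]
    # Sum each chunk between zeros; the last chunk has no zero after it
    chunks = np.split(a, zero_index)
    sums = np.array([chunk.sum() for chunk in chunks])[:-1]
    output = np.zeros_like(a)
    # Plain index assignment, swapped in for np.put_along_axis
    output[zero_index] = sums
    return output

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])

# Apply the row function along axis 1 (once per row)
result = np.apply_along_axis(sum_runs_row, 1, arr)
print(result)
```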

Answered By: Hanwei Tang