How can I sum several variables in a numpy array?
Question:
I have a numpy array that looks like this:
test = numpy.array([0, 0, 1, 3, 5, 0, 0, 0, 15, 16, 2, 0, 0])
I would like to get the sum of each "number-cluster":
It should be like this:
[0 0 0 9 0 0 0 0 0 33 0 0 0 ]
I am looking for a pandas or numpy function to do this. Are there any suggestions or options?
Edit:
To clarify my question:
I have this part of a real dataset (see below):
I am trying to sum all values between zeros and put the sum in the middle of each number sequence; the values around the sum should be set to zero. I am doing this to get a stem plot in which each stem sits in the middle of its curve.
Full reproducible example:
numpy.array([0.14615127632512193, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.029740091488616338, 0.09063089178836162, 0.1380136511666047, 0.17187288438267243, 0.19248433518089703, 0.2003245168693058, 0.19614351292272647, 0.18088710080137402, 0.15564787198250443, 0.1216984367737226, 0.08039857005072917, 0.033215997285686215, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.025134855935682095, 0.10513055014366987, 0.18598353864085884, 0.26609407961465364, 0.34387046651092235, 0.4177887209889943, 0.4863527667154762, 0.5482299378679611, 0.6021788910022772, 0.6472089430838289, 0.6825618586582046, 0.7076575672260416, 0.7222015685948489, 0.726197524625621, 0.7199170030883423, 0.7038805041890686, 0.6788731622372104, 0.64591328542169, 0.6061815461069726, 0.5610267432268025, 0.5119138691552906, 0.46032794110916975, 0.4077539662033596, 0.3556375992681274, 0.3052450205826604, 0.2577024305616884, 0.21393481676710369, 0.1746001292236276, 0.14010959314624474, 0.11057572712877385, 0.08582868020007114, 0.06542672933651948, 0.04873479433640581, 0.03487701609889477, 0.022821459765064396, 0.011439409983895869, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.005721667561715918, 0.0449689855458949, 0.08893101739902065, 0.13746419804461318, 0.19022591227262747, 0.24667121039905493, 0.30610548480575106, 0.36754094275306776, 0.42986242949461123, 0.4916586003062767, 0.5514777126712833, 0.6077363354125331, 0.6587203970535883, 0.7027794706280279, 0.7383152350047645, 0.7638602773480453, 0.7781542289716459, 0.7801776822283781, 0.7692426483788086, 0.7450439442534996, 0.7076142017184802, 0.6574157716388517, 0.5953064199112776, 0.5225129829154233, 0.4406541001312699, 0.3516257544870717, 
0.25753393194644153, 0.1606954827162166, 0.06348731824600296, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
Answers:
You can identify the start/stop of each cluster, compute the sum with np.add.reduceat
together with the cluster size, then assign the sum to the middle of the cluster:
import numpy as np
test = np.array([1, 1, 1, 3, 5, 0, 0, 0, 15, 16, 2, 0, 99, 0, 1, 2, 3, 4, 5, 0, 10, 20, 30, 40, 0, 1])
# identify null values
m = test == 0
# get positions of null/non-null change
idx = np.flatnonzero(np.diff(np.r_[True, m]))
# set up output array
out = np.zeros_like(test)
# compute size of each non-null cluster
cluster_size = np.diff(np.r_[idx, len(test)])[::2]
# assign their sum to the middle point of each cluster
out[idx[::2]+(cluster_size)//2] = np.add.reduceat(test, idx)[::2]
out
# array([ 0, 0, 11, 0, 0, 0, 0, 0, 0, 33, 0, 0, 99,
# 0, 0, 0, 15, 0, 0, 0, 0, 0, 100, 0, 0, 1])
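The same steps can be wrapped in a small helper and applied to the array from the question. This is only a sketch: the function name cluster_sums is made up for illustration, and the guard for an all-zero input is an added assumption rather than part of the answer above.

```python
import numpy as np

def cluster_sums(values):
    """Sum each run of non-zero values and place the sum mid-run (hypothetical helper)."""
    values = np.asarray(values)
    is_zero = values == 0
    # positions where the array switches between zero and non-zero;
    # prepending True makes every even entry the start of a non-zero run
    idx = np.flatnonzero(np.diff(np.r_[True, is_zero]))
    out = np.zeros_like(values)
    if idx.size == 0:
        # assumption: an all-zero (or empty) input simply yields zeros
        return out
    # length of each non-zero run
    sizes = np.diff(np.r_[idx, len(values)])[::2]
    # sum per segment, keeping only the non-zero segments
    out[idx[::2] + sizes // 2] = np.add.reduceat(values, idx)[::2]
    return out

result = cluster_sums([0, 0, 1, 3, 5, 0, 0, 0, 15, 16, 2, 0, 0])
print(result)  # 9 ends up at index 3 and 33 at index 9, everything else is 0
```

Because the output is created with np.zeros_like, it keeps the dtype of the input, so the float data from the question should work without changes.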
Visual output on your test array: (figure not included)
import numpy as np
import math
test = np.array([0, 0, 1, 3, 5, 0, 0, 0, 15, 16, 2, 0, 0])
test2 = np.zeros(len(test))
ind = np.concatenate((np.argwhere(test).squeeze(), [0]))
p, S = 0, 0
for i in range(len(ind)):
    now, prv = ind[i], ind[i-1]
    S += test[prv]
    p += 1
    if now != prv+1:
        test2[prv-math.floor(p/2)] = S
        p, S = 0, 0
print(test2)
[ 0. 0. 0. 9. 0. 0. 0. 0. 0. 33. 0. 0. 0.]
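A variant of the loop idea that avoids the index bookkeeping is sketched below. It uses np.flatnonzero and np.split instead of np.argwhere(...).squeeze(), which, as far as I can tell, turns into a zero-dimensional array when there is exactly one non-zero element and then cannot be concatenated. The function name cluster_sums_split is hypothetical. Note also that for clusters of even length this sketch puts the sum at start + size // 2 (like the reduceat answer), whereas the loop above appears to pick the element one position earlier.

```python
import numpy as np

def cluster_sums_split(values):
    """Loop over runs of adjacent non-zero indices (hypothetical helper)."""
    values = np.asarray(values)
    out = np.zeros(len(values), dtype=values.dtype)
    nz = np.flatnonzero(values)  # indices of all non-zero entries
    if nz.size == 0:
        return out
    # start a new run wherever two non-zero indices are not adjacent
    runs = np.split(nz, np.flatnonzero(np.diff(nz) != 1) + 1)
    for run in runs:
        # write the run's sum at its middle index
        out[run[len(run) // 2]] = values[run].sum()
    return out

print(cluster_sums_split([0, 0, 1, 3, 5, 0, 0, 0, 15, 16, 2, 0, 0]))
# 9 at index 3 and 33 at index 9, zeros elsewhere
```

The Python loop runs once per cluster rather than once per element, so it should stay reasonably fast as long as the number of clusters is small compared to the array length.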