Python/Numpy get average of array based on index

Question:

I have two numpy arrays, the first one is the values and the second one is the indexes. What I want to do is to get the average of the values array based on the indexes array.

For example:

values = [1,2,3,4,5]
indexes = [0,0,1,1,2]
get_indexed_avg(values, indexes)
# should give me 
#   [1.5,    3.5,    5]

Here, the values in the indexes array represent the indexes in the final array. Hence:

  1. First two items in the values array are being averaged to form the zero index in the final array.
  2. The 3rd and the 4th item in the values array are being averaged to form the first index in the final array.
  3. Finally, the last item is used to form the 2nd index in the final array.

I do have a pure-Python solution to this, but it is horrible and very slow. Is there a better solution, maybe using numpy or a similar library?

Asked By: Prasanna

Answers:

import pandas as pd
pd.Series(values).groupby(indexes).mean()
# OR
# pd.Series(values).groupby(indexes).mean().to_list()
# 0    1.5
# 1    3.5
# 2    5.0
# dtype: float64
Answered By: d.b

I wanted to avoid pandas so I spent quite some time figuring it out.
The way to do this is by using what’s called a one-hot encoding.

Creating a one-hot encoding of the indexes will give us a 2-d array with 1s at places where we want them. For example:

indexes = np.array([0,0,1,1,2])
# one_hot = array(
#    [[1., 0., 0.],
#    [1., 0., 0.],
#    [0., 1., 0.],
#    [0., 1., 0.],
#    [0., 0., 1.]]
# )

We just need to build a one-hot encoding of the index array and matrix-multiply it with the values to get what we want. This uses the answer from this post.

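import numpy as np

values = np.array([1,2,3,4,5])
indexes = np.array([0,0,1,1,2])

# Row i of one_hot has a single 1 in column indexes[i]
one_hot = np.eye(np.max(indexes) + 1)[indexes]

# Number of values assigned to each index
counts = np.sum(one_hot, axis=0)
# Per-index sum of the values, divided by the per-index count
average = np.sum((one_hot.T * values), axis=1) / counts

print(average) # [1.5 3.5 5. ]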
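
The prose above describes a matrix multiplication, while the snippet uses an elementwise product followed by a sum. Both give the same result. Here is a minimal sketch of the matmul form, assuming the indexes are non-negative integers and every index from 0 up to the maximum occurs at least once (otherwise a count is zero and the division produces nan). The function name `indexed_avg_onehot` is hypothetical and only used for illustration:

```python
import numpy as np

def indexed_avg_onehot(values, indexes):
    """Per-index mean via a one-hot matrix product (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    indexes = np.asarray(indexes)
    # Row i of one_hot has a single 1 in column indexes[i].
    one_hot = np.eye(indexes.max() + 1)[indexes]
    sums = one_hot.T @ values      # per-index sums, shape (n_groups,)
    counts = one_hot.sum(axis=0)   # per-index counts, shape (n_groups,)
    return sums / counts

print(indexed_avg_onehot([1, 2, 3, 4, 5], [0, 0, 1, 1, 2]).tolist())  # [1.5, 3.5, 5.0]
```

Note that the one-hot matrix holds len(values) * (max index + 1) floats, so this approach can get memory-hungry when there are many distinct indexes.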
Answered By: Prasanna

The simplest and easiest solution:

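import numpy as np

values = np.array([1,2,3,4,5])
indexes = np.array([0,0,1,1,2])
# np.unique returns the distinct indexes in sorted order
# (a plain set does not guarantee any iteration order)
index_set = np.unique(indexes) # index_set = array([0, 1, 2])

# Select the values that belong to each index in index_set
# and take their average
avg = [float(np.mean(values[indexes==k])) for k in index_set]

print(avg) # [1.5, 3.5, 5.0]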
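
The list comprehension above scans the whole array once per distinct index. As a possible alternative, `np.bincount` with its `weights` argument can compute the per-index sums and counts in one pass each. This is only a sketch, assuming the indexes are non-negative integers; the name `indexed_avg_bincount` is hypothetical:

```python
import numpy as np

def indexed_avg_bincount(values, indexes):
    """Per-index mean using np.bincount (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    indexes = np.asarray(indexes)
    sums = np.bincount(indexes, weights=values)  # sum of values per index
    counts = np.bincount(indexes)                # number of values per index
    present = counts > 0                         # skip indexes that never occur
    return sums[present] / counts[present]

print(indexed_avg_bincount([1, 2, 3, 4, 5], [0, 0, 1, 1, 2]).tolist())  # [1.5, 3.5, 5.0]
```

Like the loop version, this returns one average per index that actually occurs, in ascending index order.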
Answered By: Faisal Hussain