Build a list with different values at different indexes

Question:

I have prediction probabilities for two classes [0, 1], and I need to save the probability of the correct class for each row in my data set:

probs = model.predict_proba(X_test)
idx_1 = np.where(y_test == 1)[0]
idx_0 = np.where(y_test == 0)[0]

probs contains pairs of values: the first value in each pair is the probability for class 0 and the second is the probability for class 1 (the two sum to 1).

The shape of probs varies because I call this during cross-validation, but as an example, in one iteration probs has shape (568, 2) and y_test has 568 elements.

I tried the following to save the correct probability for each labeled row:

probs_per_class = [probs[idx_1, 1][i] if i in idx_1 else probs[idx_0, 0][i] for i in range(len(y_test))]

but I get an "out of bounds" error.

If I expand it into an explicit loop to simulate a verbose mode:

        probs_per_class = []
        for i in range(len(y_test)):
            if i in idx_1:
                probs_per_class.append(probs[idx_1, 1][i])
                print(i, probs[idx_1, 1][i])
            elif i in idx_0:
                probs_per_class.append(probs[idx_0, 0][i])
                print(i, probs[idx_0, 0][i])

it fails with:

IndexError: index 194 is out of bounds for axis 0 with size 191

What is going wrong?

Asked By: zbeedatm


Answers:

The issue is that probs[idx_1, 1] and probs[idx_0, 0] are not aligned with y_test. Each one holds only the rows of a single class, renumbered from 0, so it is shorter than y_test. You then index that shorter array with i, which is a position in y_test, and as soon as i reaches the subset's length you get the out-of-bounds error (index 194 into a subset of size 191 in your traceback).
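To make the mismatch concrete, here is a minimal sketch on made-up data. The five-row probs and y_test arrays are hypothetical stand-ins for the output of model.predict_proba(X_test) and the real labels; only the indexing pattern is taken from the question.

```python
import numpy as np

# Hypothetical toy data standing in for predict_proba output and the labels.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.7, 0.3],
                  [0.4, 0.6],
                  [0.6, 0.4]])
y_test = np.array([0, 1, 0, 1, 0])

idx_1 = np.where(y_test == 1)[0]   # positions of class-1 rows in y_test
subset = probs[idx_1, 1]           # class-1 probabilities of those rows only

print(len(y_test), len(subset))    # 5 rows in total, only 2 in the subset

# Position 3 is a class-1 row in y_test, yet the subset has no index 3.
try:
    subset[3]
    raised = False
except IndexError:
    raised = True

print(raised)
```

The subset is renumbered from 0, so a position taken from y_test is only valid in it by coincidence.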

To fix this issue, index probs directly by the row position i and choose the column from the label, instead of first selecting a per-class subset. Here’s an updated version of your code that should work:

probs_per_class = []
for i in range(len(y_test)):
    if y_test[i] == 1:
        probs_per_class.append(probs[i, 1])
        print(i, probs[i, 1])
    else:
        probs_per_class.append(probs[i, 0])
        print(i, probs[i, 0])
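
If you do not need the prints, the same selection can probably be done without a Python loop. This is a sketch under two assumptions: the labels are exactly 0 and 1, and they match the column order of probs (for scikit-learn classifiers that order is given by model.classes_). The helper name correct_class_probs is made up for illustration.

```python
import numpy as np

def correct_class_probs(probs, y_true):
    """Return, for each row, the predicted probability of its true class.

    Assumes the labels in y_true are exactly 0 and 1 and that column k of
    probs holds the probability of class k.
    """
    probs = np.asarray(probs)
    # np.asarray drops any pandas index, so everything below is positional.
    labels = np.asarray(y_true).astype(int)
    rows = np.arange(len(labels))
    # Pair each row number with its label: element i is probs[i, labels[i]].
    return probs[rows, labels]

# Hypothetical toy data standing in for predict_proba output and y_test.
example_probs = np.array([[0.9, 0.1],
                          [0.2, 0.8],
                          [0.7, 0.3]])
example_labels = np.array([0, 1, 0])
picked = correct_class_probs(example_probs, example_labels)
print(picked)
```

Converting y_test with np.asarray also sidesteps label-based lookups, which matters if y_test is a pandas Series whose index was shuffled by the cross-validation split.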
Answered By: Rocketq

You explained you have a .shape of (568, 2) at one point,
but that does not seem to be relevant to the reported error.


        for i in range(len(y_test)):
                ...
                probs_per_class.append(probs[idx_1, 1][i])

What is going wrong?

The "IndexError: index 194 is out of bounds for axis 0 with size 191"
diagnostic says that i reached 194,
yet the array being indexed has only 191 elements.
That array is not probs itself but the subset probs[idx_1, 1] (or probs[idx_0, 0]),
which holds the rows of a single class only.
(idx_1 contains positions in y_test, while the subset is renumbered from 0,
so i is not a valid position in it.)

Either make the related data structures have the same .shape,
or operate on a subset of them.
For example, rather than len(y_test) you might
prefer len(probs).

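Note that len(...) is different from .shape:
len(a) reports only the size of the first axis, a.shape[0],
while .shape reports the size of every axis.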
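
A quick way to see the difference, using placeholder arrays of zeros with the sizes mentioned in the question (the contents are irrelevant here):

```python
import numpy as np

# Placeholder arrays with the sizes reported in the question.
probs = np.zeros((568, 2))
y_test = np.zeros(568)

print(len(probs), probs.shape)    # 568 (568, 2)
print(len(y_test), y_test.shape)  # 568 (568,)
```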

Consider iterating over an enumerate(...) iterator
instead of producing integer indexes,
so it will be much harder to accidentally fall off
the end of a data structure that way.
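
As a rough sketch of that idea, the loop below walks probs and y_test together instead of computing indexes. The toy arrays are hypothetical stand-ins for the real data, and the code assumes the labels are exactly 0 and 1 so that they can be used as column numbers.

```python
import numpy as np

# Hypothetical toy data standing in for predict_proba output and y_test.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.7, 0.3],
                  [0.4, 0.6]])
y_test = np.array([0, 1, 0, 1])

# zip silently stops at the shorter input, so check the lengths up front.
assert len(probs) == len(y_test)

probs_per_class = []
for i, (row, label) in enumerate(zip(probs, y_test)):
    p = row[int(label)]        # column 0 for label 0, column 1 for label 1
    probs_per_class.append(p)
    print(i, p)
```

Here i is only used for printing. Nothing is indexed by it, so there is no subset for it to run past.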

Answered By: J_H