Build a list with different values at different indexes
Question:
I have prediction probabilities for two classes [0, 1], and I need to save the probability predicted for the correct class of each row in my data set:
probs = model.predict_proba(X_test)
idx_1 = np.where(y_test == 1)[0]
idx_0 = np.where(y_test == 0)[0]
probs contains pairs of values: the first value in each pair is the probability for class 0 and the second is the probability for class 1 (the two sum to 1).
The shape of probs varies because I am calling it during cross-validation, but take one iteration as an example: probs has shape (568, 2) and y_test has length 568.
I tried the following to save the correct-class probability for each labeled row:
probs_per_class = [probs[idx_1, 1][i] if i in idx_1 else probs[idx_0, 0][i] for i in range(len(y_test))]
but I am getting an out-of-bounds error.
If I expand it into a verbose loop:
probs_per_class = []
for i in range(len(y_test)):
    if i in idx_1:
        probs_per_class.append(probs[idx_1, 1][i])
        print(i, probs[idx_1, 1][i])
    elif i in idx_0:
        probs_per_class.append(probs[idx_0, 0][i])
        print(i, probs[idx_0, 0][i])
failing on:
IndexError: index 194 is out of bounds for axis 0 with size 191
What is going wrong?
Answers:
The issue seems to be that you are indexing probs twice. probs[idx_1, 1] first keeps only the rows listed in idx_1, so the result has len(idx_1) elements rather than len(y_test). You then index that shorter array with i, which is a position in y_test, so any i at or beyond the length of the subset is out of bounds. The same applies to probs[idx_0, 0][i].
To fix this issue, you can index probs directly with the row position i and choose the column from the label, instead of going through idx_1 and idx_0. Here’s an updated version of your code that should work, assuming y_test supports positional indexing (for example, a NumPy array):
probs_per_class = []
for i in range(len(y_test)):
    if y_test[i] == 1:
        probs_per_class.append(probs[i, 1])
        print(i, probs[i, 1])
    else:
        probs_per_class.append(probs[i, 0])
        print(i, probs[i, 0])
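The same selection can also be written without an explicit loop. The following is a minimal sketch, not the asker's actual data: the arrays are made-up toy values, and it assumes the labels are exactly 0 and 1 and that the columns of probs are in that order (scikit-learn orders the columns of predict_proba according to model.classes_).

```python
import numpy as np

# Toy stand-ins for the question's arrays (made-up values).
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7]])
y_test = np.array([0, 1, 1, 0])

# Plain integer array, so positional indexing also works if y_test is a pandas Series.
labels = np.asarray(y_test).astype(int)

# For row i, pick column labels[i]: pairs each row position with its own label.
probs_per_class = probs[np.arange(len(labels)), labels]

print(probs_per_class)  # [0.9 0.8 0.4 0.3]
```

Because the row positions and the labels have the same length by construction, this form cannot index past the end of either array.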
You explained that you have a .shape of (568, 2) at one point, but that does not seem to be relevant to the reported error.
for i in range(len(y_test)):
    ...
        probs_per_class.append(probs[idx_1, 1][i])
What is going wrong?
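The "IndexError: index 194 is out of bounds for axis 0 with size 191" diagnostic says that an array with 191 elements was indexed at position 194. Since i comes from range(len(y_test)), y_test has at least 195 elements, yet the array being indexed is not that big. Note that the array being indexed is not probs itself but probs[idx_1, 1] (or probs[idx_0, 0]), which keeps only the rows selected by idx_1 (or idx_0).
(idx_1 holds positions in y_test, while i is then used as a position in the already filtered array. These are two different numbering schemes, so i can easily be too big.)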
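To make the size mismatch concrete, here is a small reproduction. It is only a sketch: the data is synthetic, and the split of 191 class-1 rows out of 568 is an assumption chosen to mirror the numbers in the error message, not something the question confirms.

```python
import numpy as np

n_rows, n_pos = 568, 191                  # assumed sizes, for illustration only
y_test = np.zeros(n_rows, dtype=int)
y_test[np.arange(n_pos) * 2] = 1          # label 1 at positions 0, 2, ..., 380

p1 = np.linspace(0.0, 1.0, n_rows)
probs = np.column_stack([1.0 - p1, p1])   # each row sums to 1

idx_1 = np.where(y_test == 1)[0]
subset = probs[idx_1, 1]                  # one value per class-1 row

print(probs.shape)    # (568, 2)
print(subset.shape)   # (191,)

i = int(idx_1[-1])    # 380: a valid position in y_test and in probs
try:
    subset[i]         # ...but not a valid position in the 191-element subset
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```

The subset is as long as the number of selected rows, so a position that is valid for y_test is generally not valid for the subset.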
Either make the related data structures have the same .shape, or operate on a subset of them. For example, rather than len(y_test) you might prefer the length of the array you actually index.
Note that len(...) is different from .shape: len(...) gives only the size of the first axis, that is, .shape[0].
Consider iterating over an enumerate(...) iterator instead of producing integer indexes, so it will be much harder to accidentally fall off the end of a data structure that way.
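As a rough sketch of that suggestion, the loop below walks over the rows of probs and the labels together, so no separately computed index is ever applied to a filtered array. The arrays are made-up toy values, and the code assumes the labels are 0 and 1 and match the column order of probs.

```python
import numpy as np

# Toy stand-ins for the question's arrays (made-up values).
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
y_test = np.array([0, 1, 1])

probs_per_class = []
# zip pairs each probability row with its label; enumerate supplies the row number.
for i, (row, label) in enumerate(zip(probs, y_test)):
    p = row[int(label)]          # column 0 for label 0, column 1 for label 1
    probs_per_class.append(p)
    print(i, p)
```

Note that zip stops silently at the shorter input. On Python 3.10 or later, zip(probs, y_test, strict=True) raises an error instead if the lengths differ.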