Getting the position of the max element from a 2D array except for a specific column
Question:
I am trying to implement Q-Learning, and I need to get the position of the max element from a 2D array (the so-called Q-Value matrix) except for the sixth column (column index 5), similar to this question. I tried to implement the code in this answer. My implementation, and the output, is as follows:
print("The current self.q_values is")
print(self.q_values)
col = 1
print("col is: " + str(col))
row, column = np.unravel_index(self.q_values[:, :col].argmax(), self.q_values[:, :col].shape)  # left max, saving a line assuming it's the global max
right_max = np.unravel_index(self.q_values[:, col + 1:].argmax(), self.q_values[:, col + 1].shape)
if self.q_values[right_max] > self.q_values[row, column]:
    row, column = right_max
    column += col
self.current_action = np.argmax(self.q_values[row, column])
The current self.q_values is
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 ...
 [0. 0. 0. 0. 0. 0.]]
(100 identical rows of zeros)
col is: 1
Traceback (most recent call last):
if self.q_values[right_max] > self.q_values[row, column]:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The Q-Value matrix was initialised as follows:
# Q-values for each combination of state and action.
# rows represent states, columns represent actions
self.q_values = np.zeros((100, 5))
I’m struggling to understand exactly what the code is doing, so I can’t figure out how to do it correctly. I would usually just use the following:
self.current_action = np.argmax(self.q_values[self.current_state - 1])
But, in cases where np.argmax(self.q_values[self.current_state - 1])
is 5 (in other words, the current action is that which is represented by column 5 of q_values), I need to abandon column 5 and just check columns 0 to 4. What is the correct way to do this?
Answers:
To understand what the code actually does, you need to know the following: NumPy's argmax returns the index of the first occurrence of the maximum value in q_values as an index into the flattened array (see the NumPy documentation on argmax). So if you want the row and column indices of the maximum value in q_values, you have to use np.unravel_index to recover them from that flat index.
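To illustrate, here is a minimal sketch with a small made-up matrix (the values are hypothetical, chosen only to show where the maximum lands):

```python
import numpy as np

# Hypothetical small Q-value matrix for illustration.
q = np.array([[0.1, 0.7, 0.2],
              [0.9, 0.3, 0.4]])

flat = q.argmax()        # index into the flattened array [0.1, 0.7, 0.2, 0.9, 0.3, 0.4]
print(flat)              # 3

# Convert the flat index back into (row, column) indices.
row, col = np.unravel_index(flat, q.shape)
print(row, col)          # 1 0
```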
With the above in mind, you can see that the error message was caused by a missing colon in self.q_values[:, col + 1].shape (it should be self.q_values[:, col + 1:].shape). Without the colon the slice is one-dimensional, so right_max becomes a single-element tuple; self.q_values[right_max] then selects an entire row, and comparing that row to a scalar yields a boolean array whose truth value is ambiguous.
In your special case the column to skip is the very last one, so there is no need to split the search into a left and a right part around the skipped column. Using:
col = 5
row, column = np.unravel_index(self.q_values[:, :col].argmax(), self.q_values[:, :col].shape)
already gives you the required indices.
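As a runnable sketch of that idea (the matrix contents here are invented for demonstration; only the slicing pattern matters):

```python
import numpy as np

q_values = np.zeros((100, 6))
q_values[42, 3] = 1.0    # hypothetical maximum outside the excluded column
q_values[7, 5] = 9.0     # larger value, but in the column we must skip

col = 5                  # exclude the last column
# Search only columns 0..4, then convert the flat index to (row, column).
row, column = np.unravel_index(q_values[:, :col].argmax(),
                               q_values[:, :col].shape)
print(row, column)       # 42 3 -- the 9.0 in column 5 is never considered
```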
But, in cases where np.argmax(self.q_values[self.current_state - 1]) is 5 (in other words, the current action is that which is represented by column 5 of q_values), I need to abandon column 5 and just check columns 0 to 4. What is the correct way to do this?
np.argmax(self.q_values[self.current_state - 1, :5])
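A minimal sketch of that per-state lookup, with made-up values (the matrix contents and the state index are hypothetical):

```python
import numpy as np

q_values = np.zeros((100, 6))
current_state = 10
q_values[current_state - 1] = [0.2, 0.5, 0.1, 0.4, 0.3, 0.9]  # row max sits in column 5

# Restricting the slice to columns 0..4 skips the last action entirely.
current_action = np.argmax(q_values[current_state - 1, :5])
print(current_action)    # 1 -- the 0.9 in column 5 is ignored
```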