Getting the position of the max element from a 2D array except for a specific column

Question:

I am trying to implement Q-Learning, and I need to get the position of the max element from a 2D array (the so-called Q-Value matrix) except for the sixth column (column index 5), similar to this question. I tried to implement the code in this answer. My implementation and its output are as follows:

        print("The current self.q_values is")
        print(self.q_values)
        col = 1
        print("col is: " + str(col))
        row, column = np.unravel_index(self.q_values[:, :col].argmax(), self.q_values[:, :col].shape) # left max, saving a line assuming its the global max
        right_max = np.unravel_index(self.q_values[:, col + 1:].argmax(), self.q_values[:, col + 1].shape)
        if self.q_values[right_max] > self.q_values[row, column]:
            row, column = right_max
            column += col
        self.current_action = np.argmax(self.q_values[row, column])



The current self.q_values is
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
col is: 1
Traceback (most recent call last):
    if self.q_values[right_max] > self.q_values[row, column]:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The Q-Value matrix was initialised as follows:

# Q-values for each combination of state and action.
# rows represent states, columns represent actions
self.q_values = np.zeros((100, 5))

I’m struggling to understand exactly what the code is doing, so I can’t figure out how to do it correctly. I would usually just use the following:

        self.current_action = np.argmax(self.q_values[self.current_state - 1])

But, in cases where np.argmax(self.q_values[self.current_state - 1]) is 5 (in other words, the current action is that which is represented by column 5 of q_values), I need to abandon column 5 and just check columns 0 to 4. What is the correct way to do this?

Asked By: The Pointer

||

Answers:

The following should help you understand what the code actually does:

NumPy's argmax method returns the index of the first occurrence of the maximum value in the array q_values, expressed as an index into the flattened array (see the NumPy documentation on argmax). So if you want the row and column indices of the maximum value in q_values, you need np.unravel_index to convert that flat index into them.
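As a minimal sketch of that flat-index round trip (the small array q below is a made-up stand-in for q_values, not data from the question):

```python
import numpy as np

# Hypothetical 3x4 array standing in for q_values.
q = np.array([[0.1, 0.7, 0.2, 0.0],
              [0.3, 0.9, 0.4, 0.9],
              [0.5, 0.6, 0.8, 0.1]])

# argmax without an axis works on the flattened array and
# reports the first occurrence of the maximum.
flat_index = q.argmax()

# unravel_index converts the flat index back to (row, column).
row, column = np.unravel_index(flat_index, q.shape)

print(flat_index)   # 5
print(row, column)  # 1 1
```

The value 0.9 appears twice, at (1, 1) and (1, 3); only the first one is reported.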

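With that in mind, you can see that the error message was caused by a missing colon in self.q_values[:, col + 1].shape (it should be self.q_values[:, col + 1:].shape). Without the colon the expression selects a single column, so the shape is one-dimensional, right_max holds only a row index, and self.q_values[right_max] is a whole row rather than a single value.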
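The effect of the missing colon can be reproduced in isolation. This sketch assumes a (100, 6) array of zeros, matching the printed output in the question rather than the (100, 5) initialisation shown there:

```python
import numpy as np

# Assumption: 6 columns, as in the printed output of the question.
q_values = np.zeros((100, 6))
col = 1

print(q_values[:, col + 1].shape)   # (100,)   one column, 1D
print(q_values[:, col + 1:].shape)  # (100, 4) columns 2..5, 2D

flat = q_values[:, col + 1:].argmax()
bad = np.unravel_index(flat, q_values[:, col + 1].shape)    # 1-tuple: row only
good = np.unravel_index(flat, q_values[:, col + 1:].shape)  # (row, column)

# Indexing with a 1-tuple returns a whole row, so comparing it
# inside an `if` raises the "truth value ... is ambiguous" error.
print(q_values[bad].shape)   # (6,)
print(q_values[good].shape)  # ()
```

Note that the (row, column) pair in good is relative to the slice, so the column must be shifted by col + 1 before it is used to index the full array.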

In your special case of skipping the very last column, there is no need to split the search into the parts left and right of the skipped column, so using:

        col = 5
        row, column = np.unravel_index(self.q_values[:,:col].argmax(), self.q_values[:, :col].shape)

already gives you the required indices.
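If the column to skip is not the last one, one alternative to the left/right split is to mask the column on a copy and search once. This is only a sketch of that swapped-in technique; the helper name argmax_excluding_column is hypothetical and not part of NumPy:

```python
import numpy as np

def argmax_excluding_column(q_values, skip_col):
    """Return (row, column) of the largest entry outside column skip_col.

    Hypothetical helper: works on a float copy so the caller's
    array is left untouched.
    """
    masked = q_values.astype(float)   # astype returns a copy by default
    masked[:, skip_col] = -np.inf     # the skipped column can never win
    return np.unravel_index(masked.argmax(), masked.shape)

# Made-up example data: the overall maximum (9.0) sits in column 1.
q = np.array([[1.0, 9.0, 2.0],
              [4.0, 3.0, 5.0]])
row, column = argmax_excluding_column(q, skip_col=1)
print(row, column)  # 1 2
```

Because the masked copy has the same shape as the original, the returned indices refer directly to the full array and need no offset correction.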

But, in cases where np.argmax(self.q_values[self.current_state - 1]) is 5 (in other words, the current action is that which is represented by column 5 of q_values), I need to abandon column 5 and just check columns 0 to 4. What is the correct way to do this?

np.argmax(self.q_values[self.current_state - 1, :5])
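A small sketch of that per-state selection, using plain variables in place of the self attributes and made-up Q-values for one state (both are assumptions for illustration):

```python
import numpy as np

# Assumption: 100 states and 6 actions, as in the printed output.
q_values = np.zeros((100, 6))
q_values[9] = [0.2, 0.5, 0.1, 0.4, 0.3, 0.9]  # made-up values for state 10
current_state = 10

# Best action over all columns, then over columns 0 to 4 only.
unrestricted = np.argmax(q_values[current_state - 1])
restricted = np.argmax(q_values[current_state - 1, :5])

print(unrestricted, restricted)  # 5 1
```

Since the slice :5 keeps columns 0 to 4 in place, the result is already a valid column index into q_values.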
Answered By: Claudio