What does "IndexError: index 20 is out of bounds for axis 1 with size 20" mean?

Question:

I was working on Q-learning in a maze environment. At the initial stage it was working fine, but afterward I started getting the following error:
max_future_q = np.max(q_table[new_discrete_state])
IndexError: index 20 is out of bounds for axis 1 with size 20

I don't understand what the issue is here. Below is the code:

import gym
import numpy as np
import gym_maze

env = gym.make("maze-v0")

LEARNING_RATE = 0.1

DISCOUNT = 0.95
EPISODES = 25000
SHOW_EVERY = 3000

DISCRETE_OS_SIZE = [20, 20]
discrete_os_win_size = (env.observation_space.high - env.observation_space.low)/DISCRETE_OS_SIZE

# Exploration settings
epsilon = 1  # not a constant, going to be decayed
START_EPSILON_DECAYING = 1
END_EPSILON_DECAYING = EPISODES//2
epsilon_decay_value = epsilon/(END_EPSILON_DECAYING - START_EPSILON_DECAYING)


q_table = np.random.uniform(low=-2, high=0, size=(DISCRETE_OS_SIZE + [env.action_space.n]))


def get_discrete_state(state):
    discrete_state = (state - env.observation_space.low)/discrete_os_win_size
    return tuple(discrete_state.astype(int))  # we use this tuple to look up the Q values for the available actions in the q-table


for episode in range(EPISODES):
    discrete_state = get_discrete_state(env.reset())
    done = False

    if episode % SHOW_EVERY == 0:
        render = True
        print(episode)
    else:
        render = False

    while not done:

        if np.random.random() > epsilon:
            # Get action from Q table
            action = np.argmax(q_table[discrete_state])
        else:
            # Get random action
            action = np.random.randint(0, env.action_space.n)


        new_state, reward, done, _ = env.step(action)

        new_discrete_state = get_discrete_state(new_state)

        if episode % SHOW_EVERY == 0:
            env.render()

        # If simulation did not end yet after last step - update Q table
        if not done:

            # Maximum possible Q value in next step (for new state)
            max_future_q = np.max(q_table[new_discrete_state])

            # Current Q value (for current state and performed action)
            current_q = q_table[discrete_state + (action,)]

            # And here's our equation for a new Q value for current state and action
            new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)

            # Update Q table with new Q value
            q_table[discrete_state + (action,)] = new_q


        # Simulation ended (for any reason) - if the goal position is achieved, update Q value with the reward directly
        elif new_state[0] >= env.goal_position:
            #q_table[discrete_state + (action,)] = reward
            q_table[discrete_state + (action,)] = 0

        discrete_state = new_discrete_state

    # Decaying is being done every episode if episode number is within decaying range
    if END_EPSILON_DECAYING >= episode >= START_EPSILON_DECAYING:
        epsilon -= epsilon_decay_value


env.close()

    
Asked By: Sherin shibu


Answers:

The error means that you are trying to index axis 1 of an array whose size along that axis is 20 with the index 20. Valid indices along an axis of size 20 run from 0 to 19, so 20 is out of bounds; for example, np.zeros((10, 20))[:, 20] raises exactly this error.
Verify the shapes of your NumPy arrays against the indices you compute into them.
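
A minimal sketch of the error, plus one possible fix for the question's discretization (the clipping step is a suggestion, not part of the original code; low, win_size and n_bins here stand in for env.observation_space.low, discrete_os_win_size and DISCRETE_OS_SIZE[0] from the question):

import numpy as np

arr = np.zeros((10, 20))
arr[:, 19]    # the last valid index along axis 1
# arr[:, 20]  # raises: IndexError: index 20 is out of bounds for axis 1 with size 20

# In the question's code, a state component equal to env.observation_space.high
# discretizes to index 20, one past the end of a 20-bin q_table.
# Clipping the result keeps every index in the range 0..19:
def get_discrete_state(state, low, win_size, n_bins=20):
    discrete = ((state - low) / win_size).astype(int)
    return tuple(np.clip(discrete, 0, n_bins - 1))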

Answered By: Benjaminliupenrose

An index-out-of-bounds error means you are trying to access an item at an index that does not exist in the container. You cannot select the sixth person in a line of five people.

Python, like most programming languages, is 0-indexed. This means that the first item in a container has the index 0, not 1. So the indexes of the items in a container of size 5 would be

0, 1, 2, 3, 4

As you can see, the index of the last item in a container is 1 less than the size of the container. In Python you can get the index of the last item in a container with

len(foo) - 1
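
For example:

foo = ['a', 'b', 'c', 'd', 'e']  # a container of size 5
foo[len(foo) - 1]                # 'e', the last item, at index 4
# foo[5] would raise an IndexError, just like indexing the q_table with 20 does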
Answered By: segj

By printing

env.reset()

you can see that, in newer versions of Gym, it returns a tuple like this:

(array([-0.4530919, 0. ], dtype=float32), {})

so we need to take the 0th element of the tuple to get the state array, array([-0.4530919, 0. ]). This affects the line at the top of the episode loop:

discrete_state = get_discrete_state(env.reset())

which must be modified to:

discrete_state = get_discrete_state(env.reset()[0])

Then the subtraction inside the function get_discrete_state is performed on the state array rather than on the tuple, and this error will no longer appear.
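
For completeness, a short sketch of the same adjustment written against the newer Gym API explicitly (this assumes Gym >= 0.26, where reset() returns an (observation, info) pair and step() returns five values instead of four):

# reset() returns (observation, info) in Gym >= 0.26
obs, info = env.reset()
discrete_state = get_discrete_state(obs)

# step() returns five values; combine the two termination flags into done
new_state, reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
new_discrete_state = get_discrete_state(new_state)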

Answered By: Walid Hamad