What does "IndexError: index 20 is out of bounds for axis 1 with size 20" mean?
Question:
I was working on Q-learning in a maze environment. At the initial stage it was working fine, but afterward I started getting the following error:
max_future_q = np.max(q_table[new_discrete_state])
IndexError: index 20 is out of bounds for axis 1 with size 20
I don't understand what the issue is.
Below is the code:
import gym
import numpy as np
import gym_maze

env = gym.make("maze-v0")

LEARNING_RATE = 0.1
DISCOUNT = 0.95
EPISODES = 25000
SHOW_EVERY = 3000

DISCRETE_OS_SIZE = [20, 20]
discrete_os_win_size = (env.observation_space.high - env.observation_space.low) / DISCRETE_OS_SIZE

# Exploration settings
epsilon = 1  # not a constant, going to be decayed
START_EPSILON_DECAYING = 1
END_EPSILON_DECAYING = EPISODES // 2
epsilon_decay_value = epsilon / (END_EPSILON_DECAYING - START_EPSILON_DECAYING)

q_table = np.random.uniform(low=-2, high=0, size=(DISCRETE_OS_SIZE + [env.action_space.n]))

def get_discrete_state(state):
    discrete_state = (state - env.observation_space.low) / discrete_os_win_size
    return tuple(discrete_state.astype(np.int))  # we use this tuple to look up the Q values for the available actions in the q-table

for episode in range(EPISODES):
    discrete_state = get_discrete_state(env.reset())
    done = False

    if episode % SHOW_EVERY == 0:
        render = True
        print(episode)
    else:
        render = False

    while not done:
        if np.random.random() > epsilon:
            # Get action from Q table
            action = np.argmax(q_table[discrete_state])
        else:
            # Get random action
            action = np.random.randint(0, env.action_space.n)

        new_state, reward, done, _ = env.step(action)
        new_discrete_state = get_discrete_state(new_state)

        if episode % SHOW_EVERY == 0:
            env.render()

        # If simulation did not end yet after last step - update Q table
        if not done:
            # Maximum possible Q value in next step (for new state)
            max_future_q = np.max(q_table[new_discrete_state])
            # Current Q value (for current state and performed action)
            current_q = q_table[discrete_state + (action,)]
            # And here's our equation for a new Q value for current state and action
            new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)
            # Update Q table with new Q value
            q_table[discrete_state + (action,)] = new_q
        # Simulation ended (for any reason) - if goal position is achieved - update Q value with reward directly
        elif new_state[0] >= env.goal_position:
            #q_table[discrete_state + (action,)] = reward
            q_table[discrete_state + (action,)] = 0

        discrete_state = new_discrete_state

    # Decaying is being done every episode if episode number is within decaying range
    if END_EPSILON_DECAYING >= episode >= START_EPSILON_DECAYING:
        epsilon -= epsilon_decay_value

env.close()
Answers:
The error means that you are trying to index axis 1 of an array with the index 20, but axis 1 has size 20, so its valid indices are 0 through 19. For example, np.zeros((10, 20))[:, 20] raises the same error.
Verify the shapes of your NumPy arrays and the indices you compute from them.
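As a minimal standalone illustration (not tied to the maze code), indexing one past the end of an axis reproduces exactly this message:

```python
import numpy as np

a = np.zeros((10, 20))  # axis 0 has size 10, axis 1 has size 20

print(a[:, 19].shape)   # (10,) -- 19 is the last valid index on axis 1

try:
    a[:, 20]            # one past the end of axis 1
except IndexError as e:
    print(e)            # index 20 is out of bounds for axis 1 with size 20
```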
An index-out-of-bounds error means you are trying to access an item at an index that does not exist in the container. You cannot select the sixth person in a line of five people.
Python, like most programming languages, is 0-indexed: the first item in a container has index 0, not 1. So the indices of the items in a container of size 5 are
0, 1, 2, 3, 4
As you can see, the index of the last item is one less than the size of the container. In Python you can get the index of the last item with
len(foo) - 1
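A quick sketch of that relationship, using a hypothetical list foo:

```python
foo = ["a", "b", "c", "d", "e"]   # container of size 5

print(len(foo))             # 5
print(foo[len(foo) - 1])    # e -- the last valid index is 4
print(foo[4])               # e -- same item

# foo[5] would raise: IndexError: list index out of range
```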
By printing
env.reset()
we see that (in newer versions of gym) it returns a tuple like (array([-0.4530919, 0. ], dtype=float32), {})
so we need to take index 0 of the tuple to get the state array, array([-0.4530919, 0. ]).
This only needs to be done at the line at the top of the "for"-loop:
discrete_state = get_discrete_state(env.reset())
which must be modified to:
discrete_state = get_discrete_state(env.reset()[0])
Then the subtraction (state - env.observation_space.low) inside get_discrete_state operates on the state array rather than on the whole tuple, and this error will not appear again.
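The shape of the problem can be reproduced without gym installed. Below, reset_result stands in for what a newer gym's env.reset() returns (an (observation, info) tuple), and low/high are hypothetical bounds standing in for env.observation_space; the point is that the discretization arithmetic needs the bare observation array, not the tuple:

```python
import numpy as np

# Hypothetical stand-ins for the environment's observation space bounds
low = np.array([-1.2, -0.07], dtype=np.float32)
high = np.array([0.6, 0.07], dtype=np.float32)
DISCRETE_OS_SIZE = [20, 20]
win_size = (high - low) / DISCRETE_OS_SIZE

def get_discrete_state(state):
    # Valid output indices are 0..19 for a 20x20 table
    return tuple(((state - low) / win_size).astype(int))

# Newer gym: reset() returns (observation, info), not the bare observation
reset_result = (np.array([-0.4530919, 0.0], dtype=np.float32), {})

# Wrong: passing the whole tuple into the arithmetic
# get_discrete_state(reset_result)   # fails / yields bogus indices

# Right: unpack the observation first, as in env.reset()[0]
state = reset_result[0]
print(get_discrete_state(state))
```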