How can I change this to use a q table for reinforcement learning

Question:

I am working on learning Q-tables and ran through a simple version which only used a 1-dimensional array to move forward and backward. Now I am trying 4-direction movement and got stuck on controlling the person.

I have the random movement down now and it will eventually find the goal, but I want it to learn how to get to the goal instead of randomly stumbling onto it. So I would appreciate any advice on adding Q-learning to this code. Thank you.

Here is my full code; it is stupid simple right now.

import numpy as np
import random
import math

world = np.zeros((5,5))
print(world)
# Make sure the goal can never be at (0, 0), i.e. the start point
goal_x = random.randint(1,4)
goal_y = random.randint(1,4)
goal = (goal_x, goal_y)
print(goal)
world[goal] = 1
print(world)

LEFT = 0
RIGHT = 1
UP = 2
DOWN = 3
map_range_min = 0   # lowest valid index in the 5x5 world
map_range_max = 4   # highest valid index in the 5x5 world

class Agent:
    def __init__(self, current_position, my_goal, world):
        self.current_position = current_position
        self.last_position = current_position
        self.visited_positions = []
        self.goal = my_goal
        self.last_reward = 0
        self.totalReward = 0
        self.q_table = world


    # Update the total reward by the reward
    def updateReward(self, extra_reward):
        # This will either increase or decrease the total reward for the episode
        x = (self.goal[0] - self.current_position[0]) ** 2
        y = (self.goal[1] - self.current_position[1]) ** 2
        dist = math.sqrt(x + y)
        complete_reward = dist + extra_reward
        self.totalReward += complete_reward

    def validate_move(self):
        valid_move_set = []
        # Check for x ranges
        if map_range_min < self.current_position[0] < map_range_max:
            valid_move_set.append(LEFT)
            valid_move_set.append(RIGHT)
        elif map_range_min == self.current_position[0]:
            valid_move_set.append(RIGHT)
        else:
            valid_move_set.append(LEFT)
        # Check for Y ranges
        if map_range_min < self.current_position[1] < map_range_max:
            valid_move_set.append(UP)
            valid_move_set.append(DOWN)
        elif map_range_min == self.current_position[1]:
            valid_move_set.append(DOWN)
        else:
            valid_move_set.append(UP)
        return valid_move_set

    # Make the agent move
    def move_right(self):
        self.last_position = self.current_position
        x = self.current_position[0]
        x += 1
        y = self.current_position[1]
        return (x, y)
    def move_left(self):
        self.last_position = self.current_position
        x = self.current_position[0]
        x -= 1
        y = self.current_position[1]
        return (x, y)
    def move_down(self):
        self.last_position = self.current_position
        x = self.current_position[0]
        y = self.current_position[1]
        y += 1
        return (x, y)
    def move_up(self):
        self.last_position = self.current_position
        x = self.current_position[0]
        y = self.current_position[1]
        y -= 1
        return (x, y)

    def move_agent(self):
        move_set = self.validate_move()
        randChoice = random.randint(0, len(move_set)-1)
        move = move_set[randChoice]
        if move == UP:
            return self.move_up()
        elif move == DOWN:
            return self.move_down()
        elif move == RIGHT:
            return self.move_right()
        else:
            return self.move_left()

    # Update the rewards
    # Return False to end the episode (goal reached), True to keep playing
    def checkPosition(self):
        if self.current_position == self.goal:
            print("Found Goal")
            self.updateReward(10)
            return False
        else:
            # Choose a new direction
            self.current_position = self.move_agent()
            self.visited_positions.append(self.current_position)
            # Currently get nothing for not reaching the goal
            self.updateReward(0)
            return True


gus = Agent((0, 0), goal, world)
play = gus.checkPosition()
while play:
    play = gus.checkPosition()

print(gus.totalReward)
Asked By: MNM


Answers:

I have a few suggestions based on your code example:

  1. Separate the environment from the agent. The environment needs to have a method of the form new_state, reward = env.step(old_state, action). This method says how an action transforms your old state into a new state (see the first sketch after this list). It’s a good idea to encode your states and actions as simple integers. I strongly recommend setting up unit tests for this method.

  2. The agent then needs to have an equivalent method action = agent.policy(state, reward). As a first pass, you should manually code an agent that does what you think is right; e.g., it might just try to head towards the goal location.

  3. Consider whether the state representation is Markovian. If you could do better at the problem with a memory of all the past states you visited, then the state doesn’t have the Markov property. Preferably, the state representation should be compact (the smallest set that is still Markovian).

  4. Once this structure is set up, you can then think about actually learning a Q-table. One possible method (that is easy to understand but not necessarily that efficient) is Monte Carlo with either exploring starts or an epsilon-soft greedy policy. A good RL book should give pseudocode for either variant. A tabular update is sketched in the second example below.
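
To make points 1 and 2 concrete, here is a rough sketch of that structure, reusing the LEFT/RIGHT/UP/DOWN constants and the goal tuple from your code. The names GridWorld and GreedyAgent are just illustrative, and I return an extra done flag from step for convenience; treat it as one possible shape, not the definitive one.

class GridWorld:
    """Minimal 5x5 grid environment: states are (x, y) tuples,
    actions are the LEFT/RIGHT/UP/DOWN integers from the question."""
    def __init__(self, goal, size=5):
        self.goal = goal
        self.size = size

    def step(self, state, action):
        """Apply an action to a state; return (new_state, reward, done)."""
        x, y = state
        if action == LEFT:
            x = max(x - 1, 0)
        elif action == RIGHT:
            x = min(x + 1, self.size - 1)
        elif action == UP:
            y = max(y - 1, 0)
        elif action == DOWN:
            y = min(y + 1, self.size - 1)
        new_state = (x, y)
        done = new_state == self.goal
        reward = 10 if done else -1   # small step penalty favours short paths
        return new_state, reward, done

class GreedyAgent:
    """Hand-coded baseline policy that just heads towards the goal."""
    def __init__(self, goal):
        self.goal = goal

    def policy(self, state, reward=None):
        x, y = state
        gx, gy = self.goal
        if x < gx:
            return RIGHT
        if x > gx:
            return LEFT
        if y > gy:
            return UP
        return DOWN   # otherwise y < gy; at the goal the episode is already done

# Quick rollout with the hand-coded agent to sanity-check env.step.
env = GridWorld(goal)
agent = GreedyAgent(goal)
state, done = (0, 0), False
while not done:
    state, reward, done = env.step(state, agent.policy(state))
    print(state, reward)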

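For point 4, here is an equally rough sketch of learning the table itself, assuming the GridWorld class above and the goal from your code. I have used tabular one-step Q-learning with an epsilon-greedy policy (a temporal-difference method, rather than the Monte Carlo variant mentioned above) because it matches the Q-table framing of your question; the hyperparameters are illustrative, not tuned.

import random
import numpy as np

N_ACTIONS = 4
q_table = np.zeros((5, 5, N_ACTIONS))   # one row of action-values per (x, y) cell

alpha = 0.1      # learning rate
gamma = 0.9      # discount factor
epsilon = 0.1    # exploration probability

env = GridWorld(goal)

for episode in range(500):
    state = (0, 0)
    done = False
    while not done:
        # Epsilon-greedy action selection from the Q-table.
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = int(np.argmax(q_table[state]))

        new_state, reward, done = env.step(state, action)

        # One-step Q-learning update.
        if done:
            td_target = reward
        else:
            td_target = reward + gamma * np.max(q_table[new_state])
        q_table[state][action] += alpha * (td_target - q_table[state][action])
        state = new_state

# After training, the greedy path can be read straight out of the table.
state = (0, 0)
path = [state]
for _ in range(50):   # step cap in case the table is not fully learned
    if state == goal:
        break
    state, _, _ = env.step(state, int(np.argmax(q_table[state])))
    path.append(state)
print(path)
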
When you are feeling confident, head to OpenAI Gym (https://www.gymlibrary.dev/) for some more detailed class structures. There are some hints about creating your own environments here: https://www.gymlibrary.dev/content/environment_creation/

Answered By: James Brusey