OpenAI gym: when is reset required?

Question:

Although I can get the examples and my own code to run, I am more curious about the real semantics and expectations behind the OpenAI gym API, in particular Env.reset().

When is reset expected/required? At the end of each episode? Or only after creating an environment?

I think it makes sense to call it before each episode, but I have not found that stated explicitly anywhere!

Asked By: Juan Leni

Answers:

You typically call reset after an entire episode. That could be after you have reached a terminal state in the MDP, or after you have reached your maximum number of time steps (set by you). I also typically reset the environment at the very start of training.

So if you are at your starting state ‘A’ and you want to reach state ‘Z’, you would run your time steps going from ‘A’ -> ‘B’ -> ‘C’ …, then when you reach the terminal state ‘Z’, you start a new episode using reset, which would take you back to ‘A’.

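    # env is assumed to be an already created gym environment
    for episode in range(iterations):  # iterations: number of episodes, set by you
        state = env.reset()  # first state of the new episode
        for time_step in range(1000):  # maximum number of steps per episode
            action = take_action(state)  # take_action: placeholder for your policy
            # classic 4-tuple API; gym >= 0.26 returns five values from step()
            state, reward, done, _ = env.step(action)
            if done:
                break  # ends this episode; the next one starts with env.reset()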
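
To make the 'A' to 'Z' walk concrete without depending on any particular gym version, here is a minimal sketch using a hypothetical `ChainEnv` class. The class, its reward values, and the `run_episode` helper are assumptions made for illustration; they only mimic the classic `reset()` / `step()` interface and are not part of gym.

```python
import string


class ChainEnv:
    """Hypothetical toy environment: walk from 'A' to the terminal state 'Z'."""

    STATES = string.ascii_uppercase  # 'A' .. 'Z'

    def __init__(self):
        self.index = None  # no valid state until reset() is called

    def reset(self):
        # Start a new episode at 'A' and return the initial observation.
        self.index = 0
        return self.STATES[self.index]

    def step(self, action):
        # The action is ignored here; every step moves one state forward.
        self.index += 1
        state = self.STATES[self.index]
        done = state == "Z"
        reward = 1.0 if done else 0.0  # assumed reward scheme
        return state, reward, done, {}


def run_episode(env, max_steps):
    """Run one episode; return (first state, last state, steps taken)."""
    first = state = env.reset()
    steps = 0
    for _ in range(max_steps):
        state, reward, done, _info = env.step(action=None)
        steps += 1
        if done:
            break
    return first, state, steps


env = ChainEnv()
print(run_episode(env, max_steps=1000))  # ('A', 'Z', 25): terminal state reached
print(run_episode(env, max_steps=5))     # ('A', 'F', 5): step cap reached first
```

Both episodes start at 'A' because `reset()` is called at the top of each one, whether the previous episode ended at the terminal state or at the step cap.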
Answered By: Derek_M

Put simply, env.reset() resets the whole environment, so you need to call it for each episode.

env.reset()

This is an example of a reset function inside a custom environment.
In this case it just resets the enemy position and the time.
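
A minimal sketch of what such a reset method might look like, assuming a hypothetical `DodgeEnv` class with a one-dimensional world. The class name, `WIDTH`, `TIME_LIMIT`, and the enemy movement rule are all invented for illustration; only the idea of resetting the enemy position and the time comes from the answer.

```python
import random


class DodgeEnv:
    """Hypothetical custom environment with an enemy and a clock."""

    WIDTH = 10       # assumed size of a 1-D world
    TIME_LIMIT = 50  # assumed maximum number of steps per episode

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.enemy_pos = None  # set by reset()
        self.time = None       # set by reset()

    def reset(self):
        # Restore the per-episode state: enemy position and elapsed time.
        self.enemy_pos = self.rng.randrange(self.WIDTH)
        self.time = 0
        return self._observation()

    def step(self, action):
        # The enemy drifts one cell to the right (wrapping); the clock advances.
        self.enemy_pos = (self.enemy_pos + 1) % self.WIDTH
        self.time += 1
        done = self.time >= self.TIME_LIMIT
        return self._observation(), 0.0, done, {}

    def _observation(self):
        return (self.enemy_pos, self.time)


env = DodgeEnv(seed=0)
env.reset()
for _ in range(3):
    env.step(action=None)
print(env.time)  # 3
env.reset()
print(env.time)  # 0: the clock starts over for the new episode
```

Anything that changes during an episode (here the enemy position and the clock) is restored in `reset()`, which is why it has to be called before every episode and not only once after construction.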

I think seeing what is inside the environment gives you a better understanding.

Sorry for the late response.

Answered By: suthakar Thiroshan