How to use a rule-based 'expert' for imitation learning?

Question:

I am currently training a PPO model for a simulation.
The PPO model fails to learn that certain conditions lead to no reward.

The conditions that lead to no reward follow very simple rules.
I was trying to use these rules to create an ‘expert’ that the PPO model could learn from through imitation learning.

Example of Expert-Based Rules:

If resource A is unavailable, then don’t select that resource.

If "X" and "Y" don’t match, then don’t select them.
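Rules like these can be written directly as a policy function that maps an observation to an action, which is all an "expert" needs to be for generating demonstrations. A minimal sketch, assuming a hypothetical observation layout (a list of resources, each with an `available` flag and `x`/`y` attributes) and a hypothetical discrete action space where action `i` selects resource `i` and the extra last action means "select nothing". The `Resource` class and `rule_based_expert` function are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    # Hypothetical per-resource features; replace with your simulation's real fields.
    available: bool
    x: str
    y: str


def rule_based_expert(resources: List[Resource]) -> int:
    """Return the index of the first resource allowed by both rules.

    Action len(resources) is a hypothetical "select nothing" action,
    used when no resource passes the rules.
    """
    for i, r in enumerate(resources):
        if not r.available:
            continue  # Rule 1: never select an unavailable resource.
        if r.x != r.y:
            continue  # Rule 2: never select a resource whose X and Y don't match.
        return i
    return len(resources)


obs = [Resource(False, "a", "a"), Resource(True, "a", "b"), Resource(True, "c", "c")]
print(rule_based_expert(obs))  # → 2
```

Picking the first valid resource is an arbitrary choice for the sketch; the expert can encode any preference order, as long as it never violates the rules.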

Example with the imitation Library

I was looking at the "imitation" Python library.
The example there uses, as its expert, a PPO model trained for more iterations.

https://github.com/HumanCompatibleAI/imitation/blob/master/examples/1_train_bc.ipynb


Questions:

Is there a way to convert the simple "rule-based" expert into a PPO model which can be used for imitation learning?

Or is there a different approach to using a "rule-based" expert in imitation learning?

Asked By: narnia649


Answers:

Looking at how behavioural cloning is implemented:

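import numpy as np
from imitation.algorithms import bc

# `env` is your Gym environment; `transitions` holds the demonstrations
# collected from the expert (rule-based or otherwise).
bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=transitions,
    rng=np.random.default_rng(0),  # required keyword in recent versions of imitation
)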

All you have to do is create the demonstrations. You do not even need to write "an agent" per se: just generate sequences by interacting with your environment using your rule-based bot, and that’s all.
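A minimal sketch of that idea: run the rule-based bot in the environment and record every step. It assumes a Gymnasium-style API (`reset()` returning `(obs, info)` and `step()` returning `(obs, reward, terminated, truncated, info)`). `collect_demonstrations` and the tiny `CountdownEnv` used to exercise it are hypothetical stand-ins for your own collection code and simulation, and the lambda stands in for your rules.

```python
from typing import Any, Callable, Dict, List


def collect_demonstrations(
    env: Any, expert: Callable[[Any], Any], n_episodes: int
) -> List[Dict[str, Any]]:
    """Roll out `expert` in `env` and record one dict per step.

    Assumes a Gymnasium-style env: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    transitions = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = expert(obs)  # the rule-based bot chooses the action
            next_obs, _reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            transitions.append(
                {"obs": obs, "acts": action, "next_obs": next_obs, "dones": done, "infos": info}
            )
            obs = next_obs
    return transitions


class CountdownEnv:
    """Hypothetical toy env: the state counts down from 3 and the episode ends at 0."""

    def reset(self):
        self.state = 3
        return self.state, {}

    def step(self, action):
        self.state -= 1
        terminated = self.state == 0
        return self.state, 0.0, terminated, False, {}


demos = collect_demonstrations(CountdownEnv(), expert=lambda obs: obs % 2, n_episodes=2)
print(len(demos))  # → 6
```

The reward is read but not stored: behavioural cloning only imitates the expert's actions, so what matters is that every observation is paired with the action the rules chose.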

Answered By: lejlot

Can "transitions" be a NumPy array of (state, reward, done, {}) tuples,
or does it need to be something different?
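Behavioural cloning learns from (observation, action) pairs, so the actions must be included while the rewards are not needed. As far as I know, imitation's `imitation.data.types.Transitions` is a dataclass built from parallel arrays (`obs`, `acts`, `infos`, `next_obs`, `dones`), one entry per step, rather than from a single array of tuples. A sketch, assuming hypothetical per-step records with 2-D observations, that stacks them into those arrays with NumPy. The `Transitions(...)` call is left as a comment so the block runs without imitation installed.

```python
import numpy as np

# Hypothetical per-step records from a rule-based bot (2-D observation vectors).
records = [
    {"obs": [0.0, 1.0], "acts": 1, "next_obs": [1.0, 1.0], "dones": False, "infos": {}},
    {"obs": [1.0, 1.0], "acts": 0, "next_obs": [1.0, 0.0], "dones": True, "infos": {}},
]

# One array per field, with the step index as the first axis.
arrays = {
    "obs": np.array([r["obs"] for r in records], dtype=np.float32),
    "acts": np.array([r["acts"] for r in records], dtype=np.int64),
    "next_obs": np.array([r["next_obs"] for r in records], dtype=np.float32),
    "dones": np.array([r["dones"] for r in records], dtype=bool),
    # infos is an object array holding one dict per step.
    "infos": np.array([r["infos"] for r in records], dtype=object),
}

# With imitation installed, these can then be wrapped as:
# from imitation.data.types import Transitions
# transitions = Transitions(**arrays)

print(arrays["obs"].shape, arrays["acts"].shape)  # → (2, 2) (2,)
```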

Answered By: Prathamesh Saraf