How can I evaluate multiple checkpoints with the TF2 Object Detection?
Question:
I’ve successfully trained a model for around 16k steps, which produced quite a few checkpoints saved in my training folder. I want to make sure that I am not running into overfitting issues, so I would like to evaluate every single checkpoint against my test data.
I am using the following command from the official Tensorflow 2 Object Detection repository:
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
CHECKPOINT_DIR=${MODEL_DIR}
python object_detection/model_main_tf2.py \
  --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
  --model_dir=${MODEL_DIR} \
  --checkpoint_dir=${CHECKPOINT_DIR} \
  --alsologtostderr
MODEL_DIR and CHECKPOINT_DIR both point to my training folder.
The issue I am experiencing is that this only evaluates the latest checkpoint, but I’d like to evaluate all of them. Ideally I would like to see the results in TensorBoard as a graph of validation mAP across the different checkpoints – which it already shows, but only for that one checkpoint.
Answers:
As of 02.2022
The validation process is meant to run concurrently with the training process, so that whenever a new checkpoint is saved, the validation process immediately loads that checkpoint and starts evaluating it.
Please see my other answer in this regard.
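If training has already finished and you want to evaluate the existing checkpoints after the fact, one workaround is to drive the eval job yourself. This is a sketch, under the assumption that model_main_tf2.py in eval mode (i.e. with --checkpoint_dir set) evaluates whichever checkpoint the "checkpoint" state file in that directory names; the helper names below are mine, not part of the API:

```python
# Sketch: evaluate every ckpt-N in the training folder, one eval pass each,
# by rewriting the "checkpoint" state file before each pass. Back up the
# original "checkpoint" file first, because this overwrites it.
import glob
import os
import re
import subprocess

def list_checkpoints(model_dir):
    """Return checkpoint prefixes (e.g. '<model_dir>/ckpt-3') sorted by step number."""
    prefixes = {p[: -len(".index")]
                for p in glob.glob(os.path.join(model_dir, "*.index"))}
    return sorted(prefixes,
                  key=lambda p: int(re.search(r"-(\d+)$", p).group(1)))

def point_state_file_at(model_dir, ckpt_prefix):
    """Rewrite the 'checkpoint' state file so the eval run picks ckpt_prefix."""
    name = os.path.basename(ckpt_prefix)
    with open(os.path.join(model_dir, "checkpoint"), "w") as f:
        f.write('model_checkpoint_path: "%s"\n' % name)
        f.write('all_model_checkpoint_paths: "%s"\n' % name)

def evaluate_all(model_dir, pipeline_config):
    """Run one eval pass per checkpoint; events land where TensorBoard can plot them."""
    for ckpt in list_checkpoints(model_dir):
        point_state_file_at(model_dir, ckpt)
        subprocess.run([
            "python", "object_detection/model_main_tf2.py",
            "--pipeline_config_path=%s" % pipeline_config,
            "--model_dir=%s" % model_dir,
            "--checkpoint_dir=%s" % model_dir,
            # A short eval_timeout makes the eval job exit after its pass
            # instead of waiting indefinitely for a newer checkpoint.
            "--eval_timeout=60",
            "--alsologtostderr",
        ], check=True)
```

Because every pass tags its metrics with the checkpoint’s step number, pointing TensorBoard at the training folder afterwards should show the mAP of the different checkpoints as one curve, which is what you are after.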