Why are there no logs, and which model is saved?
Question:
I’m using Trainer to train my model.
I have the following output on screen:
Epoch  Training Loss  Validation Loss  Accuracy
0      No log         1.114260         0.342667
1      No log         0.939480         0.545333
2      No log         0.816581         0.660000
3      No log         0.752204         0.710667
4      No log         0.741462         0.741333
5      No log         0.801005         0.754667
6      0.675800       0.892765         0.748000
7      0.675800       1.190328         0.752000
8      0.675800       1.272624         0.745333
- Why are there no logs for epochs 0-5? (Do I need to configure or enable them?)
- Epoch #5 got the best accuracy. When I call predict, which model checkpoint will be used: the model after 5 epochs or the model after 8 epochs?
Answers:
- The default value of the logging_steps parameter in TrainingArguments() is 500, so no training loss is reported before step 500. Since your output is displayed per epoch, I can only assume that one epoch is on the order of 100 steps, counting from step 0, so the logs only start to appear once training reaches the 6th epoch. To see a training loss earlier, set logging_steps to a smaller value or set logging_strategy='epoch'.
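The arithmetic behind the "No log" rows can be sketched in plain Python. This is a simplified model of the progress table, not the Trainer's own code: the function name is made up for illustration, and the value of 80 steps per epoch is a hypothetical figure chosen only because it reproduces the table in the question (the real number of steps per epoch is not given).

```python
def rows_without_training_loss(steps_per_epoch, logging_steps, num_epochs):
    """Return the 0-based epoch rows evaluated before the first loss is logged."""
    rows = []
    for epoch in range(num_epochs):
        # Global step count at the moment this epoch's evaluation runs
        global_step = (epoch + 1) * steps_per_epoch
        if global_step < logging_steps:
            rows.append(epoch)
    return rows


# Hypothetical 80 steps per epoch with the default logging_steps of 500
print(rows_without_training_loss(80, 500, 9))  # [0, 1, 2, 3, 4, 5]

# Logging once per epoch's worth of steps leaves no empty rows
print(rows_without_training_loss(80, 80, 9))   # []
```

Under these assumptions, rows 0-5 are evaluated before step 500 and therefore show "No log", which matches the output in the question.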
- By default, predict uses the model as it stands after the last training step (epoch 8 here), not the best one. There are additional parameters you can specify in TrainingArguments() to change that. For example, with the parameters below, only the best checkpoints according to the metric you want to optimize are kept on disk, and the best model is loaded at the end of training.
Example:
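from transformers import TrainingArguments

args = TrainingArguments(
    ...,
    # Evaluate every 100 steps; only applies when evaluation is step-based
    eval_steps=100,
    # Must be 2: the current checkpoint and the best one
    save_total_limit=2,
    metric_for_best_model='accuracy',
    greater_is_better=True,
    load_best_model_at_end=True)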
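In this setup, the model is evaluated on the validation set every 100 steps, with accuracy as the metric used to pick the best checkpoint. Because load_best_model_at_end is True, predict then runs with the best checkpoint rather than the last one.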
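To make the selection rule concrete, here is a toy illustration of what "best according to the metric" means, using the per-epoch accuracies from the question. It only mimics the idea behind metric_for_best_model and greater_is_better; the helper pick_best is a hypothetical name and is not part of the transformers API.

```python
# (epoch, accuracy) pairs copied from the table in the question
eval_history = [
    (0, 0.342667), (1, 0.545333), (2, 0.660000),
    (3, 0.710667), (4, 0.741333), (5, 0.754667),
    (6, 0.748000), (7, 0.752000), (8, 0.745333),
]


def pick_best(history, greater_is_better=True):
    """Return the (epoch, metric) entry with the best metric value."""
    chooser = max if greater_is_better else min
    return chooser(history, key=lambda entry: entry[1])


best_epoch, best_accuracy = pick_best(eval_history)
print(best_epoch, best_accuracy)  # 5 0.754667
```

With accuracy as the metric and greater_is_better=True, epoch 5 is selected. For a loss-type metric, the comparison would be reversed.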