Why are there no logs, and which model is saved?

Question:

I’m using Trainer to train my model.

I have the following outputs on screen:

Epoch   Training Loss   Validation Loss   Accuracy
0       No log          1.114260          0.342667
1       No log          0.939480          0.545333
2       No log          0.816581          0.660000
3       No log          0.752204          0.710667
4       No log          0.741462          0.741333
5       No log          0.801005          0.754667
6       0.675800        0.892765          0.748000
7       0.675800        1.190328          0.752000
8       0.675800        1.272624          0.745333

  1. Why are there no logs for epochs 0-5? (Do I need to configure or enable them?)
  2. Epoch 5 got the best accuracy.
    When I use predict, which model checkpoint will be used:
    the model after 5 epochs or the model after 8 epochs?
Asked By: user3668129

||

Answers:

  1. The logging_steps parameter in TrainingArguments() defaults to 500,
    so no training loss is reported before step 500.

    Since your output is displayed per epoch, I can only assume that one epoch is roughly 100 steps: counting from step 0, training reaches step 500 around the 6th epoch, and only then does a training loss appear. The same value is presumably repeated in the later rows because no further log is written before step 1000.

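  2. By default, the Trainer keeps the weights from the end of training, so predict uses the model after the last epoch, not the best one. There are additional parameters you can specify in TrainingArguments() to change this. For example, with parameters like the ones below, only a limited number of checkpoints is kept, including the best one according to the metric you want to optimize, and the best model is loaded at the end of training.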

Example:

from transformers import TrainingArguments

args = TrainingArguments( ...,  # your other arguments
                          eval_steps = 100,  # evaluate every 100 steps
                          # Must be 2: the current one and the best one
                          save_total_limit = 2,
                          metric_for_best_model = 'accuracy',
                          greater_is_better = True,
                          load_best_model_at_end = True)

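In this situation, the model is evaluated on the validation set every 100 steps, with accuracy as the metric to optimize. Note that eval_steps only takes effect when the evaluation strategy is set to "steps", and load_best_model_at_end requires the save strategy to match the evaluation strategy. The name 'accuracy' must also be one of the keys returned by your compute_metrics function.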
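
To make the difference concrete, here is a small sketch in plain Python (it does not call the transformers library) that picks the "best" epoch from the validation numbers in the question. The helper best_epoch is a hypothetical name for illustration; it only mimics the idea behind metric_for_best_model and greater_is_better, not the Trainer's actual implementation.

```python
# Validation results copied from the table in the question:
# epoch -> (validation loss, accuracy)
results = {
    0: (1.114260, 0.342667),
    1: (0.939480, 0.545333),
    2: (0.816581, 0.660000),
    3: (0.752204, 0.710667),
    4: (0.741462, 0.741333),
    5: (0.801005, 0.754667),
    6: (0.892765, 0.748000),
    7: (1.190328, 0.752000),
    8: (1.272624, 0.745333),
}


def best_epoch(results, metric_index, greater_is_better):
    """Return the epoch whose chosen metric is best (hypothetical helper)."""
    pick = max if greater_is_better else min
    return pick(results, key=lambda epoch: results[epoch][metric_index])


# Best epoch when optimizing accuracy (higher is better).
best_by_accuracy = best_epoch(results, metric_index=1, greater_is_better=True)
# Best epoch when optimizing validation loss (lower is better).
best_by_loss = best_epoch(results, metric_index=0, greater_is_better=False)
# The epoch whose weights remain in memory if no best model is reloaded.
last_epoch = max(results)

print("best by accuracy:", best_by_accuracy)
print("best by loss:", best_by_loss)
print("last epoch:", last_epoch)
```

With these numbers, accuracy and validation loss disagree about the best epoch, so the choice of metric_for_best_model matters. Either way, the selected epoch differs from the last one, which is the model you would get without load_best_model_at_end.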
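
Returning to the first point, the arithmetic behind the "No log" rows can be sketched as follows. The number of steps per epoch is not given in the question, so the values used here are assumptions, and first_epoch_with_logged_loss is a hypothetical helper, not part of the transformers API.

```python
import math


def first_epoch_with_logged_loss(steps_per_epoch, logging_steps=500):
    """Number of completed epochs before the first training loss is logged.

    The first log is written at step `logging_steps`, so we need enough
    epochs for the running step count to reach that value.
    """
    return math.ceil(logging_steps / steps_per_epoch)


# Assumed epoch lengths; the real value depends on dataset and batch size.
for steps_per_epoch in (80, 100, 500):
    epochs_needed = first_epoch_with_logged_loss(steps_per_epoch)
    print(steps_per_epoch, "steps/epoch ->", epochs_needed, "epoch(s) before a loss appears")

# With a smaller logging interval, a loss is available after the first epoch.
print(first_epoch_with_logged_loss(100, logging_steps=50))
```

Under these assumptions, setting logging_steps to a value no larger than the number of steps per epoch should give a training loss in every row of the table.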

Answered By: Timbus Calin