Should GPU memory be increasing substantially after every epoch in TensorFlow?

Question:

I’ve got a Keras subclassed model in TensorFlow whose GPU memory usage stays constant throughout an epoch, but when a new epoch starts it appears to allocate a whole new set of memory for that epoch.

Is this normal, expected behaviour?

Currently, I’m getting OOM on only my third epoch, and I’m not sure what data needs to be retained from the previous epoch other than the loss. If this is expected behaviour, what large quantity of data exactly needs to be retained (e.g. does TensorFlow need to store historic weights for some reason)?

I’ve not included any code, as I’m asking this more as a general question about TensorFlow and CNN model behaviour.

Asked By: magmacollaris


Answers:

My instinct is that you might see increases in the first two epochs, but memory usage should generally reach a steady state after that.
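To check whether usage really is climbing per epoch, one option is to log GPU memory at the end of each epoch. A minimal sketch, assuming TensorFlow 2.5+ and a single GPU visible as `GPU:0` (the callback name is mine, not from the question):

```python
import tensorflow as tf

# Optional: allocate GPU memory incrementally instead of grabbing the whole
# device up front, so tools like nvidia-smi reflect real usage. This must be
# set before any GPU is initialized.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

class MemoryLogger(tf.keras.callbacks.Callback):
    """Prints current and peak GPU memory after each epoch."""

    def on_epoch_end(self, epoch, logs=None):
        # get_memory_info is available from TF 2.5 onwards
        info = tf.config.experimental.get_memory_info("GPU:0")
        print(f"epoch {epoch}: current={info['current'] / 1e6:.1f} MB, "
              f"peak={info['peak'] / 1e6:.1f} MB")

# usage: model.fit(x, y, epochs=5, callbacks=[MemoryLogger()])
```

If the `current` figure jumps by roughly the same amount every epoch, something is holding references between epochs.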

Off-handedly: if you (or a callback) compare weights between epochs, you end up holding two full sets of weights and so use 2N memory that way.
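If something does need that comparison, holding the previous epoch’s weights as GPU tensors is what doubles the footprint; pulling them to the host first avoids it. A hypothetical sketch (the callback and its name are illustrative, not from the question):

```python
import numpy as np
import tensorflow as tf

class WeightDelta(tf.keras.callbacks.Callback):
    """Compares weights across epochs using one host-side (NumPy) copy,
    so the comparison costs CPU RAM rather than a second set of GPU memory."""

    def __init__(self):
        super().__init__()
        self.prev = None

    def on_epoch_end(self, epoch, logs=None):
        current = self.model.get_weights()  # NumPy arrays on the host
        if self.prev is not None:
            delta = sum(float(np.abs(c - p).sum())
                        for c, p in zip(current, self.prev))
            print(f"epoch {epoch}: total |weight change| = {delta:.4f}")
        self.prev = current  # overwrite rather than accumulate
```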

Maybe there’s an out-of-control snapshot mechanism holding onto something from each epoch?
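A common version of that in custom training loops (an illustrative guess, since the question includes no code) is appending loss tensors to a Python list: each retained tensor keeps GPU memory, and whatever it references, alive across epochs. Converting to plain Python floats releases it:

```python
import tensorflow as tf

# Toy setup purely for illustration
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()
x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))

epoch_losses = []
for epoch in range(3):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # epoch_losses.append(loss)      # keeps a GPU tensor (and references) alive
    epoch_losses.append(float(loss)) # a plain Python float lets it be freed
```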

Answered By: Richard