TensorFlow: How and why to use SavedModel

Question:

I have a few questions regarding the SavedModel API, whose documentation I find leaves a lot of details unexplained.

The first three questions are about what to pass to the arguments of the add_meta_graph_and_variables() method of tf.saved_model.builder.SavedModelBuilder, while the fourth question is about why to use the SavedModel API over tf.train.Saver.

  1. What is the format of the signature_def_map argument? Do I normally need to set this argument when saving a model?

  2. Similarly, what is the format of the assets_collection argument?

  3. Why do you save a list of tags with a metagraph as opposed to just giving it a name (i.e. attaching just one unique tag to it)? Why would I add multiple tags to a given metagraph? What if I try to load a metagraph from a .pb file by a certain tag, but multiple metagraphs in that file match that tag?

  4. The documentation recommends using SavedModel to save entire models (as opposed to variables only) in self-contained files. But tf.train.Saver also saves the graph, in addition to the variables, in a .meta file. So what are the advantages of using SavedModel? The documentation says

When you want to save and load variables, the graph, and the graph’s
metadata–basically, when you want to save or restore your model–we
recommend using SavedModel. SavedModel is a language-neutral,
recoverable, hermetic serialization format. SavedModel enables
higher-level systems and tools to produce, consume, and transform
TensorFlow models.

but this explanation is quite abstract and doesn’t really help me understand what the advantages of SavedModel are. What would be concrete examples where SavedModel (as opposed to tf.train.Saver) would be better to use?

Please note that my question is not a duplicate of this question. I’m not asking how to save a model, I am asking very specific questions about the properties of SavedModel, which is only one of multiple mechanisms TensorFlow provides to save and load models. None of the answers in the linked question touch on the SavedModel API (which, once again, is not the same as tf.train.Saver).

Asked By: Alex


Answers:

EDIT: I wrote this back at TensorFlow 1.4. As of today (TensorFlow 1.12 is stable, there’s a 1.13 rc, and 2.0 is around the corner) the docs linked in the question are much improved.


I’ve been trying to use tf.saved_model and also found the Docs quite (too) abstract. Here’s my stab at a full answer to your questions:

1. signature_def_map:

a. Format See Tom’s answer to Tensorflow: how to save/restore a model. (Ctrl-F for “tf.saved_model” – currently, the only uses of the phrase on that question are in his answer).

b. need It’s my understanding that you do normally need it. If you intend to use the model, you need to know the inputs and outputs of the graph. I think it’s akin to a C++ function signature: if a function is defined after it’s called, or in another C++ file, you need its signature at the call site (i.e. as a prototype or in a header file).
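For a concrete picture, here is a minimal sketch of building a signature_def_map and handing it to the builder. Everything in it (the toy graph, the tensor names x and y, the export path) is made up for illustration:

import tensorflow as tf

export_dir = "/tmp/my_saved_model"  # hypothetical export path

with tf.Session(graph=tf.Graph()) as sess:
  # Hypothetical graph: one placeholder in, one dense layer out.
  x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
  y = tf.layers.dense(x, 1, name="y")
  sess.run(tf.global_variables_initializer())

  # A signature maps string keys to dicts of named input/output Tensors.
  signature = tf.saved_model.signature_def_utils.predict_signature_def(
      inputs={"x": x}, outputs={"y": y})

  builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
  builder.add_meta_graph_and_variables(
      sess,
      tags=[tf.saved_model.tag_constants.SERVING],
      signature_def_map={
          tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
              signature})
  builder.save()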

2. assets_collection:

format: I couldn’t find clear documentation, so I went to the builder source code. It appears that the argument is an iterable of Tensors of dtype=tf.string, where each Tensor holds a path to an asset file. So, a TensorFlow Graph collection should work – I guess that’s the parameter’s namesake – but from the source code I would expect a plain Python list to work too.

(You didn’t ask if you need to set it, but judging from Zoe’s answer to What are assets in tensorflow? and iga’s answer to the tangentially related Tensorflow serving: “No assets to save/writes” when exporting models, it usually doesn’t need to be set.)
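If you do need assets, a hedged sketch of the pattern I’d expect from the source code – vocab.txt and the export path are made-up placeholders:

import tensorflow as tf

# Hypothetical asset file the graph depends on (e.g. a vocabulary).
with open("vocab.txt", "w") as f:
  f.write("hello\nworld\n")

with tf.Session(graph=tf.Graph()) as sess:
  v = tf.Variable(1.0, name="v")  # dummy variable so there is something to save
  asset_path = tf.constant("vocab.txt", name="vocab_path")
  tf.add_to_collection(tf.GraphKeys.ASSET_FILEPATHS, asset_path)
  sess.run(tf.global_variables_initializer())

  builder = tf.saved_model.builder.SavedModelBuilder("/tmp/model_with_assets")
  builder.add_meta_graph_and_variables(
      sess,
      tags=[tf.saved_model.tag_constants.SERVING],
      assets_collection=tf.get_collection(tf.GraphKeys.ASSET_FILEPATHS))
  # The builder copies vocab.txt into the SavedModel's assets/ directory.
  builder.save()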

3. Tags:

a. Why list I don’t know why you must pass a list, but you may pass a list with one element. For instance, in my current project I only use the [tf...tag_constants.SERVING] tag.

b. When to use multiple Say you’re using explicit device placement for operations. Maybe you want to save a CPU version and a GPU version of your graph. Obviously you want to save a serving version of each, and say you want to save training checkpoints. You could use a CPU/GPU tag and a training/serving tag to manage all cases. The docs hint at it:

Each MetaGraphDef added to the SavedModel must be annotated with user-specified tags. The tags provide a means to identify the specific MetaGraphDef to load and restore, along with the shared set of variables and assets. These tags typically annotate a MetaGraphDef with its functionality (for example, serving or training), and optionally with hardware-specific aspects (for example, GPU).
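Sketching what that could look like: add_meta_graph() (without variables) attaches additional metagraphs to the same builder. Tags are plain strings – tag_constants just supplies common ones – so the "gpu" tag below is purely illustrative:

import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
  # Hypothetical graph with a variable so there is something to save.
  v = tf.Variable(1.0, name="v")
  sess.run(tf.global_variables_initializer())

  builder = tf.saved_model.builder.SavedModelBuilder("/tmp/multi_tag_model")

  # The first metagraph carries the variables: tagged for serving on GPU.
  builder.add_meta_graph_and_variables(
      sess, tags=[tf.saved_model.tag_constants.SERVING, "gpu"])

  # Additional metagraphs share the variables already saved above.
  builder.add_meta_graph(tags=[tf.saved_model.tag_constants.TRAINING])

  builder.save()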

c. Collision
Too lazy to force a collision myself – I see two cases that would need to be addressed – so I went to the loader source code. Inside def load, you’ll see:

saved_model = _parse_saved_model(export_dir)
found_match = False
for meta_graph_def in saved_model.meta_graphs:
  if set(meta_graph_def.meta_info_def.tags) == set(tags):
    meta_graph_def_to_load = meta_graph_def
    found_match = True
    break

if not found_match:
  raise RuntimeError(
      "MetaGraphDef associated with tags " + str(tags).strip("[]") +
      " could not be found in SavedModel. To inspect available tag-sets in"
      " the SavedModel, please use the SavedModel CLI: `saved_model_cli`"
  )

It appears to me that it’s looking for an exact match. E.g. say you have a metagraph with tags “GPU” and “Serving” and a metagraph with tag “Serving”. If you load “Serving”, you’ll get the latter metagraph. On the other hand, say you have a metagraph with tags “GPU” and “Serving” and a metagraph with tags “CPU” and “Serving”. If you try to load just “Serving”, you’ll get the RuntimeError above. And if you try to save two metagraphs with exactly the same tags in the same folder, I expect you’ll overwrite the first one – it doesn’t look like the builder code handles such a collision in any special way.
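For completeness, here’s the loading call that runs through the code above – a quick sketch; the tag-set and export path are placeholders:

import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
  # The tag list must exactly match one saved tag-set,
  # or you hit the RuntimeError shown above.
  tf.saved_model.loader.load(
      sess, [tf.saved_model.tag_constants.SERVING], "/tmp/my_saved_model")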

4. SavedModel or tf.train.Saver:

This confused me too. wicke’s answer to Should TensorFlow users prefer SavedModel over Checkpoint or GraphDef? cleared it up for me. I’ll throw in my two cents:

In the scope of local Python+TensorFlow, you can make tf.train.Saver do everything. But it will cost you. Let me outline the save-a-trained-model-and-deploy use case: You’ll need your saver object. It’s easiest to set it up to save the complete graph (every variable). You probably don’t want to save the .meta file all the time, since you’re working with a static graph; you’ll need to specify that in your training hook. You can read about that on cv-tricks.

When your training finishes, you’ll need to convert your checkpoint file to a .pb file. That usually means clearing the current graph, restoring the checkpoint, freezing your variables to constants with tf.python.framework.graph_util, and writing the result with tf.gfile.GFile. You can read about that on medium. After that, you want to deploy it in Python. You’ll need the input and output Tensor names – the string names in the graph def. You can read about that on metaflow (actually a very good blog post for the tf.train.Saver method). Some op nodes will let you feed data into them easily, some not so much. I usually gave up on finding an appropriate node and added a tf.reshape that didn’t actually reshape anything to the graph def – that was my ad-hoc input node. Same for the output. And then, finally, you can deploy your model, at least locally in Python.
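For reference, a hedged sketch of that checkpoint-to-.pb freezing step. The checkpoint path and the output node name "output" are placeholders, not anything standard:

import tensorflow as tf
from tensorflow.python.framework import graph_util

checkpoint_path = "/tmp/train/model.ckpt"  # hypothetical checkpoint

with tf.Session(graph=tf.Graph()) as sess:
  # Restore the graph structure and weights from the Saver checkpoint.
  saver = tf.train.import_meta_graph(checkpoint_path + ".meta")
  saver.restore(sess, checkpoint_path)

  # Freeze: fold variables into constants, keeping only what feeds "output".
  frozen = graph_util.convert_variables_to_constants(
      sess, sess.graph_def, output_node_names=["output"])

with tf.gfile.GFile("/tmp/frozen.pb", "wb") as f:
  f.write(frozen.SerializeToString())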

Or, you could use the answer I linked in point 1 to accomplish all this with the SavedModel API. Fewer headaches, thanks to Tom’s answer. You’ll get more support and features in the future if it ever gets documented appropriately. It looks like it’s easier to use command-line serving (the medium link covers doing that with Saver – looks tough, good luck!). It’s practically baked into the new Estimators. And according to the Docs,

SavedModel is a language-neutral, recoverable, hermetic serialization format.

Emphasis mine: Looks like you can get your trained models into the growing C++ API much more easily.

The way I see it, it’s like the Datasets API. It’s just easier than the old way!
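To show what “easier” means concretely, here’s a hedged sketch of deploying from the SavedModel saved in the point-1 example. The signature tells you the feed and fetch tensors, so no hunting through the graph def and no ad-hoc tf.reshape nodes (the path, keys, and input shape are assumptions carried over from that example):

import numpy as np
import tensorflow as tf

signature_key = (
    tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY)

with tf.Session(graph=tf.Graph()) as sess:
  meta_graph_def = tf.saved_model.loader.load(
      sess, [tf.saved_model.tag_constants.SERVING], "/tmp/my_saved_model")

  # The signature records the tensor names for us.
  signature = meta_graph_def.signature_def[signature_key]
  x_name = signature.inputs["x"].name
  y_name = signature.outputs["y"].name

  result = sess.run(y_name, feed_dict={x_name: np.zeros((1, 3), np.float32)})
  print(result)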

As far as concrete examples of SavedModel over tf.train.Saver: if “basically, when you want to save or restore your model” isn’t clear enough for you, the correct time to use it is any time it makes your life easier. To me, that looks like always. Especially if you’re using Estimators, deploying in C++, or using command-line serving.

So that’s my research on your question. Or four enumerated questions. Err, eight question marks. Hope this helps.

Answered By: Dylan F