TensorFlow – Importing data from a TensorBoard TFEvent file?
Question:
I’ve run several training sessions with different graphs in TensorFlow. The summaries I set up show interesting results in the training and validation. Now, I’d like to take the data I’ve saved in the summary logs and perform some statistical analysis and in general plot and look at the summary data in different ways. Is there any existing way to easily access this data?
More specifically, is there any built in way to read a TFEvent record back into Python?
If there is no simple way to do this, TensorFlow states that all its file formats are protobuf files. From my understanding of protobufs (which is limited), I think I’d be able to extract this data if I had the TFEvent protocol specification. Is there an easy way to get ahold of this? Thank you very much.
Answers:
You can simply use:

```shell
tensorboard --inspect --event_file=myevents.out
```

or, if you want to filter a specific subset of events of the graph:

```shell
tensorboard --inspect --event_file=myevents.out --tag=loss
```

If you want to create something more custom, you can dig into /tensorflow/python/summary/event_file_inspector.py to understand how to parse the event files.
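For the "something more custom" route, it may help to know how an events file is laid out on disk. As far as I know it is a TFRecord file: each record is an 8-byte little-endian length, a 4-byte masked CRC32C of that length, the payload (one serialized Event protobuf), and a 4-byte masked CRC32C of the payload. The stdlib-only sketch below reads that framing without TensorFlow. The helper names (`read_tfrecords`, `write_tfrecord`, `masked_crc`) are hypothetical, and the pure-Python CRC is slow, so treat it as an illustration of the format rather than a replacement for `tf_record_iterator`.

```python
import io
import struct
from typing import BinaryIO, Iterator

# CRC32C (Castagnoli polynomial, reflected form 0x82F63B78), as used by TFRecord.
_CRC_TABLE = []
for _i in range(256):
    _c = _i
    for _ in range(8):
        _c = (_c >> 1) ^ 0x82F63B78 if _c & 1 else _c >> 1
    _CRC_TABLE.append(_c)


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """TFRecord stores a rotated, offset CRC instead of the raw checksum."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_tfrecord(f: BinaryIO, payload: bytes) -> None:
    """Append one record; only used here to build demo data."""
    header = struct.pack("<Q", len(payload))
    f.write(header + struct.pack("<I", masked_crc(header)))
    f.write(payload + struct.pack("<I", masked_crc(payload)))


def read_tfrecords(f: BinaryIO, verify: bool = True) -> Iterator[bytes]:
    """Yield each record's payload, i.e. one serialized Event per record."""
    while True:
        header = f.read(8)
        if not header:
            return  # clean end of file
        length_crc = f.read(4)
        if len(header) != 8 or len(length_crc) != 4:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header)
        payload = f.read(length)
        payload_crc = f.read(4)
        if len(payload) != length or len(payload_crc) != 4:
            raise ValueError("truncated record payload")
        if verify and (struct.unpack("<I", length_crc)[0] != masked_crc(header)
                       or struct.unpack("<I", payload_crc)[0] != masked_crc(payload)):
            raise ValueError("CRC mismatch: corrupted record")
        yield payload


# Round trip through an in-memory "file"; for a real events file use
# open(path, "rb") instead of the BytesIO buffer.
buf = io.BytesIO()
for record in (b"first event", b"second event"):
    write_tfrecord(buf, record)
buf.seek(0)
print(list(read_tfrecords(buf)))  # [b'first event', b'second event']
```

Each payload can then be decoded with `event_pb2.Event.FromString`, as several answers here do.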
As Fabrizio says, TensorBoard is a great tool for visualizing the contents of your summary logs. However, if you want to perform a custom analysis, you can use the tf.train.summary_iterator() function to loop over all of the tf.Event and tf.Summary protocol buffers in the log:

```python
import tensorflow as tf  # TF 1.x API

for event in tf.train.summary_iterator("/path/to/log/file"):
    # Each item is a tf.Event; event.summary holds its tf.Summary.
    # Perform custom processing in here.
    pass
```
UPDATE for TF2: you need to import it explicitly, because that module is not currently imported by default (as of 2.0.0-rc2):

```python
from tensorflow.python.summary.summary_iterator import summary_iterator
```
You can use the script serialize_tensorboard, which takes a logdir and writes out all the data in JSON format.
You can also use an EventAccumulator for a convenient Python API (this is the same API that TensorBoard uses).
To read a TFEvent file, you can get a Python iterator that yields Event protocol buffers:

```python
import tensorflow as tf  # TF 1.x API

path_to_events_file = "/path/to/log/file"  # placeholder: point this at your events file

# This example supposes that the events file contains summaries with a
# summary value tag 'loss' (or 'accuracy'). These could have been added by
# calling `add_summary()`, passing the output of a scalar summary op created
# with `tf.scalar_summary(['loss'], loss_tensor)`.
for e in tf.train.summary_iterator(path_to_events_file):
    for v in e.summary.value:
        if v.tag == 'loss' or v.tag == 'accuracy':
            print(v.simple_value)
```

More info: summary_iterator
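If you want to analyse the values rather than print them, one option is to collect them per tag together with their step. A minimal sketch, assuming TF 1.x-style summaries that populate `simple_value`; `collect_scalars` and `fake_event` are hypothetical names, and `fake_event` only builds stand-in objects so the snippet runs without TensorFlow (with TF you would pass `tf.train.summary_iterator(path_to_events_file)` instead).

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List, Tuple


def collect_scalars(events: Iterable,
                    tags=("loss", "accuracy")) -> Dict[str, List[Tuple[int, float]]]:
    """Group v.simple_value by tag as (step, value) pairs instead of printing.

    `events` may be any iterable of Event-like objects exposing `.step` and
    `.summary.value`, e.g. tf.train.summary_iterator(path) in TF 1.x.
    """
    series: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for e in events:
        for v in e.summary.value:
            if v.tag in tags:
                series[v.tag].append((e.step, v.simple_value))
    return dict(series)


def fake_event(step: int, tag: str, value: float) -> SimpleNamespace:
    """Hypothetical stand-in for an Event proto, so the sketch runs without TF."""
    val = SimpleNamespace(tag=tag, simple_value=value)
    return SimpleNamespace(step=step, summary=SimpleNamespace(value=[val]))


demo = collect_scalars([fake_event(0, "loss", 1.5),
                        fake_event(1, "loss", 0.75),
                        fake_event(1, "accuracy", 0.5),
                        fake_event(1, "lr", 0.01)])
print(demo)  # {'loss': [(0, 1.5), (1, 0.75)], 'accuracy': [(1, 0.5)]}
```

Keeping the step alongside each value makes it possible to line up tags that were logged at different intervals.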
Here is a complete example for obtaining values from a scalar. You can see the message specification for the Event protobuf message here.

```python
import tensorflow as tf

for event in tf.train.summary_iterator('runs/easy_name/events.out.tfevents.1521590363.DESKTOP-43A62TM'):
    for value in event.summary.value:
        print(value.tag)
        if value.HasField('simple_value'):
            print(value.simple_value)
```
I’ve been using this. It assumes that you only want to see tags you’ve logged more than once whose values are floats, and it returns the results as a pd.DataFrame. Just call metrics_df = parse_events_file(path).

```python
from collections import defaultdict

import pandas as pd
import tensorflow as tf


def is_interesting_tag(tag):
    return 'val' in tag or 'train' in tag


def parse_events_file(path: str) -> pd.DataFrame:
    metrics = defaultdict(list)
    # TF 1.x API; in TF 2.x use tf.compat.v1.train.summary_iterator.
    for e in tf.train.summary_iterator(path):
        for v in e.summary.value:
            # simple_value is a proto float field, so isinstance(..., float) is
            # always True; HasField tells us whether it was actually set.
            if v.HasField('simple_value') and is_interesting_tag(v.tag):
                metrics[v.tag].append(v.simple_value)
    # Wrap each list in a Series so tags with different lengths are padded
    # with NaN instead of raising "All arrays must be of the same length".
    metrics_df = pd.DataFrame({k: pd.Series(v) for k, v in metrics.items() if len(v) > 1})
    return metrics_df
```
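One caveat with the wide DataFrame is that step numbers are dropped, so values from tags logged at different intervals share a row only by position. A long (tag, step, value) layout keeps the steps. A stdlib-only sketch, assuming you collect rows as `(v.tag, e.step, v.simple_value)` inside the same loop; `keep_repeated_tags` and `rows_to_csv` are hypothetical helper names, and the rows below are made-up example data.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, int, float]  # (tag, step, value)


def keep_repeated_tags(rows: Iterable[Row]) -> List[Row]:
    """Mirror the answer's len(v) > 1 filter: drop tags that appear only once."""
    rows = list(rows)
    counts = Counter(tag for tag, _, _ in rows)
    return [row for row in rows if counts[row[0]] > 1]


def rows_to_csv(rows: Iterable[Row]) -> str:
    """Serialize rows in long format: one (tag, step, value) observation per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["tag", "step", "value"])
    writer.writerows(rows)
    return buf.getvalue()


rows = [("train_loss", 0, 1.0), ("train_loss", 1, 0.5),
        ("val_loss", 1, 0.75), ("train_lr", 0, 0.01)]
print(rows_to_csv(keep_repeated_tags(rows)))
```

If pandas is available, `pd.read_csv(io.StringIO(text)).pivot(index="step", columns="tag", values="value")` should turn this back into one column per tag aligned on step, provided no (step, tag) pair repeats.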
The following works as of TensorFlow version 2.0.0-beta1:

```python
import os

import tensorflow as tf
from tensorflow.core.util import event_pb2
from tensorflow.python.framework import tensor_util
from tensorflow.python.lib.io import tf_record

summary_dir = 'tmp/summaries'
summary_writer = tf.summary.create_file_writer(summary_dir)
with summary_writer.as_default():
    tf.summary.scalar('loss', 0.1, step=42)
    tf.summary.scalar('loss', 0.2, step=43)
    tf.summary.scalar('loss', 0.3, step=44)
    tf.summary.scalar('loss', 0.4, step=45)
summary_writer.flush()  # make sure the events are on disk before reading them


def my_summary_iterator(path):
    for r in tf_record.tf_record_iterator(path):
        yield event_pb2.Event.FromString(r)


for filename in os.listdir(summary_dir):
    path = os.path.join(summary_dir, filename)
    for event in my_summary_iterator(path):
        for value in event.summary.value:
            # The scalars written above are stored as tensors, hence MakeNdarray.
            t = tensor_util.MakeNdarray(value.tensor)
            print(value.tag, event.step, t, type(t))
```

The code for my_summary_iterator is copied from tensorflow.python.summary.summary_iterator.py; at the time there was no way to import it at runtime.
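Which field holds the number depends on how the summary was written: TF 1.x scalar summaries set `simple_value`, while (judging by the `MakeNdarray` call in the answer above) TF 2.x `tf.summary.scalar` stores a tensor. A small sketch that handles both; `scalar_from_value` is a hypothetical helper, `make_ndarray` is passed in so you can supply `tf.make_ndarray` or `tensor_util.MakeNdarray`, and the `SimpleNamespace` objects are stand-ins so it runs without TensorFlow.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def scalar_from_value(value: Any, make_ndarray: Callable[[Any], Any]) -> Optional[float]:
    """Return a float from a Summary.Value written by TF 1.x or TF 2.x.

    Pass tf.make_ndarray (or tensor_util.MakeNdarray) as make_ndarray.
    """
    if value.HasField("simple_value"):  # TF 1.x scalar summaries
        return float(value.simple_value)
    if value.HasField("tensor"):
        # TF 2.x stores other summary kinds as tensors too, so float() only
        # makes sense for single-element tensors: filter by tag first.
        return float(make_ndarray(value.tensor))
    return None  # e.g. TF 1.x image or histogram summaries


# Hypothetical stand-ins so the sketch runs without TensorFlow.
tf1_value = SimpleNamespace(simple_value=0.25, HasField=lambda name: name == "simple_value")
tf2_value = SimpleNamespace(tensor=object(), HasField=lambda name: name == "tensor")
image_value = SimpleNamespace(HasField=lambda name: False)
fake_make_ndarray = lambda tensor_proto: 0.4  # stands in for tf.make_ndarray

print(scalar_from_value(tf1_value, fake_make_ndarray))  # 0.25
print(scalar_from_value(tf2_value, fake_make_ndarray))  # 0.4
```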
Late 2020 versions of TensorFlow and TensorFlow Datasets recommend a different approach: use tf.data.TFRecordDataset and event_pb2:

```python
from os import path, listdir
from operator import contains
from functools import partial
from itertools import chain
from json import loads

import numpy as np
import tensorflow as tf
from tensorflow.core.util import event_pb2


# From https://github.com/Suor/funcy/blob/0ee7ae8/funcy/funcs.py#L34-L36
def rpartial(func, *args):
    """Partially applies last arguments."""
    return lambda *a: func(*(a + args))


tensorboard_logdir = "/tmp"

# Or you could just glob… for *tfevents*:
list_dir = lambda p: map(partial(path.join, p), listdir(p))

# Event files are expected two levels below each "*_epochs_*" directory.
for event_file in filter(rpartial(contains, "tfevents"),
                         chain.from_iterable(
                             map(list_dir,
                                 chain.from_iterable(
                                     map(list_dir,
                                         filter(rpartial(contains, "_epochs_"),
                                                list_dir(tensorboard_logdir))))))):
    print(event_file)
    for raw_record in tf.data.TFRecordDataset(event_file):
        # Keep the decoded Event (not the file path) so its step is available.
        event = event_pb2.Event.FromString(raw_record.numpy())
        for value in event.summary.value:
            print("value: {!r} ;".format(value))
            if value.tensor.ByteSize():
                t = tf.make_ndarray(value.tensor)
                print(value.tag, event.step, t, type(t))
                if t.dtype == object:
                    # String tensors (e.g. text summaries) are assumed to hold JSON.
                    print("t: {!r} ;".format(np.vectorize(loads)(t)))
    print()
```
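As the comment in that snippet hints, the nested map/filter/chain can be replaced by a recursive glob. A stdlib sketch using pathlib; `find_event_files` is a hypothetical name, and its `run_filter` substring plays the role of the `"_epochs_"` filter (matched against the path relative to the logdir). The temporary directory only builds a fake log layout for the demo. Unlike the original, `rglob` searches at any depth rather than exactly two levels down.

```python
import tempfile
from pathlib import Path
from typing import List


def find_event_files(logdir: str, run_filter: str = "") -> List[Path]:
    """Recursively list files whose name contains 'tfevents', sorted by path."""
    root = Path(logdir)
    return sorted(p for p in root.rglob("*tfevents*")
                  if p.is_file() and run_filter in p.relative_to(root).as_posix())


# Build a throwaway directory tree that mimics a TensorBoard logdir.
with tempfile.TemporaryDirectory() as root:
    for rel in ["run_epochs_1/train/events.out.tfevents.1.host",
                "run_epochs_1/validation/events.out.tfevents.2.host",
                "other_run/events.out.tfevents.3.host",
                "run_epochs_1/train/checkpoint"]:
        file_path = Path(root, rel)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.touch()
    found = [p.relative_to(root).as_posix() for p in find_event_files(root, "_epochs_")]
    found_all = [p.relative_to(root).as_posix() for p in find_event_files(root)]

print(found)
```

Each returned path can be passed straight to `tf.data.TFRecordDataset`.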
There are two native ways to read the event files mentioned in this post:

- Event Accumulator

```python
>>> from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
>>> event_acc = EventAccumulator(event_file)
>>> event_acc.Reload()
<tensorboard.backend.event_processing.event_accumulator.EventAccumulator object at ...>
>>> print(event_acc.Tags())
{'images': [], 'audio': [], 'histograms': [], 'scalars': ['y=2x'], 'distributions': [], 'tensors': [], 'graph': False, 'meta_graph': False, 'run_metadata': []}
>>> for e in event_acc.Scalars('y=2x'):
...     print(e.step, e.value)
0 0.0
1 2.0
2 4.0
3 6.0
4 8.0
```

- Summary Iterator

```python
>>> import tensorflow as tf
>>> from tensorflow.python.summary.summary_iterator import summary_iterator
>>> for e in summary_iterator(event_file):
...     for v in e.summary.value:
...         if v.tag == 'y=2x':
...             print(e.step, v.simple_value)
0 0.0
1 2.0
2 4.0
3 6.0
4 8.0
```
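Either route gives you (step, value) pairs per tag; for statistical analysis you often want several tags side by side. A stdlib sketch, assuming ScalarEvent-like records such as those returned by `event_acc.Scalars(tag)`. The local `ScalarEvent` namedtuple copies the (wall_time, step, value) fields so the snippet runs without TensorBoard, and `align_by_step` is a hypothetical helper.

```python
from collections import namedtuple
from typing import Dict, List, Sequence, Tuple

# Same field layout as EventAccumulator's ScalarEvent; redefined here so the
# sketch runs without TensorBoard installed.
ScalarEvent = namedtuple("ScalarEvent", ["wall_time", "step", "value"])


def align_by_step(series: Dict[str, Sequence[ScalarEvent]]) -> List[Tuple[int, Dict[str, float]]]:
    """Join several tags on their step; a later event for the same step wins."""
    table: Dict[int, Dict[str, float]] = {}
    for tag, events in series.items():
        for e in events:
            table.setdefault(e.step, {})[tag] = e.value
    return sorted(table.items())


loss = [ScalarEvent(0.0, 0, 1.0), ScalarEvent(1.0, 1, 0.5)]
val_loss = [ScalarEvent(1.5, 1, 0.75)]
print(align_by_step({"loss": loss, "val_loss": val_loss}))
# [(0, {'loss': 1.0}), (1, {'loss': 0.5, 'val_loss': 0.75})]
```

With a real accumulator, something like `align_by_step({tag: event_acc.Scalars(tag) for tag in event_acc.Tags()['scalars']})` should produce the same structure; steps where a tag was not logged simply lack that key.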
For multiple event files or other event types (e.g., histograms), you can use tbparse to parse the event logs into a pandas DataFrame and process it locally. If you run into any problems while parsing, feel free to open an issue. (I’m the author of tbparse.)

Note: TensorBoard itself can load event logs into DataFrames only if you upload them to TensorBoard.dev (source); this is currently not possible offline/locally.