How do you read Tensorboard files programmatically?

Question:

How can you write a python script to read Tensorboard log files, extracting the loss and accuracy and other numerical data, without launching the GUI tensorboard --logdir=...?

Asked By: mikal94305

||

Answers:

You can use TensorBoard’s Python classes or script to extract the data:

How can I export data from TensorBoard?

If you’d like to export data to visualize elsewhere (e.g. iPython Notebook), that’s possible too. You can directly depend on the underlying classes that TensorBoard uses for loading data: python/summary/event_accumulator.py (for loading data from a single run) or python/summary/event_multiplexer.py (for loading data from multiple runs, and keeping it organized). These classes load groups of event files, discard data that was "orphaned" by TensorFlow crashes, and organize the data by tag.

As another option, there is a script (tensorboard/scripts/serialize_tensorboard.py) which will load a logdir just like TensorBoard does, but write all of the data out to disk as json instead of starting a server. This script is setup to make "fake TensorBoard backends" for testing, so it is a bit rough around the edges.

Using EventAccumulator:

# In [1]: from tensorflow.python.summary import event_accumulator  # deprecated
In [1]: from tensorboard.backend.event_processing import event_accumulator

In [2]: ea = event_accumulator.EventAccumulator('events.out.tfevents.x.ip-x-x-x-x',
   ...:  size_guidance={ # see below regarding this argument
   ...:      event_accumulator.COMPRESSED_HISTOGRAMS: 500,
   ...:      event_accumulator.IMAGES: 4,
   ...:      event_accumulator.AUDIO: 4,
   ...:      event_accumulator.SCALARS: 0,
   ...:      event_accumulator.HISTOGRAMS: 1,
   ...:  })

In [3]: ea.Reload() # loads events from file
Out[3]: <tensorflow.python.summary.event_accumulator.EventAccumulator at 0x7fdbe5ff59e8>

In [4]: ea.Tags()
Out[4]: 
{'audio': [],
 'compressedHistograms': [],
 'graph': True,
 'histograms': [],
 'images': [],
 'run_metadata': [],
 'scalars': ['Loss', 'Epsilon', 'Learning_rate']}

In [5]: ea.Scalars('Loss')
Out[5]: 
[ScalarEvent(wall_time=1481232633.080754, step=1, value=1.6365480422973633),
 ScalarEvent(wall_time=1481232633.2001867, step=2, value=1.2162202596664429),
 ScalarEvent(wall_time=1481232633.3877788, step=3, value=1.4660096168518066),
 ScalarEvent(wall_time=1481232633.5749283, step=4, value=1.2405034303665161),
 ScalarEvent(wall_time=1481232633.7419815, step=5, value=0.897326648235321),
 ...]

size_guidance:

size_guidance: Information on how much data the EventAccumulator should
  store in memory. The DEFAULT_SIZE_GUIDANCE tries not to store too much
  so as to avoid OOMing the client. The size_guidance should be a map
  from a `tagType` string to an integer representing the number of
  items to keep per tag for items of that `tagType`. If the size is 0,
  all events are stored.
Answered By: user1501961

To finish user1501961’s answer, you can then just export the list of scalars to a csv file easily with pandas pd.DataFrame(ea.Scalars('Loss)).to_csv('Loss.csv')

Answered By: mikal94305

For anyone interested, I’ve adapted user1501961‘s answer into a function for parsing tensorboard scalars into a dictionary of pandas dataframes:

from tensorboard.backend.event_processing import event_accumulator
import pandas as pd


def parse_tensorboard(path, scalars):
    """returns a dictionary of pandas dataframes for each requested scalar"""
    ea = event_accumulator.EventAccumulator(
        path,
        size_guidance={event_accumulator.SCALARS: 0},
    )
    _absorb_print = ea.Reload()
    # make sure the scalars are in the event accumulator tags
    assert all(
        s in ea.Tags()["scalars"] for s in scalars
    ), "some scalars were not found in the event accumulator"
    return {k: pd.DataFrame(ea.Scalars(k)) for k in scalars}


Answered By: thesofakillers

Try this:

    python << EOF | bat --color=always -pltsv
from contextlib import suppress
import sys
from time import gmtime, strftime

import pandas as pd
from tensorboard.backend.event_processing.event_file_loader import (
    EventFileLoader,
)

df = pd.DataFrame(columns=["Step", "Value"])
df.index.name = "YYYY-mm-dd HH:MM"

for event in EventFileLoader("$1").Load():
    with suppress(IndexError):
        df.loc[strftime("%F %H:%M", gmtime(event.wall_time))] = [  # type: ignore
            event.step,  # type: ignore
            event.summary.value[0].tensor.float_val[0],  # type: ignore
        ]
df.index = pd.to_datetime(df.index)  # type: ignore
df.Step = df.Step.astype(int)
df.to_csv(sys.stdout, "t")
EOF

bat is optimal.

Answered By: wzy