# How to Integrate TensorBoard with Custom Training Loops in TensorFlow Models

> Learn to integrate TensorBoard with custom training loops in TensorFlow. Visualize scalars, histograms, and images to monitor your models effectively.

- Repository: [tensorflow/models](https://github.com/tensorflow/models)
- Tags: how-to-guide
- Published: 2026-02-28

---

**To integrate TensorBoard with a custom TensorFlow 1.x training loop, initialize a `tf.summary.FileWriter` pointing to a log directory, define summary operations (scalar, histogram, image) within your computation graph, and merge them using `tf.summary.merge_all()`. Then run the merged operation during training, pass the serialized result to `add_summary()`, and call `flush()` periodically.**

When working with the [tensorflow/models](https://github.com/tensorflow/models) repository, you will encounter numerous research implementations that bypass high-level APIs like `tf.estimator` or Keras in favor of explicit session-based training loops. This guide demonstrates how to integrate TensorBoard logging into these custom training loops using patterns drawn from the repository's source code.

## The Four-Step TensorBoard Integration Pattern

Integrating TensorBoard into a custom training loop requires four distinct operations coordinated across graph construction and session execution. According to the source code in [`research/vid2depth/ops/icp_train_demo.py`](https://github.com/tensorflow/models/blob/main/research/vid2depth/ops/icp_train_demo.py) and [`research/rebar/rebar_train.py`](https://github.com/tensorflow/models/blob/main/research/rebar/rebar_train.py), the workflow follows this architecture:

1. **Writer Initialization**: Create a `tf.summary.FileWriter` targeting a specific log directory where TensorBoard monitors event files.
2. **Summary Definition**: Insert `tf.summary.scalar()`, `tf.summary.histogram()`, or `tf.summary.image()` operations into the graph to capture target tensors.
3. **Op Merging**: Consolidate all summary operations into a single execution node using `tf.summary.merge_all()` or `tf.summary.merge()`.
4. **Serialized Writing**: During the training loop, run the merged summary operation, then feed the resulting protobuf string to the writer via `add_summary()`, followed by `flush()` to ensure disk persistence.

This pattern enables real-time visualization of training metrics without sacrificing the flexibility of low-level TensorFlow control.

## Implementing File Writers and Summary Operations

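In the TensorFlow 1.x codebase prevalent throughout the models repository, summary operations must be explicitly defined during graph construction and evaluated within a `tf.Session`. Under TensorFlow 2.x, the same session-based API remains available in the `tf.compat.v1` namespace.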

### Creating the FileWriter

Instantiate the writer immediately after creating your session, passing the log directory and optionally the graph definition to visualize the model topology. The [`research/rebar/rebar_train.py`](https://github.com/tensorflow/models/blob/main/research/rebar/rebar_train.py) file demonstrates advanced configuration:

```python
import tensorflow as tf
import os

# Directory configuration
summ_dir = os.path.join(FLAGS.working_dir, hparams_str)

# Writer with custom flush behavior
summary_writer = tf.summary.FileWriter(
    summ_dir,
    flush_secs=15,   # Flush pending events to disk every 15 seconds
    max_queue=100,   # Buffer up to 100 pending events
)
```

The `flush_secs` parameter controls how frequently the writer synchronizes pending events to disk (the default is 120 seconds), while `max_queue` bounds the internal buffering queue and therefore the memory it can consume (the default is 10 events).

### Defining and Merging Summaries

During model construction, attach summary operations to tensors you wish to monitor. In [`research/vid2depth/ops/icp_train_demo.py`](https://github.com/tensorflow/models/blob/main/research/vid2depth/ops/icp_train_demo.py), scalar summaries track optimization variables:

```python
def inference(source, target):
    ego_motion = tf.Variable(tf.zeros([6]), name='ego_motion')
    tf.summary.scalar('tx', ego_motion[0])
    tf.summary.scalar('ty', ego_motion[1])
    # Additional histograms or images as needed
    # ... model computation that produces `outputs` (elided) ...
    return outputs

def training(loss, lr):
    tf.summary.scalar('loss', loss)
    # ... optimizer setup that builds and returns the train op (elided) ...
```

Once all summaries are defined, consolidate them into a single execution op:

```python
summary_op = tf.summary.merge_all()
```

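This returns a string tensor that, when evaluated, produces a serialized `Summary` protocol buffer containing all defined metrics for that specific step. If no summary ops have been defined in the graph, `merge_all()` returns `None`, so check for that before passing it to `sess.run()`.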
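
To illustrate the difference between `tf.summary.merge()` and `tf.summary.merge_all()`, here is a minimal sketch that assumes TensorFlow 2.x is installed and reaches the 1.x API through `tf.compat.v1`. The constant tensors and the names `cheap_op` and `full_op` are placeholders invented for this example; they do not come from the repository.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph/session API, also available in TF 2.x

with tf.Graph().as_default():
    # Stand-in tensors; in a real model these would be the loss and a weight matrix
    loss = tf.constant(0.5)
    weights = tf.constant([0.1, -0.2, 0.3])

    loss_summary = tf1.summary.scalar("loss", loss)
    hist_summary = tf1.summary.histogram("weights", weights)

    # merge() combines only the ops you list
    cheap_op = tf1.summary.merge([loss_summary])
    # merge_all() combines every summary op collected in the graph so far
    full_op = tf1.summary.merge_all()

    with tf1.Session() as sess:
        # Each op evaluates to a serialized Summary protocol buffer
        cheap = tf1.Summary.FromString(sess.run(cheap_op))
        full = tf1.Summary.FromString(sess.run(full_op))

cheap_tags = sorted(v.tag for v in cheap.value)
full_tags = sorted(v.tag for v in full.value)
print(cheap_tags, full_tags)
```

Keeping a small merged op for inexpensive scalars and a separate full op for histograms or images is one way to log the two groups at different frequencies.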

## Executing the Training Loop

The critical integration occurs inside the training iteration, where you must execute the training operation, evaluate the summary operation with identical feed data, and persist the results.

### Minimal Custom Loop Implementation

The [`research/vid2depth/ops/icp_train_demo.py`](https://github.com/tensorflow/models/blob/main/research/vid2depth/ops/icp_train_demo.py) file provides a complete implementation pattern:

```python
def run_training():
    with tf.Graph().as_default():
        # Graph construction
        src_pl, tgt_pl = placeholder_inputs(FLAGS.batch_size)
        pred, gt = inference(src_pl, tgt_pl)
        loss = loss_func(pred, gt)
        train_op = training(loss, FLAGS.learning_rate)

        summary_op = tf.summary.merge_all()
        init = tf.global_variables_initializer()

        with tf.Session() as sess:
            # Writer initialization with graph visualization
            summary_writer = tf.summary.FileWriter(
                FLAGS.train_dir, sess.graph)

            sess.run(init)

            for step in range(FLAGS.max_steps):
                # batch_data and target_data come from your input pipeline
                feed = {src_pl: batch_data, tgt_pl: target_data}

                # Execute training
                _, loss_val = sess.run([train_op, loss], feed_dict=feed)

                # Evaluate and write summaries
                summary_str = sess.run(summary_op, feed_dict=feed)
                summary_writer.add_summary(summary_str, step)

                # Explicit flush every 100 steps
                if step % 100 == 0:
                    summary_writer.flush()

            # Flush remaining events and release the file handle
            summary_writer.close()
```

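Note that the same `feed_dict` must be supplied to both `sess.run()` calls, because the summary ops depend on the same placeholders as the training op. Evaluating the summaries in a separate call costs an extra forward pass and reports values computed after the weight update; fetching `train_op`, `loss`, and `summary_op` together in a single `sess.run()` call avoids both effects.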
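
For a version that runs end to end, the following sketch fits a toy one-parameter model and fetches the training op and the summary op in a single `sess.run()` call. It assumes TensorFlow 2.x with the 1.x API reached through `tf.compat.v1`; the function `train_toy_model`, the target `y = 3x`, and the `log_every` parameter are made up for illustration and are not taken from `icp_train_demo.py`.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph/session API, also available in TF 2.x


def train_toy_model(logdir, max_steps=20, log_every=5):
    """Fit y = 3x with gradient descent and log the loss to `logdir`."""
    with tf.Graph().as_default():
        x = tf1.placeholder(tf.float32, shape=[None], name="x")
        y = tf1.placeholder(tf.float32, shape=[None], name="y")
        w = tf.Variable(0.0, name="w")
        loss = tf.reduce_mean(tf.square(w * x - y))
        tf1.summary.scalar("loss", loss)

        train_op = tf1.train.GradientDescentOptimizer(0.1).minimize(loss)
        summary_op = tf1.summary.merge_all()
        init = tf1.global_variables_initializer()

        with tf1.Session() as sess:
            writer = tf1.summary.FileWriter(logdir, sess.graph)
            sess.run(init)
            rng = np.random.RandomState(0)
            for step in range(max_steps):
                xs = rng.rand(8).astype(np.float32)
                feed = {x: xs, y: 3.0 * xs}
                # One call: the summary comes from the same forward pass
                # that produced the gradients for this update.
                _, loss_val, summary_str = sess.run(
                    [train_op, loss, summary_op], feed_dict=feed)
                writer.add_summary(summary_str, step)
                if step % log_every == 0:
                    writer.flush()
            writer.close()  # flushes remaining events
    return float(loss_val)


logdir = tempfile.mkdtemp()
final_loss = train_toy_model(logdir)
print(final_loss, os.listdir(logdir))
```

Pointing TensorBoard at the printed directory should show a `loss` curve under Scalars and the model under Graphs.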

### Advanced Multi-Summary Patterns

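For scenarios requiring different summary frequencies or conditional logging, [`research/rebar/rebar_train.py`](https://github.com/tensorflow/models/blob/main/research/rebar/rebar_train.py) demonstrates explicit summary construction without `merge_all()`. Because `add_summary()` expects a `Summary` protocol buffer (or its serialized string) rather than the op returned by `tf.summary.scalar()`, the values are first fetched as Python floats and then wrapped in `tf.Summary` protos by hand:

```python
# Illustrative helper: wrap a Python float in a Summary proto (no graph op needed)
def manual_scalar_summary(tag, value):
    return tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])

# train_elbo_val and temperature_val are assumed to be Python floats
# already fetched with sess.run()
summaries = [
    manual_scalar_summary('Train ELBO', train_elbo_val),
    manual_scalar_summary('Temperature', temperature_val),
]

for summ in summaries:
    summary_writer.add_summary(summ, global_step=step)

summary_writer.flush()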

This approach allows fine-grained control over which metrics are recorded at specific training phases, bypassing the global merge operation.

## TensorFlow 2.x Compatibility

While the models repository predominantly uses TensorFlow 1.x patterns, modern implementations require eager-execution compatible APIs. Replace the session-based workflow with `tf.summary.create_file_writer()`:

```python
writer = tf.summary.create_file_writer(logdir)

for step, batch in enumerate(dataset):
    # ... training logic that computes `loss` ...

    with writer.as_default():
        tf.summary.scalar('loss', loss, step=step)
        # model.weights is a list of variables, so log one histogram per variable
        for var in model.weights:
            tf.summary.histogram(var.name, var, step=step)

    if step % 100 == 0:
        writer.flush()
```

The underlying mechanism remains identical: a file writer emits serialized protocol buffers to a log directory, which TensorBoard monitors for visualization updates.
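
As a runnable counterpart to the snippet above, this sketch trains a single scalar weight in eager mode with `tf.GradientTape` and logs it with the TensorFlow 2.x summary API. The function `train_eager`, the toy data, and the manual `assign_sub` update (used in place of an optimizer to keep the example small) are assumptions made for illustration.

```python
import os
import tempfile

import tensorflow as tf


def train_eager(logdir, steps=10, lr=0.1):
    """Fit y = 3x in eager mode and log the loss and weight to `logdir`."""
    writer = tf.summary.create_file_writer(logdir)
    w = tf.Variable(0.0)
    xs = tf.constant([1.0, 2.0, 3.0])
    ys = 3.0 * xs

    for step in range(steps):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(w * xs - ys))
        grad = tape.gradient(loss, w)
        w.assign_sub(lr * grad)  # plain gradient descent step

        # Summaries are written immediately; no session or merge step is needed
        with writer.as_default():
            tf.summary.scalar("loss", loss, step=step)
            tf.summary.histogram("w", w, step=step)

    writer.flush()
    return float(w.numpy())


eager_logdir = tempfile.mkdtemp()
learned_w = train_eager(eager_logdir)
print(learned_w, os.listdir(eager_logdir))
```

Unlike the 1.x pattern, the `step` argument is passed to each summary call directly rather than to `add_summary()`.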

## Summary

Integrating TensorBoard with custom training loops in the tensorflow/models repository requires explicit management of file writers and summary operations:

- **Initialize** `tf.summary.FileWriter` with your target log directory and optional `flush_secs`/`max_queue` parameters for I/O tuning.
- **Define** summary ops during graph construction using `tf.summary.scalar()`, `histogram()`, or `image()` to capture relevant metrics.
- **Merge** operations using `tf.summary.merge_all()` to create a single execution node, or handle summaries individually for conditional logging.
- **Execute** the summary operation within your training loop using the same `feed_dict` as your training op, then write results via `add_summary()` and `flush()`.

## Frequently Asked Questions

### How do I ensure TensorBoard displays the graph structure in addition to metrics?

Pass the session's graph object to the FileWriter constructor: `tf.summary.FileWriter(logdir, sess.graph)`. This serializes the graph definition to the event file, enabling the Graphs dashboard in TensorBoard. The [`research/vid2depth/ops/icp_train_demo.py`](https://github.com/tensorflow/models/blob/main/research/vid2depth/ops/icp_train_demo.py) implementation demonstrates this pattern immediately after session creation.

### What is the performance impact of running summary operations every training step?

Summary operations require additional computation and disk I/O. For compute-intensive models, evaluate the merged summary op every N steps rather than every iteration, and avoid calling `flush()` on every step. A larger `max_queue` lets more events wait in memory before `add_summary()` blocks. The [`research/rebar/rebar_train.py`](https://github.com/tensorflow/models/blob/main/research/rebar/rebar_train.py) example configures `flush_secs=15` to balance latency against I/O overhead.
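
One way to evaluate summaries only every N steps is to build the fetch list per step. The helper below is a hypothetical sketch in plain Python: `build_fetches` and `summary_every` are names chosen for this example, and `sess`, `feed`, and `summary_writer` in the usage comment are assumed to exist in your training script.

```python
def build_fetches(train_op, summary_op, step, summary_every=100):
    """Return the list to pass to `sess.run` for this step.

    `summary_op` is included only every `summary_every` steps, so the extra
    summary computation is skipped on all other iterations.
    """
    fetches = [train_op]
    if summary_op is not None and step % summary_every == 0:
        fetches.append(summary_op)
    return fetches


# Usage inside a session-based loop (sess, feed, summary_writer assumed):
#   results = sess.run(build_fetches(train_op, summary_op, step), feed_dict=feed)
#   if len(results) == 2:
#       summary_writer.add_summary(results[1], step)
```

The `None` check also covers the case where `tf.summary.merge_all()` found no summary ops in the graph.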

### Can I write to multiple log directories from a single training script?

Yes. Instantiate separate `FileWriter` objects pointing to different directories, such as `train/` and `eval/`; TensorBoard treats each subdirectory of its log directory as a separate run. The [`research/object_detection/eval_util.py`](https://github.com/tensorflow/models/blob/main/research/object_detection/eval_util.py) file uses `tf.summary.FileWriterCache`, which returns one shared writer per log directory, so repeated lookups for the same directory reuse a single event file instead of opening a new one each time.
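
A minimal sketch of the two-writer layout follows, assuming TensorFlow 2.x with the 1.x API reached through `tf.compat.v1`. The helper names `make_scalar` and `log_train_and_eval` and the logged values are invented for illustration.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style API, also available in TF 2.x


def make_scalar(tag, value):
    """Wrap a Python float in a Summary proto."""
    return tf1.Summary(value=[tf1.Summary.Value(tag=tag, simple_value=value)])


def log_train_and_eval(root_dir):
    """Write the same tag to train/ and eval/ subdirectories of `root_dir`."""
    with tf.Graph().as_default():  # FileWriter requires graph (non-eager) mode
        train_writer = tf1.summary.FileWriter(os.path.join(root_dir, "train"))
        eval_writer = tf1.summary.FileWriter(os.path.join(root_dir, "eval"))
        for step in range(3):
            # Placeholder values standing in for real train and eval metrics
            train_writer.add_summary(make_scalar("loss", 1.0 / (step + 1)), step)
            eval_writer.add_summary(make_scalar("loss", 2.0 / (step + 1)), step)
        train_writer.close()
        eval_writer.close()


root = tempfile.mkdtemp()
log_train_and_eval(root)
print(sorted(os.listdir(root)))
```

Launching TensorBoard with `--logdir` set to the root directory should list `train` and `eval` as separate runs, with both `loss` curves drawn on the same chart.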

### Why are my summaries not appearing immediately in TensorBoard?

The FileWriter buffers events in memory for performance. Call `writer.flush()` explicitly after `add_summary()` to force immediate disk writes, or verify that your `flush_secs` parameter is not set to an excessively high value. Additionally, make sure TensorBoard's `--logdir` points at the directory that contains your event files (or a parent of it), not at an unrelated directory such as one that holds only checkpoints.
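
When it is unclear whether anything reached disk, reading the event files back can help. The sketch below assumes TensorFlow 2.x and uses `tf.compat.v1.train.summary_iterator`; the helper `read_scalars` and the demo values are hypothetical.

```python
import glob
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style API, also available in TF 2.x


def read_scalars(logdir):
    """Return {tag: [(step, value), ...]} for scalar values stored in `logdir`."""
    scalars = {}
    pattern = os.path.join(logdir, "events.out.tfevents.*")
    for path in sorted(glob.glob(pattern)):
        for event in tf1.train.summary_iterator(path):
            for value in event.summary.value:
                # TF1-style scalars are stored in `simple_value`; TF 2.x
                # scalars are stored as tensors and are skipped here.
                if value.HasField("simple_value"):
                    scalars.setdefault(value.tag, []).append(
                        (event.step, value.simple_value))
    return scalars


# Demo: write one scalar, close the writer, then read it back.
demo_dir = tempfile.mkdtemp()
with tf.Graph().as_default():  # FileWriter requires graph (non-eager) mode
    demo_writer = tf1.summary.FileWriter(demo_dir)
    demo_proto = tf1.Summary(
        value=[tf1.Summary.Value(tag="loss", simple_value=0.5)])
    demo_writer.add_summary(demo_proto, global_step=7)
    demo_writer.close()

print(read_scalars(demo_dir))
```

If the returned dictionary is empty while training is running, the events are probably still buffered, which points to a missing `flush()` or a long `flush_secs` interval.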