How to Debug Lock Contention with LiteBox Lock Tracing

LiteBox provides a built-in lock-tracing subsystem that records every lock attempt, acquisition, and release as JSON Lines (JSONL) events, so contention hotspots can be identified through timestamp analysis.

Debugging lock contention in concurrent Rust applications requires visibility into how synchronization primitives behave. The microsoft/litebox repository ships a lock-tracing feature that instruments mutexes and read-write locks to capture per-event timing data. This guide explains how to enable the subsystem, record lock events, and analyze the resulting traces to find and reduce contention.

Enabling Lock Tracing in LiteBox

Activating the Cargo Feature

To use the tracing infrastructure, enable the lock_tracing feature in your Cargo.toml. This feature gates the entire subsystem to ensure zero runtime overhead when disabled.

[dependencies]
litebox = { git = "https://github.com/microsoft/litebox", features = ["lock_tracing"] }
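The zero-overhead claim above follows from how Cargo features interact with conditional compilation. The sketch below shows the general Rust pattern only; the function names are hypothetical and are not taken from the LiteBox source.

```rust
/// Hypothetical hook that a tracing-enabled build would call on each lock attempt.
#[cfg(feature = "lock_tracing")]
fn trace_attempt(lock_name: &str) {
    eprintln!("Attempt {lock_name}");
}

/// With the feature off, the hook is an empty function that the optimizer can remove.
#[cfg(not(feature = "lock_tracing"))]
fn trace_attempt(_lock_name: &str) {}

/// Reports whether the (assumed) feature name was active when this code was compiled.
fn tracing_compiled_in() -> bool {
    cfg!(feature = "lock_tracing")
}
```

Only one of the two `trace_attempt` definitions exists in any given build, so a build without the feature contains no tracing code at all.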

Initializing the Global LockTracker

The tracing system centers on a singleton LockTracker that maintains per-thread stacks of held locks. According to the source code in litebox/src/litebox.rs, the tracker initializes automatically during LiteBox::new:

// litebox/src/litebox.rs#L62-L66
pub fn new(platform: &dyn Platform) -> Self {
    #[cfg(feature = "lock_tracing")]
    lock_tracing::init();
    // ... remainder of initialization
}

Recording Lock Events

Starting and Stopping Recording Sessions

The public API exposed in litebox/src/sync/mod.rs provides three core functions to control data capture:

use litebox::sync::{start_recording, stop_recording, flush_to_jsonl};

fn analyze_critical_section() {
    start_recording();
    
    // Your concurrent code here...
    
    stop_recording();
    
    // Write events to stdout or file
    for line in flush_to_jsonl() {
        println!("{}", line);
    }
}

Automatic Recording in Linux Userland

The litebox_runner_linux_userland crate demonstrates a full-process recording strategy. As shown in litebox_runner_linux_userland/src/lib.rs, the runner wraps program execution with automatic trace collection:

// litebox_runner_linux_userland/src/lib.rs#L34-L48
pub fn run_with_tracing<F>(f: F) 
where 
    F: FnOnce() 
{
    start_recording();
    f();
    stop_recording();
    
    // Automatically writes to /tmp/locks.jsonl
    std::fs::write("/tmp/locks.jsonl", flush_to_jsonl().join("\n")).unwrap();
}

Detecting Lock Contention

Console Diagnostics for Contended Locks

The tracing subsystem provides compile-time configuration constants in litebox/src/sync/lock_tracing.rs to control console output. When CONFIG_PRINT_CONTENDED_LOCKS is enabled, the system emits "Attempt ... CONTENDED" messages before blocking acquisitions.

The debug_log_println! macro handles these emissions around line 92 of the same file. Additionally, setting CONFIG_PRINT_LOCKS_SLOWER_THAN (default 10ms) triggers "LONG WAIT ..." messages for acquisitions exceeding the threshold.
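To make the two diagnostics concrete, here is a minimal sketch of how a contended attempt and a long wait could be detected around a standard-library mutex. It is an illustration under assumptions, not LiteBox's implementation: `LockReport`, `lock_with_report`, and `is_long_wait` are hypothetical names, and the real subsystem instruments LiteBox's own lock types.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};
use std::time::{Duration, Instant};

/// Outcome of one instrumented acquisition (hypothetical type).
struct LockReport {
    /// True when the first non-blocking try failed because the lock was held.
    contended: bool,
    /// Time from the first try until the lock was held.
    wait: Duration,
}

/// Acquire `m`, recording whether it was contended and how long acquisition took.
fn lock_with_report<T>(m: &Mutex<T>) -> (MutexGuard<'_, T>, LockReport) {
    let start = Instant::now();
    match m.try_lock() {
        Ok(guard) => (guard, LockReport { contended: false, wait: start.elapsed() }),
        Err(TryLockError::WouldBlock) => {
            // Someone else holds the lock: this is the "CONTENDED" case.
            let guard = m.lock().unwrap_or_else(|e| e.into_inner());
            (guard, LockReport { contended: true, wait: start.elapsed() })
        }
        // A poisoned lock was still acquired without waiting.
        Err(TryLockError::Poisoned(e)) => {
            (e.into_inner(), LockReport { contended: false, wait: start.elapsed() })
        }
    }
}

/// True when an acquisition exceeded the threshold (the "LONG WAIT" case).
fn is_long_wait(wait: Duration, threshold: Duration) -> bool {
    wait > threshold
}
```

Trying the lock without blocking first is what allows a "contended" message to be printed before the thread goes to sleep, rather than after it wakes up.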

Identifying Slow Lock Acquisitions in JSONL Traces

For quantitative analysis, examine the JSON Lines output produced by flush_to_jsonl(). Each event contains nanosecond-precision timestamps:

{"event_type":"attempt","timestamp_ns":123456789,"lock_addr":"0x7fffd1234abc","lock_type":"Mutex","file":"src/main.rs","line":42}
{"event_type":"acquired","timestamp_ns":123476789,"lock_addr":"0x7fffd1234abc","lock_type":"Mutex","file":"src/main.rs","line":42}

The delta between attempt and acquired timestamps reveals contention duration. In this example, the 20,000 nanosecond (20µs) gap indicates the thread waited for another holder to release the lock.
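The arithmetic for the pair of events above can be written as a small helper. The function names are illustrative and not part of the LiteBox API.

```rust
/// Wait time between an `attempt` event and its matching `acquired` event, in nanoseconds.
/// `saturating_sub` returns 0 instead of panicking if the two timestamps are swapped.
fn wait_ns(attempt_ts_ns: u64, acquired_ts_ns: u64) -> u64 {
    acquired_ts_ns.saturating_sub(attempt_ts_ns)
}

/// The same wait expressed in microseconds, for readability.
fn wait_us(attempt_ts_ns: u64, acquired_ts_ns: u64) -> f64 {
    wait_ns(attempt_ts_ns, acquired_ts_ns) as f64 / 1_000.0
}
```

Calling `wait_ns(123_456_789, 123_476_789)` with the sample timestamps gives 20,000 ns, and `wait_us` gives 20.0.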

Analyzing Lock Traces

Understanding JSONL Event Format

The EVENT_RECORDER buffer stores structured events defined in litebox/src/sync/lock_tracing.rs. When flushed, each RecordedEvent serializes to JSON with these fields:

  • event_type: Operation classification (attempt, acquired, released, created, destroyed)
  • timestamp_ns: Monotonic nanoseconds since tracker initialization
  • lock_addr: Memory address identifying the specific lock instance
  • lock_type: Synchronization primitive type (Mutex, RwLockRead, RwLockWrite)
  • file / line: Source location captured via file!() and line!() macros
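As a sketch of how these fields map onto a typed record, the following standard-library-only code parses one flat event line. The struct name, field types, and helper functions are assumptions made for illustration; for real traces a JSON library such as serde_json is the more robust choice.

```rust
/// One trace event mirroring the fields listed above (types are assumptions).
#[derive(Debug, PartialEq)]
struct TraceEvent {
    event_type: String,
    timestamp_ns: u64,
    lock_addr: String,
    lock_type: String,
    file: String,
    line: u32,
}

/// Return the raw text of `key`'s value in a flat, single-line JSON object.
/// Quoted values must not contain escaped quotes; that holds for the sample
/// events shown earlier but not for JSON in general.
fn raw_field<'a>(json: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{key}\":");
    let start = json.find(needle.as_str())? + needle.len();
    let rest = json[start..].trim_start();
    if let Some(quoted) = rest.strip_prefix('"') {
        // String value: take everything up to the closing quote.
        quoted.find('"').map(|end| &quoted[..end])
    } else {
        // Numeric value: take everything up to the next separator.
        let end = rest.find(|c: char| c == ',' || c == '}')?;
        Some(rest[..end].trim())
    }
}

/// Parse one JSONL line into a `TraceEvent`, or `None` if a field is missing or malformed.
fn parse_event(json: &str) -> Option<TraceEvent> {
    Some(TraceEvent {
        event_type: raw_field(json, "event_type")?.to_string(),
        timestamp_ns: raw_field(json, "timestamp_ns")?.parse().ok()?,
        lock_addr: raw_field(json, "lock_addr")?.to_string(),
        lock_type: raw_field(json, "lock_type")?.to_string(),
        file: raw_field(json, "file")?.to_string(),
        line: raw_field(json, "line")?.parse().ok()?,
    })
}
```

A typed record makes later analysis less error-prone than indexing into untyped JSON values, because a missing or malformed field is rejected once, at parse time.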

Calculating Contention from Timestamps

To programmatically identify hotspots, aggregate events by lock_addr and compute wait times:

use std::collections::HashMap;

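// Requires the serde_json crate.
fn analyze_contention(jsonl_lines: &[String]) {
    // Timestamp of the most recent unmatched attempt, keyed by lock address.
    // The documented event fields carry no thread ID, so when several threads
    // wait on the same lock at once, a later attempt overwrites an earlier one
    // and some waits are under-reported or missed.
    let mut attempts: HashMap<String, u64> = HashMap::new();

    for line in jsonl_lines {
        // Skip blank or malformed lines instead of panicking.
        let Ok(event) = serde_json::from_str::<serde_json::Value>(line) else { continue };
        let (Some(kind), Some(addr), Some(ts)) = (
            event["event_type"].as_str(),
            event["lock_addr"].as_str(),
            event["timestamp_ns"].as_u64(),
        ) else { continue };

        match kind {
            "attempt" => { attempts.insert(addr.to_string(), ts); }
            "acquired" => {
                if let Some(start) = attempts.remove(addr) {
                    // saturating_sub guards against out-of-order timestamps.
                    let wait_ns = ts.saturating_sub(start);
                    if wait_ns > 10_000_000 { // 10ms threshold
                        println!("High contention on {}: {}ms wait", addr, wait_ns / 1_000_000);
                    }
                }
            }
            _ => {}
        }
    }
}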
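Printing individual slow acquisitions is a start, but hotspots are easier to spot once waits are aggregated per lock. The sketch below assumes the wait times have already been extracted as `(lock_addr, wait_ns)` pairs; `WaitStats` and `rank_hotspots` are hypothetical names.

```rust
use std::collections::HashMap;

/// Aggregate wait statistics for a single lock address.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct WaitStats {
    count: u64,
    total_ns: u64,
    max_ns: u64,
}

/// Fold `(lock_addr, wait_ns)` samples into per-lock statistics and return
/// them sorted by total wait time, largest first.
fn rank_hotspots(samples: &[(&str, u64)]) -> Vec<(String, WaitStats)> {
    let mut by_lock: HashMap<&str, WaitStats> = HashMap::new();
    for &(addr, wait_ns) in samples {
        let stats = by_lock.entry(addr).or_default();
        stats.count += 1;
        stats.total_ns += wait_ns;
        stats.max_ns = stats.max_ns.max(wait_ns);
    }
    let mut ranked: Vec<(String, WaitStats)> = by_lock
        .into_iter()
        .map(|(addr, stats)| (addr.to_string(), stats))
        .collect();
    // Sort by total wait descending; break ties by address for stable output.
    ranked.sort_by(|a, b| b.1.total_ns.cmp(&a.1.total_ns).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

Ranking by total wait surfaces locks that are hit often with short waits as well as locks with a few very long waits; `max_ns` helps tell the two apart.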

Configuring Trace Verbosity

Compile-Time Configuration Constants

The tracing behavior is controlled by constants defined in litebox/src/sync/lock_tracing.rs (lines 47-69), most of them boolean flags. They are evaluated at compile time, so disabled options cost nothing at run time:

  • CONFIG_PRINT_LOCK_ATTEMPTS: Emit console messages for every lock attempt
  • CONFIG_PRINT_CONTENDED_LOCKS: Print "CONTENDED" warnings before blocking acquisitions
  • CONFIG_PRINT_LOCKS_SLOWER_THAN: Threshold in milliseconds for "LONG WAIT" messages (default 10ms)
  • CONFIG_ENABLE_RECORDING: Buffer events in EVENT_RECORDER for JSONL export
  • CONFIG_PANIC_ON_NON_BRACKETED_UNLOCK: Enable strict lock/unlock pairing validation

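To customize, edit the constants in a fork or local checkout of the repository and point Cargo at that copy, for example with a path dependency or a [patch] section. The constants are compiled into the litebox crate, so a copy of lock_tracing.rs placed inside your own crate does not replace the dependency's internal module.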
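The general idea of compile-time configuration constants can be sketched as follows. The constant names, their types, and the message formats here are stand-ins chosen for illustration; the definitions in lock_tracing.rs may differ.

```rust
use std::time::Duration;

// Hypothetical stand-ins for the CONFIG_* constants; the real types may differ.
const PRINT_CONTENDED_LOCKS: bool = false;
const PRINT_LOCKS_SLOWER_THAN: Option<Duration> = Some(Duration::from_millis(10));

/// Build the diagnostic lines that one acquisition would emit under the constants above.
fn diagnostics(lock_name: &str, contended: bool, wait: Duration) -> Vec<String> {
    let mut out = Vec::new();
    // A branch on a `const` that is `false` can be removed entirely by the compiler.
    if PRINT_CONTENDED_LOCKS && contended {
        out.push(format!("Attempt {lock_name} CONTENDED"));
    }
    if let Some(threshold) = PRINT_LOCKS_SLOWER_THAN {
        if wait > threshold {
            out.push(format!("LONG WAIT {lock_name}: {wait:?}"));
        }
    }
    out
}
```

Because the flags are `const` values rather than runtime settings, changing one requires recompiling the crate that defines it.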

Minimal Example: Recording Contention in Rust

The following complete example demonstrates instrumenting a multi-threaded workload to capture contention data:

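# Cargo.toml
[package]
name = "contention_demo"
version = "0.1.0"
edition = "2021"

[dependencies]
litebox = { git = "https://github.com/microsoft/litebox", features = ["lock_tracing"] }
# Used by main.rs below; assumed here to be available from the same repository.
litebox_platform_linux_userland = { git = "https://github.com/microsoft/litebox" }
serde_json = "1.0"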

// src/main.rs
use litebox::{LiteBox, sync::{start_recording, stop_recording, flush_to_jsonl}};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    // Initialize platform and LiteBox (creates LockTracker)
    let platform = litebox_platform_linux_userland::LinuxPlatform::new();
    let _lb = LiteBox::new(&platform);
    
    // Start recording lock events
    start_recording();
    
    // Create a LiteBox Mutex (instrumented when lock_tracing is enabled)
    let mutex = Arc::new(litebox::sync::Mutex::new(0u64));
    let mut handles = vec![];
    
    // Spawn threads that contend for the lock
    for _ in 0..4 {
        let mu = Arc::clone(&mutex);
        handles.push(thread::spawn(move || {
            for _ in 0..100 {
                let mut guard = mu.lock();
                *guard += 1;
                // Hold lock briefly to force contention
                thread::sleep(Duration::from_micros(10));
            }
        }));
    }
    
    for h in handles {
        h.join().unwrap();
    }
    
    // Stop recording and output JSONL
    stop_recording();
    println!("\nLock trace output:");
    for line in flush_to_jsonl() {
        println!("{}", line);
    }
}

When executed with the lock_tracing feature enabled, this program outputs JSON Lines showing every attempt, acquired, and released event. Analyzing the timestamp deltas between attempt and acquired events reveals which threads experienced contention and for precisely how long.

Summary

  • LiteBox provides a compile-time optional lock-tracing subsystem activated via the lock_tracing Cargo feature.
  • The LockTracker singleton initializes during LiteBox::new and maintains per-thread lock stacks throughout the process lifetime.
  • Use start_recording and stop_recording to delimit capture windows, then flush_to_jsonl to export structured event data.
  • Contention manifests as gaps between attempt and acquired timestamps in the JSONL output, or as "CONTENDED" console messages when CONFIG_PRINT_CONTENDED_LOCKS is enabled.
  • Configure verbosity via compile-time constants in litebox/src/sync/lock_tracing.rs to trade diagnostic detail against runtime overhead.

Frequently Asked Questions

What is the performance overhead of LiteBox lock tracing?

When the lock_tracing feature is disabled, the subsystem imposes zero runtime cost because the instrumentation is excluded by conditional compilation. When enabled, overhead depends on configuration: buffering events to EVENT_RECORDER adds memory allocation and atomic operations, while console printing via CONFIG_PRINT_LOCK_ATTEMPTS incurs synchronous I/O latency. For production diagnostics, use CONFIG_ENABLE_RECORDING with periodic flush_to_jsonl calls rather than continuous console output to minimize blocking.

How do I enable lock tracing without modifying LiteBox source code?

Enable the feature in your dependent crate's Cargo.toml by specifying features = ["lock_tracing"] for the litebox dependency; this requires no source changes. Customizing the CONFIG_* constants is different: they are compiled into the litebox crate, so a copy of lock_tracing.rs inside your own crate does not replace the dependency's internal module. To change them, edit the constants at the top of litebox/src/sync/lock_tracing.rs in a fork or local checkout and point Cargo at that copy, for example with a path dependency or a [patch] section.

Can I use lock tracing in production environments?

Yes, provided you configure it for reliability. Set CONFIG_PANIC_ON_NON_BRACKETED_UNLOCK to false to prevent instrumentation errors from crashing your application. Use CONFIG_ENABLE_RECORDING to buffer events in memory, then periodically call flush_to_jsonl and write the returned lines to disk yourself, ideally off the hot path. Avoid CONFIG_PRINT_LOCK_ATTEMPTS in high-throughput scenarios because console I/O blocks. The Linux userland runner demonstrates this pattern by writing to /tmp/locks.jsonl only after stop_recording.

Where does LiteBox write the lock trace files?

By default, the tracing subsystem returns trace data via flush_to_jsonl(), which yields a Vec<String> of JSON Lines rather than writing directly to disk. The litebox_runner_linux_userland crate demonstrates a typical pattern by explicitly writing to /tmp/locks.jsonl after stopping the recording, as shown in litebox_runner_linux_userland/src/lib.rs. You can direct output to any path by collecting the strings from flush_to_jsonl() and using std::fs::write or your preferred logging infrastructure.
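A minimal sketch of writing the returned lines to a path of your choice, using only the standard library. `write_jsonl` is a hypothetical helper; it takes the lines as a slice so it does not depend on the LiteBox API.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Write each trace line followed by a newline, so the file is valid JSON Lines.
fn write_jsonl(path: &Path, lines: &[String]) -> io::Result<()> {
    // Buffer writes so a large trace does not issue one syscall per event.
    let mut out = BufWriter::new(File::create(path)?);
    for line in lines {
        writeln!(out, "{line}")?;
    }
    // Flush explicitly so write errors are reported instead of being dropped.
    out.flush()
}
```

Unlike joining with "\n", this terminates the last line with a newline as well, which some line-oriented tools expect.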
