Debugging System Issues with strace and ltrace: Essential Linux Techniques

Use strace to monitor kernel-level system calls and ltrace to inspect library interactions, leveraging flags like -f to follow child processes and -c to profile call performance when diagnosing why a Linux program fails, hangs, or crashes.

The jlevy/the-art-of-command-line repository documents essential techniques for debugging system issues with strace and ltrace, two lightweight utilities that reveal exactly where a process breaks by intercepting its runtime behavior without requiring source code or recompilation. According to the repository's README.md at line 336, these tools provide immediate visibility into system call failures and library function errors that cause mysterious crashes or performance degradation.

Understanding strace vs. ltrace for System Debugging

strace intercepts system calls—the interface between a program and the Linux kernel. Use it to inspect file I/O, network sockets, process control, and permission checks.

ltrace intercepts calls to dynamically linked libraries. Use it to trace high-level functions like printf, malloc, or external library APIs without seeing kernel-level noise.

When deciding which tool to use, consider the failure type:

  • Kernel-level problems (missing files, permission denied, network timeouts) → Use strace
  • Library-level problems (memory leaks, library initialization failures) → Use ltrace
  • Performance profiling → Use either with the -c flag to summarize call frequency and execution time

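strace works on any process, while ltrace sees only calls that go through dynamic linking, so it shows little for statically linked binaries. Neither tool needs source code or recompilation, but both must be installed on the host, which is not guaranteed on minimal or container images.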

Critical Flags for Effective Debugging

Follow Child Processes with -f

Many applications spawn subprocesses using fork, vfork, or clone, usually followed by exec. Without the -f flag, you trace only the parent process and miss the execution paths of its children. As documented in the repository's README at line 336, the "trace-child" technique is essential to avoid missing important calls that often contain the root cause of a failure.
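
A minimal sketch of checking that -f actually picked up children, assuming strace is installed and the log was written with -f and a single -o file (in which case strace prefixes each line with the PID that made the call). The helper names trace_children and count_traced_pids are hypothetical, introduced here only for illustration.

```shell
# trace_children LOG CMD [ARGS...]: trace CMD and its descendants, keeping only
# process-lifecycle calls (fork/clone, execve, exit, wait). Hypothetical helper.
trace_children() {
    log=$1; shift
    strace -f -e trace=process -o "$log" "$@"
}

# count_traced_pids LOG: number of distinct PIDs that appear as line prefixes.
count_traced_pids() {
    awk '$1 ~ /^[0-9]+$/ { seen[$1] = 1 }
         END { n = 0; for (p in seen) n++; print n }' "$1"
}
```

For example, running trace_children children.log sh -c 'ls /tmp | wc -l' and then count_traced_pids children.log should report more than one process; a result of 1 suggests the command never created a child.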

Attach to Running Processes with -p

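The -p <pid> flag attaches to an already-running process without restarting it, which is crucial for production services that cannot be restarted. Tracing begins as soon as the attach succeeds. Expect the traced process to run slower while attached, and note that attaching to a process you do not own, or on a system that restricts ptrace, requires root.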
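
Whether an attach succeeds depends on ptrace permissions. On kernels with the Yama security module, the file /proc/sys/kernel/yama/ptrace_scope controls this. The following is a minimal sketch; the helper name ptrace_hint is hypothetical, and its optional file argument exists only to make the function easy to try out.

```shell
# ptrace_hint [FILE]: describe the Yama ptrace_scope value that governs `strace -p`.
# FILE defaults to the real sysctl path; the override is only for experimentation.
ptrace_hint() {
    scope_file=${1:-/proc/sys/kernel/yama/ptrace_scope}
    if [ ! -r "$scope_file" ]; then
        echo "no Yama ptrace_scope file found"
        return 0
    fi
    case $(cat "$scope_file") in
        0) echo "0: you may attach to your own processes" ;;
        1) echo "1: attach limited to descendants; use sudo for strace -p" ;;
        2) echo "2: only processes with CAP_SYS_PTRACE (typically root) may attach" ;;
        3) echo "3: attaching is disabled until reboot" ;;
        *) echo "unrecognized value" ;;
    esac
}
```

Calling ptrace_hint with no arguments before strace -p "$pid" can explain an "Operation not permitted" error without guesswork.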

Profile Execution with -c

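Use -c to generate a statistical summary instead of a full trace. This mode counts how many times each call was made, how many of those calls failed, and how much time they took, helping you identify bottlenecks without drowning in log output. By default strace -c reports CPU time spent in the kernel; add -w to measure wall-clock time, which is what matters for calls that block on I/O.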
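
As a rough illustration of reading the summary, the sketch below picks out calls whose share of traced time crosses a threshold. It assumes the usual strace -c table layout (% time, seconds, usecs/call, calls, errors, syscall) saved with -o; hot_calls is a hypothetical helper name, not part of strace.

```shell
# hot_calls SUMMARY MIN_PCT: print "syscall pct" for rows whose "% time" >= MIN_PCT.
hot_calls() {
    awk -v min="$2" '
        # data rows start with a number; skip the header, rulers, and the total row
        $1 ~ /^[0-9.]+$/ && $NF != "total" && $1 + 0 >= min + 0 { print $NF, $1 }
    ' "$1"
}
```

A typical use would be strace -c -o summary.txt ls /tmp followed by hot_calls summary.txt 10, which lists only the calls that account for at least 10% of the measured time.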

Filter Specific Calls with -e

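Reduce noise by filtering for specific calls using -e <expr>. With strace, for example, trace only file operations with -e trace=openat,read,write (modern C libraries open files with openat rather than open). With ltrace, monitor memory management with -e malloc+free; older ltrace releases take a comma-separated list (malloc,free) instead.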
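
One way to put filtering to work, sketched under the assumption that strace is installed and recent enough to know the %file class (older releases spell it trace=file): record only calls that take a file name, then list the ones that failed. Both function names are hypothetical.

```shell
# trace_files LOG CMD [ARGS...]: log only syscalls that take a file name.
trace_files() {
    log=$1; shift
    strace -f -e trace=%file -o "$log" "$@"
}

# failed_calls LOG: show calls that returned -1, which strace prints as
# "= -1 ERRNO (description)".
failed_calls() {
    grep -E '= -1 E[A-Z0-9]+ \(' "$1"
}
```

After trace_files files.log ./my_program, running failed_calls files.log narrows the log to lines such as ENOENT or EACCES results, which is often where a missing-file or permission problem shows up.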

Practical Debugging Scenarios and Code Examples

Trace a New Command from Launch

Capture every system call from program startup, including timestamps and child processes:


# Trace every system call made by `ls -l /tmp`

strace -o ls.strace -tt -f ls -l /tmp

# View the detailed log with microsecond timestamps

less ls.strace

The -tt flag prints timestamps with microsecond precision, while -f ensures you capture any subprocesses spawned by the command.
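
Timestamps show when each call started; to see how long each one took, strace also offers -T, which appends the time spent in the call as a trailing <seconds> field. The sketch below ranks calls by that field. Note that -T is an addition to the flags discussed in this article, and slowest_calls is a hypothetical helper name.

```shell
# slowest_calls LOG [N]: list the N slowest calls (default 5) from a log
# recorded with `strace -T -o LOG ...`, with the duration moved to the front.
slowest_calls() {
    sed -n 's/^\(.*\) <\([0-9.]*\)>$/\2 \1/p' "$1" | sort -rn | head -n "${2:-5}"
}
```

For instance, strace -f -T -o ls.strace ls -l /tmp followed by slowest_calls ls.strace 3 prints the three calls that took longest, which is usually a quicker read than scanning the full log.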

Diagnose Hanging Production Processes

Attach to a live process to identify where it is stuck:


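# Find the PID of the problematic process (-o picks the oldest match if there are several)

pid=$(pgrep -o -f my_app)

# Attach and capture a short performance snapshot (5 seconds)

strace -p "$pid" -c -w -o snapshot.txt &
sleep 5
kill $!   # SIGTERM makes strace detach and write the summary
wait      # let strace finish writing before reading the file

cat snapshot.txt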

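The -c summary shows which system calls the process makes most often and where it spends its time. Because strace -c counts kernel CPU time by default, blocking I/O waits only show up when -w (wall-clock time) is given. A process stuck in a single blocking call, or spinning in user space without making system calls, produces an almost empty summary; in that case, a plain strace -p "$pid" shows the call it is blocked in, or no output at all for a user-space loop.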

Isolate Memory Allocation Issues

Use ltrace to monitor specific library functions without kernel noise:


# Watch only memory-allocation functions in a program
# (ltrace 0.7+ joins names with "+"; older releases used "malloc,free")

ltrace -e malloc+free -f ./my_program 2> malloc.log

This filters the trace to show only heap allocations. A malloc whose returned pointer never appears in a later free call points to a leak, and a very high call count points to allocation churn that can cause slowdowns.
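
To make the pairing of malloc and free concrete, here is a minimal sketch that scans an ltrace log for allocations never passed to free. It assumes lines of the form malloc(SIZE) = 0xADDR and free(0xADDR) = <void>, possibly with a PID or caller prefix, and it ignores calloc and realloc, so treat its output as a hint rather than a verdict. unfreed is a hypothetical helper name.

```shell
# unfreed LOG: print malloc lines whose returned pointer was never freed.
unfreed() {
    awk '
        /malloc\(/ && / = 0x/ {
            live[$NF] = $0            # last field is the returned pointer
        }
        /free\(0x/ {
            rest = substr($0, index($0, "free(") + 5)
            ptr = substr(rest, 1, index(rest, ")") - 1)
            delete live[ptr]          # this allocation has been released
        }
        END { for (p in live) print live[p] }
    ' "$1"
}
```

Running unfreed malloc.log after the trace finishes lists the allocations still outstanding at exit. Some of these are normal (for example, buffers a program deliberately keeps for its whole lifetime), so a steadily growing list across longer runs is the more telling signal.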

Run strace and ltrace Simultaneously

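A process can have only one tracer attached at a time, so strace and ltrace cannot both follow the same process. Running them in parallel starts two separate copies of the program, which is only safe when the program can run twice side by side:


# Run two copies, one under each tracer (output to separate files)

strace -ff -o syscalls.log ./my_program &
ltrace -f -o libcalls.log ./my_program &
wait   # wait for both to finish

# Or trace library calls and system calls in a single run

ltrace -S -f -o combined.log ./my_program

With -o, strace's -ff flag writes each followed process to its own file (syscalls.log.<pid>), preventing log interleaving. ltrace has no equivalent; its -f output goes to one file, with each line tagged by PID.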

Identifying Performance Bottlenecks

Post-process the -c summary with standard Unix text tools:


# Rank system calls by how often they were made

strace -c -p "$pid" 2> call_stats.txt   # press Ctrl-C to stop and print the summary
awk '$4 ~ /^[0-9]+$/ && $NF != "total" {print $4 "\t" $NF}' call_stats.txt | sort -nr | head

This pipeline extracts the call count (fourth column) and the call name (last column) from each row, sorts the rows numerically in reverse order, and displays the most frequent calls. High counts on read or write often indicate chatty I/O, and high counts on futex often indicate lock contention.

Summary

  • strace reveals kernel-level system calls while ltrace exposes library function calls, providing complementary visibility into program execution when debugging system issues.
  • Critical flags include -f (follow children), -p (attach to PID), -c (profile performance), and -e (filter specific calls).
  • The jlevy/the-art-of-command-line README at line 336 documents these techniques as essential for diagnosing crashes, hangs, and I/O failures in production environments.
  • Neither tool needs source code or recompilation, though both must be installed on the host, and ltrace sees only dynamically linked calls.

Frequently Asked Questions

What is the difference between strace and ltrace?

strace intercepts and records system calls made to the Linux kernel, such as file operations, network requests, and process management. ltrace intercepts calls to dynamically linked libraries, such as malloc, printf, or external shared objects, revealing higher-level program behavior without kernel-level detail.

How do I debug a program that spawns child processes?

Use the -f flag with either tool to follow processes created by fork, vfork, or clone. Without this flag, you trace only the parent process and miss the execution paths of its children, potentially hiding the source of a crash or hang in a subprocess.

Can I use strace or ltrace on a process that is already running?

Yes. Use -p <pid> to attach to an existing process without restarting it. This is essential for debugging production services. You can combine this with -c to generate a performance summary over a specific time window, then detach without interrupting the service.

When should I use the -c flag versus full tracing?

Use -c when you suspect a performance bottleneck and need to identify which calls consume the most time. Use full tracing (without -c) when you need to see the exact sequence of calls, arguments, and return values to diagnose logical errors, missing files, or permission failures.
