How to Use SandboxAgent for Long-Running Tasks with Filesystem Access in openai-agents-python

Use SandboxAgent with a Manifest defining your filesystem, attach a WorkspaceShellCapability for PTY-based command execution, and configure a SandboxRunConfig with a backend like UnixLocalSandboxClient or DockerSandboxClient to safely run long-running processes with full filesystem access.

The openai-agents-python SDK can run autonomous AI workflows inside a secure, isolated environment. By using SandboxAgent for long-running tasks with filesystem access, you can let an LLM run persistent shell commands, compile code, and manipulate files inside the sandbox rather than directly on the host system.

What is SandboxAgent?

SandboxAgent is a specialized Agent implementation designed to run model-driven workflows inside an isolated sandbox environment. Unlike standard agents, which act only through the function tools they are given, SandboxAgent is bound to a sandbox session that provides:

  • A virtual filesystem defined by a Manifest
  • Process execution via exec for one-shot commands
  • Long-running process streaming via pty_exec_start for interactive or continuous output
  • Auditing and tracing of every sandbox operation

The agent resides in src/agents/sandbox/sandbox_agent.py and serves as the entry point for filesystem-aware, long-running automation tasks.

Core Components for Filesystem and Long-Running Tasks

To implement SandboxAgent effectively, you must configure three core components: the filesystem manifest, the session execution layer, and the shell capability that exposes these operations to the LLM.

Manifest: Defining the Filesystem Workspace

The Manifest class (located in src/agents/sandbox/manifest.py) provides a declarative description of the sandbox's filesystem state, including files, directories, environment variables, and mount points. When you initialize a SandboxAgent with a default_manifest, the sandbox session mounts this virtual file tree, allowing the LLM to read, write, and manipulate paths under /workspace without touching the host filesystem.
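To make the idea of a declarative workspace concrete, here is a minimal standalone sketch in plain Python. It does not use the SDK: `MiniManifest` and `materialize` are hypothetical names, and the sketch assumes a manifest is simply a mapping of relative paths to text that gets written into a throwaway directory standing in for /workspace.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path, PurePosixPath
from typing import Dict


@dataclass
class MiniManifest:
    """Hypothetical stand-in for a declarative workspace description."""

    # Relative POSIX paths mapped to file contents.
    files: Dict[str, str] = field(default_factory=dict)

    def materialize(self, root: Path) -> Path:
        """Write every declared file under `root`, refusing paths that escape it."""
        root = root.resolve()
        for rel, text in self.files.items():
            rel_path = PurePosixPath(rel)
            if rel_path.is_absolute() or ".." in rel_path.parts:
                raise ValueError(f"path escapes workspace: {rel}")
            target = root.joinpath(*rel_path.parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text)
        return root


if __name__ == "__main__":
    manifest = MiniManifest(
        files={"main.c": "int main(void) { return 0; }\n", "docs/README.txt": "hi\n"}
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = manifest.materialize(Path(tmp))
        print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()))
```

The point of the design is that the description is data: the same mapping can be materialized into a fresh directory for every run, so nothing outside that directory is touched.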

SandboxSession: Executing Long-Running Commands

The SandboxSession class (in src/agents/sandbox/session/sandbox_session.py) wraps all sandbox operations with audit events, tracing, and PTY support. For long-running tasks, two methods are critical:

  • SandboxSession.exec (lines 94-101): Executes one-shot shell commands with stdout/stderr capture.
  • SandboxSession.pty_exec_start (lines 124-134): Starts a pseudo-terminal session for interactive or streaming processes, enabling the LLM to monitor continuous output from builds, servers, or log tails.

Both methods emit SandboxSessionStartEvent and SandboxSessionFinishEvent for full observability, implemented in the session's _annotate method (lines 161-200).
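The start/finish pairing can be pictured with a small standalone sketch. `AuditLog`, `audited`, and the event dictionaries are hypothetical; they only mirror the idea of emitting a start event before an operation and a finish event afterwards (even when it fails), and are not the SDK's SandboxSessionStartEvent or SandboxSessionFinishEvent types.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Dict, List, Optional


class AuditLog:
    """Hypothetical in-memory sink collecting start/finish events."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    @asynccontextmanager
    async def audited(self, op: str, **details: Any) -> AsyncIterator[None]:
        # Emit a start event before the wrapped operation runs.
        self.events.append({"kind": "start", "op": op, **details})
        started = time.monotonic()
        error: Optional[str] = None
        try:
            yield
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            # Always emit a finish event, recording duration and any error.
            self.events.append(
                {"kind": "finish", "op": op, "seconds": time.monotonic() - started, "error": error}
            )


async def demo(log: AuditLog) -> None:
    async with log.audited("exec", command="echo hi"):
        await asyncio.sleep(0)  # stands in for the real sandbox call


if __name__ == "__main__":
    log = AuditLog()
    asyncio.run(demo(log))
    print([(e["kind"], e["op"]) for e in log.events])
```

Putting the finish event in a `finally` block is what keeps the audit trail complete: a failed command still produces a matching finish record with the error attached.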

WorkspaceShellCapability: Exposing Tools to the LLM

To allow the model to invoke sandbox operations, you must attach a capability that registers the relevant tools. The WorkspaceShellCapability (found in examples/sandbox/misc/workspace_shell.py) bundles the shell tool, exposing exec and pty_exec_start as function tools that the LLM can call during its reasoning loop.
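Conceptually, a capability is a bundle of tools that gets merged with whatever tools the agent already has. The sketch below models that with plain callables; `ShellCapabilitySketch`, `tools()`, and `collect_tools` are hypothetical names, and the real WorkspaceShellCapability registers its tools through the SDK's own tool machinery rather than a dict.

```python
from typing import Callable, Dict, Iterable

ToolFn = Callable[..., str]


class ShellCapabilitySketch:
    """Hypothetical capability exposing two shell-style tools by name."""

    def tools(self) -> Dict[str, ToolFn]:
        return {
            "exec": lambda command: f"ran once: {command}",
            "pty_exec_start": lambda command: f"streaming: {command}",
        }


def collect_tools(
    plain_tools: Dict[str, ToolFn], capabilities: Iterable[ShellCapabilitySketch]
) -> Dict[str, ToolFn]:
    """Merge plain tools with capability tools, rejecting duplicate names."""
    registry = dict(plain_tools)
    for cap in capabilities:
        for name, fn in cap.tools().items():
            if name in registry:
                raise ValueError(f"duplicate tool name: {name}")
            registry[name] = fn
    return registry


if __name__ == "__main__":
    registry = collect_tools({"get_build_flags": lambda: "-O2 -Wall"}, [ShellCapabilitySketch()])
    print(sorted(registry))
    print(registry["exec"]("ls"))
```

Rejecting duplicate names is one reasonable choice here: the model picks tools by name, so two tools sharing a name would be ambiguous.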

Implementing Long-Running Tasks with Filesystem Access

Below are practical implementations showing how to configure SandboxAgent for different scenarios involving persistent processes and file manipulation.

Running a Background Process with PTY Streaming

This example demonstrates starting a long-running command, monitoring its output, and terminating it programmatically:

import asyncio

from agents import Runner
from agents.run import RunConfig
from agents.sandbox import SandboxAgent, SandboxRunConfig
from agents.sandbox.manifest import Manifest
from agents.sandbox.sandboxes.unix_local import UnixLocalSandboxClient

# WorkspaceShellCapability lives in the repo's examples package
# (examples/sandbox/misc/workspace_shell.py); the repository root must be importable.
from examples.sandbox.misc.workspace_shell import WorkspaceShellCapability


async def main() -> None:
    # An empty manifest: the agent starts with a blank workspace.
    manifest = Manifest()

    # Attach the shell capability so the model can call exec / pty_exec_start.
    agent = SandboxAgent(
        name="Long-run worker",
        model="gpt-4o-mini",
        instructions="You can run any command. For long-running processes use `pty_exec_start`.",
        default_manifest=manifest,
        capabilities=[WorkspaceShellCapability()],
    )

    # Run the agent against the local Unix sandbox backend.
    result = await Runner.run(
        agent,
        "Start a background `yes` command that prints forever; then stop it after 3 seconds.",
        run_config=RunConfig(
            sandbox=SandboxRunConfig(client=UnixLocalSandboxClient())
        ),
    )

    print("Final LLM output:")
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())

Under the hood, when the LLM invokes pty_exec_start, SandboxSession.pty_exec_start (lines 124-134 in sandbox_session.py) forwards the request to the UnixLocalSandboxClient, creating a pseudo-terminal session that streams output back to the model until terminated.
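Outside the SDK, the same start, monitor, and stop loop can be approximated with asyncio's subprocess API. This sketch reads from a plain pipe rather than a real pseudo-terminal and substitutes a gentle Python ticker for `yes` so it runs anywhere Python does; `stream_for` is a hypothetical helper, not part of SandboxSession.

```python
import asyncio
import sys
from typing import List, Optional, Tuple

# A child that prints forever, slowly (a stand-in for `yes`).
TICKER = "import time\nwhile True:\n    print('tick', flush=True)\n    time.sleep(0.05)"


async def stream_for(
    argv: List[str], seconds: float, max_lines: int = 5
) -> Tuple[List[str], Optional[int]]:
    """Start a long-running process, collect output lines for a while, then stop it."""
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.DEVNULL
    )
    assert proc.stdout is not None
    loop = asyncio.get_running_loop()
    deadline = loop.time() + seconds
    lines: List[str] = []
    exited = False
    while len(lines) < max_lines:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break
        try:
            raw = await asyncio.wait_for(proc.stdout.readline(), timeout=remaining)
        except asyncio.TimeoutError:
            break
        if not raw:  # EOF: the process finished on its own
            exited = True
            break
        lines.append(raw.decode().rstrip())
    if not exited:
        try:
            proc.terminate()  # the "stop it" step
        except ProcessLookupError:
            pass  # it exited between the last read and now
    # Drain any buffered output and reap the child so the pipe never stalls.
    await proc.communicate()
    return lines, proc.returncode


if __name__ == "__main__":
    print(asyncio.run(stream_for([sys.executable, "-c", TICKER], seconds=3.0)))
```

Draining with `communicate()` after `terminate()` matters: if the reader stops consuming a chatty process, its output can back up in the pipe, and reaping the child without reading that output risks a stall.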

Editing Files and Running a Build Process

This example combines filesystem manipulation with a long-running compilation task:

import asyncio

from agents import Runner, function_tool
from agents.run import RunConfig
from agents.sandbox import SandboxAgent, SandboxRunConfig
from agents.sandbox.sandboxes.unix_local import UnixLocalSandboxClient

# Helpers from the repo's examples package; the repository root must be importable.
from examples.sandbox.misc.example_support import text_manifest
from examples.sandbox.misc.workspace_shell import WorkspaceShellCapability


@function_tool
def get_build_flags() -> str:
    """Return compiler flags that the LLM can use."""
    return "-O2 -Wall"


async def main() -> None:
    # Seed the workspace with a single C source file.
    manifest = text_manifest(
        {
            "main.c": (
                "#include <stdio.h>\n"
                'int main() { printf("Hello\\n"); return 0; }\n'
            )
        }
    )

    # Give the agent both a custom function tool and shell access.
    agent = SandboxAgent(
        name="C-builder",
        model="gpt-4o-mini",
        instructions=(
            "You can edit source files, run `gcc` to build, and stream the build output. "
            "Always call `get_build_flags` to obtain compiler options before building."
        ),
        default_manifest=manifest,
        tools=[get_build_flags],
        capabilities=[WorkspaceShellCapability()],
    )

    # Ask the agent to patch the file and run the build.
    result = await Runner.run(
        agent,
        "Patch main.c to print 'Hi there' and compile it with gcc, streaming the output.",
        run_config=RunConfig(sandbox=SandboxRunConfig(client=UnixLocalSandboxClient())),
    )

    print("\n=== Final answer ===")
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())

Here, the LLM uses the sandbox's file APIs (routed through SandboxSession) to read and modify main.c, then invokes pty_exec_start via the WorkspaceShellCapability to stream the gcc compilation output in real-time.
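The "edit, then build" step relies on file operations staying inside the workspace. The sketch below shows one common way to enforce that with pathlib; `resolve_in_workspace` and `patch_file` are hypothetical helpers, not the SDK's file APIs, and the compile step is omitted so the example runs without gcc.

```python
import tempfile
from pathlib import Path


def resolve_in_workspace(root: Path, relative: str) -> Path:
    """Resolve `relative` against `root`, rejecting anything that lands outside it."""
    root = root.resolve()
    candidate = (root / relative).resolve()  # also follows symlinks
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{relative!r} escapes the workspace")
    return candidate


def patch_file(root: Path, relative: str, old: str, new: str) -> None:
    """Replace exactly one occurrence of `old` with `new` in a workspace file."""
    path = resolve_in_workspace(root, relative)
    text = path.read_text()
    if text.count(old) != 1:
        raise ValueError(f"expected exactly one match for {old!r} in {relative}")
    path.write_text(text.replace(old, new))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "main.c").write_text('#include <stdio.h>\nint main() { printf("Hello\\n"); return 0; }\n')
        patch_file(root, "main.c", '"Hello\\n"', '"Hi there\\n"')
        print((root / "main.c").read_text())
```

Requiring exactly one match is a deliberate guard: an ambiguous or missing anchor string fails loudly instead of silently editing the wrong place.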

Using Docker for Enhanced Isolation

For production workloads requiring stronger isolation than the local Unix client provides, swap the backend to DockerSandboxClient:

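from agents.run import RunConfig
from agents.sandbox import SandboxRunConfig
from agents.sandbox.sandboxes.docker import DockerSandboxClient

# Same agent, Docker backend: only the client passed to SandboxRunConfig changes.
# Pass run_config=run_config to Runner.run(...) exactly as before.
run_config = RunConfig(
    sandbox=SandboxRunConfig(client=DockerSandboxClient(image="python:3.12-slim"))
)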

The same SandboxAgent code works unchanged across backends because SandboxSession abstracts the transport layer, calling exec and pty_exec_start on whichever SandboxClient implementation you provide.
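The backend swap works because session code depends only on a small client interface. The sketch below models that idea with `typing.Protocol`; `ExecClient`, `LocalClient`, `RecordingClient`, and `MiniSession` are hypothetical names. The article describes the real client as handling both exec and pty_exec_start, whereas this sketch models only a one-shot exec.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class ExecClient(Protocol):
    """Hypothetical minimal backend interface: run argv, return (exit code, stdout)."""

    def exec(self, argv: List[str]) -> Tuple[int, str]: ...


class LocalClient:
    """Runs commands as host subprocesses (no isolation; illustration only)."""

    def exec(self, argv: List[str]) -> Tuple[int, str]:
        done = subprocess.run(argv, capture_output=True, text=True)
        return done.returncode, done.stdout


@dataclass
class RecordingClient:
    """Fake backend that records commands instead of running them."""

    calls: List[List[str]] = field(default_factory=list)

    def exec(self, argv: List[str]) -> Tuple[int, str]:
        self.calls.append(argv)
        return 0, "recorded\n"


class MiniSession:
    """Session code that stays the same whichever backend it is given."""

    def __init__(self, client: ExecClient) -> None:
        self.client = client

    def run(self, *argv: str) -> str:
        code, out = self.client.exec(list(argv))
        if code != 0:
            raise RuntimeError(f"{argv[0]} exited with {code}")
        return out


if __name__ == "__main__":
    print(MiniSession(LocalClient()).run(sys.executable, "-c", "print('hello')"), end="")
    print(MiniSession(RecordingClient()).run("make", "all"), end="")
```

A side benefit of this shape is testability: a recording fake can stand in for a real backend without spawning anything.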

Key Source Files and Architecture

Understanding the architecture helps debug and extend sandbox behavior. These files define the core implementation:

  • src/agents/sandbox/sandbox_agent.py: the public SandboxAgent class that glues the manifest, capabilities, and tools together.
  • src/agents/sandbox/session/sandbox_session.py: the core session implementation; wraps all sandbox operations with audit events, tracing, and PTY support (exec at lines 94-101, pty_exec_start at lines 124-134).
  • src/agents/sandbox/manifest.py: declarative description of the sandbox's filesystem, environment variables, and mount points.
  • src/agents/sandbox/sandboxes/unix_local.py: the simple local sandbox client used in most examples; runs commands directly on the host with isolation through process namespaces.
  • src/agents/sandbox/capabilities/tools/shell_tool.py: implements the exec and pty_exec_start tools exposed to the model via the shell capability.
  • examples/sandbox/misc/workspace_shell.py: example capability that bundles the shell tool for workspace operations.

Summary

  • SandboxAgent provides an isolated environment for LLM-driven workflows with full filesystem and process execution capabilities.
  • Manifest declaratively defines the filesystem workspace available to the agent, configured via default_manifest.
  • Long-running tasks require the PTY execution method (pty_exec_start) rather than one-shot exec, accessible through WorkspaceShellCapability.
  • SandboxRunConfig determines the backend isolation level, with UnixLocalSandboxClient for local development and DockerSandboxClient for production isolation.
  • All operations are traced through SandboxSession events, providing full auditability of file and process interactions.

Frequently Asked Questions

What is the difference between exec and pty_exec_start in SandboxAgent?

exec (implemented in src/agents/sandbox/session/sandbox_session.py lines 94-101) runs a one-shot command and captures stdout/stderr after completion, suitable for quick queries like ls or cat. pty_exec_start (lines 124-134) creates a pseudo-terminal session that streams output continuously, making it essential for long-running processes like builds, servers, or log monitoring that produce output over time.
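The difference is easy to observe with the standard library alone. In this sketch, `run_once` returns output only after the child exits, while `stream_lines` yields each line as soon as it is written; both are hypothetical helpers built on `subprocess`, and a pipe stands in for a real pseudo-terminal.

```python
import subprocess
import sys
import time
from typing import Iterator, List, Tuple

# A child that prints three lines, pausing between them.
CHILD = "import time\nfor i in range(3):\n    print(i, flush=True)\n    time.sleep(0.2)"


def run_once(argv: List[str]) -> Tuple[int, str, str]:
    """One-shot: block until the process exits, then return code, stdout, stderr."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.returncode, done.stdout, done.stderr


def stream_lines(argv: List[str]) -> Iterator[Tuple[float, str]]:
    """Streaming: yield (seconds since start, line) as each line arrives."""
    start = time.monotonic()
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield time.monotonic() - start, line.rstrip("\n")


if __name__ == "__main__":
    argv = [sys.executable, "-c", CHILD]
    print(run_once(argv))
    for elapsed, line in stream_lines(argv):
        print(f"{elapsed:.2f}s {line}")
```

With `run_once` nothing is visible until all three lines exist; with `stream_lines` the timestamps show each line arriving while the process is still running, which is what a model needs when watching a build or server.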

How do I persist files between SandboxAgent runs?

Within a single run, files persist in the sandbox workspace that the Manifest seeded. The Manifest itself is declarative and constructed at runtime, so it does not record changes the agent makes. To persist files across separate agent invocations, you must either define the file contents in your code when constructing the Manifest (using text_manifest or file references), or mount external storage volumes via the Manifest's mount configuration. The actual filesystem state is maintained by the SandboxClient backend (e.g., Docker volumes for DockerSandboxClient).
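One way to follow the "define the contents in code" route is to snapshot the workspace at the end of a run and reload that snapshot next time. This is a hypothetical approach, not an SDK feature: `snapshot_workspace`, `save_snapshot`, and `load_snapshot` are illustrative names, the sketch assumes every file is UTF-8 text, and it assumes (based on the second example above) that text_manifest accepts a {path: text} mapping.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def snapshot_workspace(root: Path) -> Dict[str, str]:
    """Collect every file under `root` as {relative posix path: text} (text files only)."""
    return {
        p.relative_to(root).as_posix(): p.read_text()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_snapshot(snapshot: Dict[str, str], dest: Path) -> None:
    dest.write_text(json.dumps(snapshot, indent=2, sort_keys=True))


def load_snapshot(src: Path) -> Dict[str, str]:
    return json.loads(src.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp, "workspace")
        (workspace / "src").mkdir(parents=True)
        (workspace / "src" / "main.c").write_text("int main(void) { return 0; }\n")
        state_file = Path(tmp, "state.json")
        save_snapshot(snapshot_workspace(workspace), state_file)
        # Next run (assumption): text_manifest(load_snapshot(state_file))
        print(load_snapshot(state_file))
```

This only works for small text workspaces; for large or binary state, mounting external storage is the better fit.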

Can I use SandboxAgent with custom tools alongside filesystem access?

Yes, SandboxAgent accepts both capabilities (like WorkspaceShellCapability) and standard tools simultaneously. The tools parameter accepts Python functions decorated with @function_tool, while capabilities accept objects that expose sandbox-specific operations. In the article's second example, the agent uses get_build_flags (a custom function tool) alongside the shell capability to compile code, demonstrating how filesystem operations and custom logic work together.

Which sandbox backend should I choose for production?

For development and testing, use UnixLocalSandboxClient (from src/agents/sandbox/sandboxes/unix_local.py) which runs commands directly on the host with minimal overhead. For production requiring stronger isolation, use DockerSandboxClient (from src/agents/sandbox/sandboxes/docker.py) with a specific container image (e.g., python:3.12-slim), ensuring that long-running processes and filesystem modifications are contained within the Docker environment and cannot affect the host system.
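To make "contained within the Docker environment" concrete, the sketch below builds a plain `docker run` command line using standard Docker CLI flags (`--rm`, `-v`, `-w`, `--network none`). It is not how DockerSandboxClient is implemented; `docker_exec_argv` is a hypothetical helper that only builds the argument list and does not start Docker. Note that the host directory mounted at /workspace is the one host path the container can still write to.

```python
import shlex
from typing import List


def docker_exec_argv(
    image: str, workspace: str, command: str, allow_network: bool = False
) -> List[str]:
    """Build a `docker run` argv that confines a command to a throwaway container."""
    argv = [
        "docker", "run", "--rm",            # remove the container when it exits
        "-v", f"{workspace}:/workspace",    # share only the workspace directory
        "-w", "/workspace",                 # start the command inside it
    ]
    if not allow_network:
        argv += ["--network", "none"]       # no network access unless requested
    argv += [image, "sh", "-c", command]
    return argv


if __name__ == "__main__":
    argv = docker_exec_argv("python:3.12-slim", "/tmp/agent-workspace", "python -V")
    print(shlex.join(argv))
```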
