How Apple Container Achieves Security Isolation Using Per-Container VMs
Apple Container isolates each workload by running it inside its own dedicated virtual machine, built on the Apple Virtualization Framework, so that no container shares a kernel with the host or with other containers.
The apple/container project implements security isolation using per-container VMs to provide Linux container workloads with the strong boundaries traditionally associated with virtual machines. Unlike standard container runtimes that share the host kernel, this architecture launches every container inside its own lightweight VM, complete with a dedicated guest kernel, isolated filesystem, and virtual network interface. This approach eliminates kernel-sharing risks while maintaining a Docker-compatible user experience.
The Per-Container Virtualization Stack
Apple Container achieves isolation through a tightly coupled stack of Swift components that manage every aspect of the VM lifecycle. The architecture uses the Apple Virtualization Framework to create hardware-backed sandboxes in which each container operates as an independent guest system.
| Layer | Responsibility | Key Source File |
|---|---|---|
| VM lifecycle | Creates, boots, and tears down VZVirtualMachineManager instances | RuntimeService.swift |
| Bootstrap | Assembles kernel, filesystem, and network configuration | RuntimeRoutes.swift |
| Resources | Enforces CPU, memory, and disk limits | ContainerConfiguration.swift |
| Network | Provides isolated virtual network interfaces | ReservedVmnetNetwork.swift |
| Interface strategy | Determines host network exposure mode | NonisolatedInterfaceStrategy.swift |
VM Lifecycle Management
At the core of the isolation mechanism is RuntimeService, a Swift actor defined in Sources/Services/RuntimeLinux/Server/RuntimeService.swift that serializes access to VM state. The service instantiates a VZVirtualMachineManager for each container, ensuring that every workload runs within its own virtual hardware context.
The runtime creates the VM through a bootstrap sequence that:
- Loads the container bundle containing the kernel image and root filesystem
- Configures virtual hardware according to ContainerConfiguration specifications
- Attaches isolated network interfaces via the vmnet framework
- Boots the guest kernel inside the secure virtualization boundary
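The four steps above can be sketched as a small ordered state machine. This is only an illustrative model: `BootstrapStep`, `SketchVM`, and `SketchRuntime` are hypothetical names that do not appear in the project, and the sketch is single-threaded, so it leaves out the actor-based serialization that RuntimeService provides.

```swift
// Hypothetical model of the bootstrap sequence; none of these types come from the project.
enum BootstrapStep: CaseIterable {
    case loadBundle         // kernel image and root filesystem
    case configureHardware  // CPU, memory, disk
    case attachNetwork      // isolated network interface
    case bootKernel         // start the guest
}

struct SketchVM {
    let containerID: String
    private(set) var completed: [BootstrapStep]

    init(containerID: String) {
        self.containerID = containerID
        self.completed = []
    }

    // The VM counts as running once every step has completed.
    var isRunning: Bool { completed.count == BootstrapStep.allCases.count }

    // Accepts a step only if it is the next one in declaration order.
    mutating func run(_ step: BootstrapStep) -> Bool {
        guard !isRunning, BootstrapStep.allCases[completed.count] == step else {
            return false
        }
        completed.append(step)
        return true
    }
}

struct SketchRuntime {
    private(set) var machines: [String: SketchVM] = [:]

    init() {}

    // One VM per container ID: a second bootstrap for the same ID is refused.
    mutating func bootstrap(containerID: String) -> Bool {
        guard machines[containerID] == nil else { return false }
        var vm = SketchVM(containerID: containerID)
        for step in BootstrapStep.allCases {
            guard vm.run(step) else { return false }
        }
        machines[containerID] = vm
        return true
    }
}

var runtime = SketchRuntime()
let startedWeb = runtime.bootstrap(containerID: "web")
let startedDB = runtime.bootstrap(containerID: "db")
let startedWebAgain = runtime.bootstrap(containerID: "web")
```

In this model each container ID maps to exactly one VM value, which mirrors the one-VM-per-container rule described above.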
Configuration and Bootstrap Process
When a container starts, RuntimeRoutes.bootstrap in Sources/Services/Runtime/RuntimeClient/RuntimeRoutes.swift orchestrates the initialization sequence. This method reads the ContainerConfiguration to assemble the necessary components before handing control to the VM manager.
The bootstrap process validates resource allocation limits, selects network attachment strategies, and prepares the guest environment:
// Simplified bootstrap flow from RuntimeRoutes.swift
public func bootstrap(_ message: XPCMessage) async throws -> XPCMessage {
    // Load bundle containing kernel and root filesystem
    let bundle = ContainerResource.Bundle(path: self.root)
    var config = try bundle.configuration
    let kernel = try bundle.kernel

    // Initialize the VM manager with isolated resources
    let vmm = VZVirtualMachineManager(
        kernel: kernel,
        initialFilesystem: bundle.initialFilesystem.asMount,
        rosetta: config.rosetta,
        logger: self.log
    )

    // Attach isolated network interfaces
    for (idx, netInfo) in try message.networkBootstrapInfos().enumerated() {
        let client = ContainerNetworkClient.NetworkClient(
            id: config.networks[idx].network,
            plugin: netInfo.plugin
        )
        let session = client.connect()
        let (attachment, _) = try await client.allocate(hostname: …, on: session)
        vmm.attach(network: attachment)
    }

    // Start the isolated VM
    try await vmm.start()
    return message.reply()
}
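The simplified loop above indexes `config.networks` by the position of each bootstrap info, which only works if both lists have the same length. A minimal sketch of pairing the two lists with an explicit length check follows. `NetworkConfig`, `NetworkBootstrapInfo`, `PairingError`, and `pairNetworks` are hypothetical stand-ins for illustration, and whether the project performs such a check is not something the flow above shows.

```swift
// Hypothetical stand-ins for the configured networks and the per-network bootstrap data.
struct NetworkConfig { let network: String }
struct NetworkBootstrapInfo { let plugin: String }

enum PairingError: Error, Equatable {
    case countMismatch(configured: Int, received: Int)
}

// Pairs each configured network with the bootstrap info at the same position,
// refusing to proceed when the two lists differ in length.
func pairNetworks(
    configured: [NetworkConfig],
    received: [NetworkBootstrapInfo]
) throws -> [(id: String, plugin: String)] {
    guard configured.count == received.count else {
        throw PairingError.countMismatch(
            configured: configured.count,
            received: received.count
        )
    }
    return zip(configured, received).map { pair in
        (id: pair.0.network, plugin: pair.1.plugin)
    }
}

let pairs = try pairNetworks(
    configured: [NetworkConfig(network: "default")],
    received: [NetworkBootstrapInfo(plugin: "vmnet-plugin")]
)
```

Failing early on a mismatch turns what would otherwise be an out-of-range index inside the loop into a typed error the caller can report.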
Kernel and Filesystem Isolation
Each VM receives its own kernel image and read-only root filesystem, completely separate from the host's filesystem tree. The ContainerResource.Bundle class defined in Sources/ContainerResource/Bundle.swift encapsulates the bundle.kernel and bundle.initialFilesystem resources that constitute the guest environment.
Because each container runs a real Linux kernel inside its own VM, its processes see only the guest filesystem rather than host paths, and a kernel-level exploit of the kind that threatens shared-kernel container runtimes is confined to that guest. The root filesystem remains read-only, which prevents tampering with the guest root image.
Network Isolation via vmnet
Network segmentation is enforced through the vmnet virtual network framework. The ReservedVmnetNetwork class in Sources/Services/NetworkVmnet/Server/ReservedVmnetNetwork.swift creates dedicated virtual networks for each container VM, assigning unique MAC addresses and IP configurations that remain invisible to other containers.
The system supports multiple network modes:
- Host-only: VM communicates only with the host system
- Shared: VM accesses external networks through NAT while remaining isolated from other VMs
The NonisolatedInterfaceStrategy in Sources/Services/RuntimeLinux/Server/NonisolatedInterfaceStrategy.swift determines whether a container uses a dedicated vmnet interface or shares host networking, with the default configuration favoring isolated attachments for security.
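The two modes and the isolated-by-default choice can be modelled in a few lines. This is a sketch under stated assumptions: `NetworkMode`, `Reachability`, `InterfaceAttachment`, and `selectAttachment` are hypothetical names, and the model encodes only what the descriptions above say, not the behaviour of the vmnet framework itself.

```swift
// Hypothetical types modelling the two network modes described above.
enum NetworkMode {
    case hostOnly
    case shared
}

// What a VM can reach beyond the host, according to the mode descriptions.
struct Reachability: Equatable {
    let externalViaNAT: Bool
    let otherVMs: Bool
}

func reachability(of mode: NetworkMode) -> Reachability {
    switch mode {
    case .hostOnly:
        // Host-only: no external access, no access to other VMs.
        return Reachability(externalViaNAT: false, otherVMs: false)
    case .shared:
        // Shared: external access through NAT, still isolated from other VMs.
        return Reachability(externalViaNAT: true, otherVMs: false)
    }
}

enum InterfaceAttachment {
    case dedicatedVmnet
    case hostNetworking
}

// The default favours an isolated attachment; host networking must be requested explicitly.
func selectAttachment(nonisolatedRequested: Bool = false) -> InterfaceAttachment {
    nonisolatedRequested ? .hostNetworking : .dedicatedVmnet
}

let defaultAttachment = selectAttachment()
let sharedReach = reachability(of: .shared)
```

Making the isolated attachment the default argument means that a caller who specifies nothing gets the more restrictive option.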
Resource Limits and Enforcement
Hardware constraints are defined in ContainerConfiguration (Sources/ContainerResource/Container/ContainerConfiguration.swift) and enforced by the VZVirtualMachineManager. The configuration specifies:
- CPU cores: Virtual processor allocation per container
- Memory: RAM limits with ballooning support
- Disk size: Storage quotas for writable layers
- Nested virtualization: Optional support for running VMs within containers
These limits prevent runaway containers from consuming excessive host resources, providing quality-of-service guarantees alongside security boundaries.
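A configuration like this is usually checked against the host before a VM is created. The sketch below shows one way such a check might look. `ResourceLimits`, `LimitError`, and `validate` are hypothetical, and the specific policy (at least one CPU, no more memory than the host has, a non-empty disk) is an assumption rather than the project's actual rule set.

```swift
// Hypothetical mirror of the four settings listed above.
struct ResourceLimits {
    var cpus: Int
    var memoryBytes: UInt64
    var diskBytes: UInt64
    var nestedVirtualization = false
}

enum LimitError: Error, Equatable {
    case invalidCPUCount(Int)
    case memoryExceedsHost(requested: UInt64, available: UInt64)
    case emptyDisk
}

// Rejects requests the host cannot satisfy. Host capacity is passed in
// so the check stays deterministic and easy to test.
func validate(_ limits: ResourceLimits, hostCPUs: Int, hostMemoryBytes: UInt64) throws {
    guard limits.cpus >= 1 && limits.cpus <= hostCPUs else {
        throw LimitError.invalidCPUCount(limits.cpus)
    }
    guard limits.memoryBytes <= hostMemoryBytes else {
        throw LimitError.memoryExceedsHost(
            requested: limits.memoryBytes,
            available: hostMemoryBytes
        )
    }
    guard limits.diskBytes > 0 else {
        throw LimitError.emptyDisk
    }
}

// 2 CPUs, 4 GiB of memory, 10 GiB of disk on an assumed 8-core, 16 GiB host.
let requested = ResourceLimits(cpus: 2, memoryBytes: 4 << 30, diskBytes: 10 << 30)
try validate(requested, hostCPUs: 8, hostMemoryBytes: 16 << 30)
```

On macOS, the Virtualization framework itself exposes allowed CPU and memory ranges on `VZVirtualMachineConfiguration` (for example `maximumAllowedCPUCount`), so a real check would likely consult those values rather than raw host totals.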
Guest-Agent Communication
After boot, a lightweight guest agent runs inside each VM to handle I/O forwarding, health reporting, and socket proxying. The agent communicates with the host via XPC, Apple's interprocess communication mechanism, using the protocols defined in Sources/ContainerXPC, which keeps the host-side implementation minimal and sandboxed.
The guest agent can forward specific host resources, such as SSH authentication sockets, into the isolated environment without breaking the VM boundary:
// Guest-side agent socket forwarding
let socket = try Socket(path: RuntimeService.sshAuthSocketGuestPath)
try socket.listen()
This architecture ensures that privileged operations remain inside the VM while still enabling necessary host integrations.
Creating VM-Backed Containers
Users interact with this isolation model through the Docker-compatible CLI. The container machine create command, implemented in Sources/ContainerCommands/Machine/MachineCreate.swift, translates user specifications into VM configurations:
// CLI invocation creates a VM-backed container
let createCmd = MachineCreate()
try createCmd.run([
    "--name", "my-app",
    "--cpu", "2",
    "--memory", "4G",
    "--disk", "10G",
    "--nested-virt" // Enables nested virtualization on Apple Silicon
])
Each created "machine" corresponds to a fully isolated VM with its own virtualization context, hardware allocation, and network stack.
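Values such as "4G" and "10G" have to be converted to byte counts before they can become VM settings. The helper below is a minimal sketch of that conversion. `parseSize` is a hypothetical function, and treating the suffixes as binary units (G = 2^30 bytes) is an assumption, since the article does not say how the CLI interprets them.

```swift
// Hypothetical helper: converts strings such as "4G" or "512" into a byte count.
// Suffixes are treated as binary units (K = 2^10, M = 2^20, G = 2^30, T = 2^40).
func parseSize(_ text: String) -> UInt64? {
    let upper = text.uppercased()
    guard let last = upper.last else { return nil }

    let multipliers: [Character: UInt64] = [
        "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40,
    ]

    if let multiplier = multipliers[last] {
        // Everything before the suffix must be a plain unsigned integer.
        guard let value = UInt64(upper.dropLast()) else { return nil }
        let (product, overflow) = value.multipliedReportingOverflow(by: multiplier)
        return overflow ? nil : product
    }

    // No recognised suffix: interpret the whole string as bytes.
    return UInt64(upper)
}

let memoryBytes = parseSize("4G")   // 4 * 2^30 = 4_294_967_296
let diskBytes = parseSize("10G")    // 10 * 2^30 = 10_737_418_240
```

Returning `nil` for malformed input, instead of a default, forces the caller to report a bad flag value rather than silently creating a VM with the wrong size.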
Summary
- Apple Container achieves security isolation using per-container VMs by running each workload inside its own Apple Virtualization Framework instance
- The RuntimeService actor manages VZVirtualMachineManager instances to provide hardware-backed sandboxes
- Each container receives isolated kernel images, read-only root filesystems, and dedicated vmnet network interfaces
- Resource limits from ContainerConfiguration are enforced at the hypervisor level to prevent host resource exhaustion
- A guest-side XPC agent enables secure host-to-guest communication without compromising the isolation boundary
- The CLI (MachineCreate.swift) provides Docker-compatible UX while maintaining VM-level security guarantees
Frequently Asked Questions
How does apple/container differ from Docker's container model?
Docker containers share the host kernel and rely on Linux namespaces and cgroups for isolation. In contrast, apple/container launches each container inside a dedicated VM with its own guest kernel, providing hardware-level isolation that prevents kernel exploits from affecting the host or other containers. This architecture trades slightly higher resource overhead for significantly stronger security boundaries.
What virtualization framework powers the per-container VM isolation?
The project uses the Apple Virtualization Framework, a macOS API that leverages Apple Silicon hardware virtualization extensions, accessed through VZVirtualMachineManager. This framework manages virtual CPUs, memory, storage controllers, and network adapters for each container VM, as implemented in Sources/Services/RuntimeLinux/Server/RuntimeService.swift.
How does the network isolation prevent container-to-container traffic sniffing?
Each container VM connects to isolated vmnet networks created by ReservedVmnetNetwork.swift. The framework assigns unique MAC addresses and IP configurations to each VM, and the virtualization layer prevents packets from leaking between interfaces. Even in shared networking mode, the hypervisor filters traffic to ensure containers cannot see each other's network communications.
Can resource limits be dynamically adjusted after container creation?
Resource limits are defined in ContainerConfiguration during the bootstrap phase and enforced by the VZVirtualMachineManager. The current implementation focuses primarily on static allocation at creation time (via MachineCreate.swift). Although the hypervisor architecture supports hot-plugging certain resources, dynamic adjustment would require XPC coordination between the host's RuntimeService and the guest agent.