How Traefik Discovers and Manages Services in Docker Swarm
Traefik uses a dedicated Swarm provider in pkg/provider/docker/pswarm.go to poll the Docker Engine API, converting Swarm services and tasks into dynamic routing configurations with automatic refresh intervals.
To discover and manage services in a Docker Swarm environment, Traefik implements a specialized provider that interacts directly with the Docker Engine API rather than relying on static configuration files. This provider, distinct from Traefik's standard Docker integration, understands Swarm-specific constructs such as overlay networks, Virtual IPs (VIPs), and task states. By polling the Swarm manager at a fixed interval, Traefik keeps a near-real-time view of the cluster topology and updates routing rules as services scale, migrate, or undergo rolling updates.
Swarm Provider Architecture
The Swarm provider is implemented in pkg/provider/docker/pswarm.go. It embeds Shared configuration and ClientConfig from the shared Docker provider infrastructure, allowing it to reuse connection logic while implementing Swarm-specific discovery semantics.
Key supporting files include:
- pkg/provider/docker/shared.go: Contains NewDynConfBuilder and extractSwarmLabels, used to transform discovered data into Traefik's dynamic configuration model.
- pkg/provider/docker/shared_labels.go: Defines the LBSwarm label structure and parsing logic for Swarm-specific Traefik labels.
- pkg/config/static/static_config.go: Registers the Swarm provider in the static configuration (lines 240-301) and defines default polling intervals.
Initializing the Swarm Provider
Initialization occurs in two phases: default configuration and template compilation.
The SetDefaults method (lines 29-44) configures the provider to watch the Docker Unix socket with a 15-second refresh interval:
func (p *SwarmProvider) SetDefaults() {
	p.Watch = true
	p.Endpoint = "unix:///var/run/docker.sock"
	p.RefreshSeconds = ptypes.Duration(15 * time.Second)
}
During Init, the provider compiles the default rule template, which generates a routing rule for each router from service metadata. This template is applied to all discovered services unless overridden by specific labels.
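The compile-once, render-per-service pattern can be sketched with Go's text/template package. This is a minimal illustration, not Traefik's code: the rule text Host(`{{ normalize .Name }}`) is assumed to be the default, and both normalize and renderRule are hypothetical helpers written for this example.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// normalize is a hypothetical stand-in for a name normalizer: it lowercases
// the name and replaces every character that is not a letter or digit with "-".
func normalize(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
		} else {
			b.WriteRune('-')
		}
	}
	return b.String()
}

// renderRule parses ruleTemplate and renders it for a single service name.
// A real provider would parse once during Init and reuse the parsed template.
func renderRule(ruleTemplate, serviceName string) (string, error) {
	tmpl, err := template.New("defaultRule").
		Funcs(template.FuncMap{"normalize": normalize}).
		Parse(ruleTemplate)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	err = tmpl.Execute(&out, struct{ Name string }{Name: serviceName})
	return out.String(), err
}

func main() {
	rule, err := renderRule("Host(`{{ normalize .Name }}`)", "web_frontend")
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Println(rule)
}
```

Parsing the template up front means a malformed rule is reported at startup rather than on the first discovery cycle.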
The createClient method (lines 53-55) instantiates the Docker Engine API client using the shared helper from shared.go, establishing the communication channel for subsequent discovery operations:
func (p *SwarmProvider) createClient(ctx context.Context) (*client.Client, error) {
	return createClient(ctx, p.ClientConfig)
}
Service Discovery Workflow
The core discovery logic resides in listServices (lines 57-89), which orchestrates the translation of Swarm services into Traefik's internal dockerData structures.
Building the Network Map
Before processing services, the provider constructs a network map to resolve virtual IP addresses. It queries the Docker API version to determine the appropriate filter strategy:
- For API versions ≥ 1.29: Uses the scope=swarm filter
- For older versions: Uses the driver=overlay filter
This network map (network ID → summary) enables subsequent lookup of overlay network names and subnets when parsing service endpoints:
func (p *SwarmProvider) listServices(ctx context.Context, dockerClient client.APIClient) ([]dockerData, error) {
	serviceList, err := dockerClient.ServiceList(ctx, swarmtypes.ServiceListOptions{})
	if err != nil {
		return nil, err
	}
	serverVersion, err := dockerClient.ServerVersion(ctx)
	if err != nil {
		return nil, err
	}
	// Choose the network filter supported by this API version
	networkListArgs := filters.NewArgs()
	if versions.GreaterThanOrEqualTo(serverVersion.APIVersion, "1.29") {
		networkListArgs.Add("scope", "swarm")
	} else {
		networkListArgs.Add("driver", "overlay")
	}
	networkList, err := dockerClient.NetworkList(ctx, networktypes.ListOptions{Filters: networkListArgs})
	if err != nil {
		return nil, err
	}
	// Build map: network ID → summary
	networkMap := make(map[string]*networktypes.Summary)
	for _, network := range networkList {
		network := network // copy: before Go 1.22 the loop variable is reused, so &network would alias
		networkMap[network.ID] = &network
	}
	// ... service parsing logic
}
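The version check matters because API versions must be compared numerically: as plain strings, "1.9" would sort after "1.29". The following self-contained sketch shows the filter choice in isolation. Here apiVersionAtLeast is a hypothetical stand-in for the Docker versions helper, and networkFilter is an illustrative name, not a function from the provider.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// apiVersionAtLeast reports whether version v is >= minimum, comparing the
// dot-separated parts as integers. Hypothetical stand-in for the Docker
// versions helper; non-numeric parts are treated as 0.
func apiVersionAtLeast(v, minimum string) bool {
	vp, mp := strings.Split(v, "."), strings.Split(minimum, ".")
	for i := 0; i < len(vp) || i < len(mp); i++ {
		var a, b int
		if i < len(vp) {
			a, _ = strconv.Atoi(vp[i])
		}
		if i < len(mp) {
			b, _ = strconv.Atoi(mp[i])
		}
		if a != b {
			return a > b
		}
	}
	return true // all parts equal
}

// networkFilter returns the filter key and value used to list Swarm networks.
func networkFilter(apiVersion string) (key, value string) {
	if apiVersionAtLeast(apiVersion, "1.29") {
		return "scope", "swarm"
	}
	return "driver", "overlay"
}

func main() {
	for _, v := range []string{"1.9", "1.29", "1.41"} {
		k, val := networkFilter(v)
		fmt.Printf("API %s -> %s=%s\n", v, k, val)
	}
}
```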
Parsing Service Metadata
For each service returned by dockerClient.ServiceList, the provider invokes parseService to extract:
- Service identity: ID, name, and labels from service.Spec.Annotations
- Swarm-specific configuration: Parsed via extractSwarmLabels, including the traefik.swarm.LBSwarm flag
- Virtual IPs: When ResolutionModeVIP is enabled, the provider maps each VirtualIP from service.Endpoint to its corresponding network entry using the previously built network map
If traefik.swarm.LBSwarm=true, the service itself becomes the routing target. Otherwise, Traefik must descend to the task level to discover individual container instances:
func (p *SwarmProvider) parseService(ctx context.Context, service swarmtypes.Service,
	networkMap map[string]*networktypes.Summary) (dockerData, error) {
	d := dockerData{
		ID:          service.ID,
		ServiceName: service.Spec.Annotations.Name,
		Labels:      service.Spec.Annotations.Labels,
	}
	// Load Traefik-specific Swarm labels
	extra, err := p.extractSwarmLabels(d)
	if err != nil {
		return dockerData{}, err
	}
	d.ExtraConf = extra
	// Handle VIP mode: map each VirtualIP to a network entry
	if service.Spec.EndpointSpec != nil && service.Spec.EndpointSpec.Mode == swarmtypes.ResolutionModeVIP {
		d.NetworkSettings.Networks = make(map[string]*networkData)
		for _, vip := range service.Endpoint.VirtualIPs {
			// Named "network" so the standard net package is not shadowed
			network := networkMap[vip.NetworkID]
			if network == nil {
				continue
			}
			ip, _, err := net.ParseCIDR(vip.Addr)
			if err != nil {
				continue // skip addresses that are not valid CIDR notation
			}
			d.NetworkSettings.Networks[network.Name] = &networkData{
				Name: network.Name,
				ID:   vip.NetworkID,
				Addr: ip.String(),
			}
		}
	}
	return d, nil
}
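The VIP mapping step reduces to two lookups: resolve the network ID to a name, then strip the prefix length from the CIDR-formatted address. A minimal sketch with simplified, hypothetical types (virtualIP and a plain ID-to-name map stand in for the Docker API structures):

```go
package main

import (
	"fmt"
	"net"
)

// virtualIP is a simplified, hypothetical stand-in for a Swarm endpoint VIP.
type virtualIP struct {
	NetworkID string
	Addr      string // CIDR notation, e.g. "10.0.1.5/24"
}

// vipsByNetworkName maps each known network's name to the bare VIP address.
// VIPs on unknown networks or with unparsable addresses are skipped.
func vipsByNetworkName(vips []virtualIP, networkNames map[string]string) map[string]string {
	result := make(map[string]string)
	for _, vip := range vips {
		name, ok := networkNames[vip.NetworkID]
		if !ok {
			continue
		}
		ip, _, err := net.ParseCIDR(vip.Addr)
		if err != nil {
			continue
		}
		result[name] = ip.String()
	}
	return result
}

func main() {
	names := map[string]string{"net1": "frontend"}
	vips := []virtualIP{
		{NetworkID: "net1", Addr: "10.0.1.5/24"},
		{NetworkID: "not-in-map", Addr: "10.255.0.4/16"},
	}
	fmt.Println(vipsByNetworkName(vips, names))
}
```

Skipping VIPs whose network is absent from the map means networks excluded by the list filter never become routing targets.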
Task-Level Discovery for Container Routing
When load balancing at the task level is required, the provider invokes listTasks (lines 65-78) to enumerate the running containers implementing a service.
Enumerating Running Tasks
The method filters the Docker API for tasks matching:
- The specific service ID
- Desired state of running
It then iterates through the task list, skipping any tasks not in the TaskStateRunning state:
func listTasks(ctx context.Context, dockerClient client.APIClient, serviceID string,
	serviceDockerData dockerData, networkMap map[string]*networktypes.Summary, isGlobalSvc bool) ([]dockerData, error) {
	filter := filters.NewArgs()
	filter.Add("service", serviceID)
	filter.Add("desired-state", "running")
	taskList, err := dockerClient.TaskList(ctx, swarmtypes.TaskListOptions{Filters: filter})
	if err != nil {
		return nil, err
	}
	for _, task := range taskList {
		// desired-state is the target state; skip tasks that are not actually running yet
		if task.Status.State != swarmtypes.TaskStateRunning {
			continue
		}
		d, err := parseTasks(ctx, dockerClient, task, serviceDockerData, networkMap, isGlobalSvc)
		// keep tasks that expose at least one network
	}
	// …
}
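The second state check is not redundant: the API filter selects on the desired state, while the loop checks the observed state, so a task that Swarm intends to run but that is still starting is excluded. The sketch below models both checks plus the network requirement on a simplified, hypothetical task type; it is an illustration, not the provider's code.

```go
package main

import "fmt"

// task is a simplified, hypothetical stand-in for a Swarm task.
type task struct {
	ID           string
	DesiredState string   // what the orchestrator wants
	State        string   // what is currently observed
	Addresses    []string // addresses on attached networks
}

// routableTasks keeps tasks that are meant to run, are actually running,
// and expose at least one network address.
func routableTasks(tasks []task) []task {
	var kept []task
	for _, t := range tasks {
		if t.DesiredState != "running" {
			continue // emulates the server-side desired-state filter
		}
		if t.State != "running" {
			continue // scheduled but not yet (or no longer) running
		}
		if len(t.Addresses) == 0 {
			continue // nothing to route to
		}
		kept = append(kept, t)
	}
	return kept
}

func main() {
	tasks := []task{
		{ID: "t1", DesiredState: "running", State: "running", Addresses: []string{"10.0.1.6/24"}},
		{ID: "t2", DesiredState: "running", State: "starting", Addresses: []string{"10.0.1.7/24"}},
		{ID: "t3", DesiredState: "shutdown", State: "running", Addresses: []string{"10.0.1.8/24"}},
		{ID: "t4", DesiredState: "running", State: "running"},
	}
	for _, t := range routableTasks(tasks) {
		fmt.Println(t.ID)
	}
}
```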
Extracting Node and Network Details
For each valid task, parseTasks (lines 94-124) constructs a dockerData entry containing:
- Task identification: Composite name combining service name and task slot (or task ID for global services)
- Node IP: Retrieved via NodeInspectWithRaw using the task's NodeID, stored in NodeIP for edge routing scenarios
- Network attachments: Mapped from task.NetworksAttachments to the network map, extracting IP addresses and associating them with overlay network names
Only tasks exposing at least one network attachment are retained as valid routing targets:
func parseTasks(ctx context.Context, dockerClient client.APIClient, task swarmtypes.Task,
	serviceDockerData dockerData, networkMap map[string]*networktypes.Summary, isGlobalSvc bool) (dockerData, error) {
	d := dockerData{
		ID:              task.ID,
		ServiceName:     serviceDockerData.Name,
		Name:            serviceDockerData.Name + "." + strconv.Itoa(task.Slot),
		Labels:          serviceDockerData.Labels,
		ExtraConf:       serviceDockerData.ExtraConf,
		NetworkSettings: networkSettings{},
	}
	// Global mode uses the task ID in the name
	if isGlobalSvc {
		d.Name = serviceDockerData.Name + "." + task.ID
	}
	// Node IP (useful for edge routing)
	if task.NodeID != "" {
		node, _, err := dockerClient.NodeInspectWithRaw(ctx, task.NodeID)
		if err != nil {
			return dockerData{}, err
		}
		d.NodeIP = node.Status.Addr
	}
	// Attach network IPs
	if task.NetworksAttachments != nil {
		d.NetworkSettings.Networks = make(map[string]*networkData)
		for _, netAttach := range task.NetworksAttachments {
			// Named "network" so the standard net package is not shadowed
			if network, ok := networkMap[netAttach.Network.ID]; ok && len(netAttach.Addresses) > 0 {
				for _, addr := range netAttach.Addresses {
					ip, _, err := net.ParseCIDR(addr)
					if err != nil {
						continue // skip addresses that are not valid CIDR notation
					}
					d.NetworkSettings.Networks[network.Name] = &networkData{
						ID:   netAttach.Network.ID,
						Name: network.Name,
						Addr: ip.String(),
					}
				}
			}
		}
	}
	return d, nil
}
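The naming rule is easiest to see in isolation. Replicated services number their tasks by slot, while global services run one task per node, so the task ID is used to keep names distinct. The helper taskName below is a hypothetical extraction of that rule for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// taskName builds the composite name for a task: "<service>.<slot>" for
// replicated services and "<service>.<taskID>" for global services.
func taskName(serviceName string, slot int, taskID string, isGlobal bool) string {
	if isGlobal {
		return serviceName + "." + taskID
	}
	return serviceName + "." + strconv.Itoa(slot)
}

func main() {
	fmt.Println(taskName("web", 2, "x1y2z3", false))    // replicated service
	fmt.Println(taskName("monitor", 0, "x1y2z3", true)) // global service
}
```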
Configuration Building and Dynamic Updates
Once service and task data are collected, the provider delegates configuration generation to NewDynConfBuilder from pkg/provider/docker/shared.go. This builder transforms the dockerData slice into a dynamic.Message containing routers, services, and middleware definitions that Traefik's internal router consumes.
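As a rough mental model of this transformation, the sketch below groups discovered addresses by service name and formats them as server URLs. It is deliberately simplified: endpoint and buildServers are hypothetical, the port is passed in rather than read from labels, and the output is a plain map rather than Traefik's dynamic configuration types.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
)

// endpoint is a simplified, hypothetical stand-in for one discovered entry.
type endpoint struct {
	ServiceName string
	Addr        string
}

// buildServers groups endpoint addresses by service name and formats each
// as an HTTP server URL on the given port. URLs are sorted so that the
// result does not depend on discovery order.
func buildServers(endpoints []endpoint, port int) map[string][]string {
	servers := make(map[string][]string)
	for _, e := range endpoints {
		// JoinHostPort also brackets IPv6 addresses correctly.
		url := "http://" + net.JoinHostPort(e.Addr, strconv.Itoa(port))
		servers[e.ServiceName] = append(servers[e.ServiceName], url)
	}
	for name := range servers {
		sort.Strings(servers[name])
	}
	return servers
}

func main() {
	endpoints := []endpoint{
		{ServiceName: "web", Addr: "10.0.1.7"},
		{ServiceName: "web", Addr: "10.0.1.6"},
		{ServiceName: "api", Addr: "10.0.2.3"},
	}
	fmt.Println(buildServers(endpoints, 8080))
}
```

Sorting the server list keeps the generated configuration stable between polls, which makes it straightforward to detect whether anything actually changed.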
The provider supports continuous synchronization through a polling mechanism:
- Polling interval: Controlled by RefreshSeconds (default 15 seconds) defined in pkg/config/static/static_config.go
- Watch mode: When Watch is true (the default), a ticker triggers listServices repeatedly, pushing updated configurations to Traefik's internal router
- State retention: If a poll fails due to API unavailability, Traefik retains the last known good configuration, ensuring continuous traffic flow while attempting to reconnect
This architecture ensures that service scaling, rolling updates, and node migrations in Docker Swarm are automatically reflected in Traefik's routing configuration without manual intervention.
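The polling and state-retention behavior described above can be sketched with a ticker loop. This is a minimal model under stated assumptions: poll, list, and publish are hypothetical names, and the demo uses a 5 ms interval and a scripted failure instead of a real Docker client.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// poll calls list on every tick and forwards successful results to publish.
// A failed poll is skipped, so whatever was published last stays in effect.
func poll(ctx context.Context, interval time.Duration,
	list func() ([]string, error), publish func([]string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if ctx.Err() != nil {
				return // cancelled while waiting for the tick
			}
			services, err := list()
			if err != nil {
				continue // keep the last known good configuration
			}
			publish(services)
		}
	}
}

// demo scripts three polls: a success, a failure, and a final success
// that also cancels the context to stop the loop.
func demo() (published [][]string, failures int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	calls := 0
	list := func() ([]string, error) {
		calls++
		switch calls {
		case 1:
			return []string{"web"}, nil
		case 2:
			failures++
			return nil, errors.New("swarm manager unreachable")
		default:
			cancel()
			return []string{"web", "api"}, nil
		}
	}
	publish := func(services []string) { published = append(published, services) }
	poll(ctx, 5*time.Millisecond, list, publish)
	return published, failures
}

func main() {
	published, failures := demo()
	fmt.Printf("published %d updates, skipped %d failed polls\n", len(published), failures)
}
```

Because the failed poll publishes nothing, the consumer never sees an empty configuration; it simply continues with the result of the previous successful poll.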
Summary
- Traefik discovers and manages services in a Docker Swarm environment through the SwarmProvider in pkg/provider/docker/pswarm.go, distinct from the standard Docker container provider.
- The provider polls the Docker Engine API every 15 seconds by default, querying service definitions, overlay networks, and task states to build a complete topology map.
- Service-level discovery extracts Virtual IPs (VIPs) from service.Endpoint when ResolutionModeVIP is enabled, using a network map constructed with API version-aware filters (scope=swarm vs driver=overlay).
- When traefik.swarm.LBSwarm is not enabled, the provider descends to task-level discovery via listTasks and parseTasks, collecting node IPs from NodeInspectWithRaw and network attachments from task.NetworksAttachments.
- Configuration generation is delegated to NewDynConfBuilder in shared.go, producing dynamic routing rules that automatically update as Swarm services scale or migrate across nodes.
Frequently Asked Questions
How does Traefik handle service updates during rolling deployments in Docker Swarm?
Traefik's Swarm provider detects changes through its polling mechanism, which queries the Docker Engine API at intervals defined by RefreshSeconds (default 15 seconds). During a rolling deployment, as new tasks reach the running state and old tasks terminate, the next invocation of listServices and listTasks captures the updated task list, and Traefik rebuilds its routing configuration so that traffic goes only to tasks reported as running. A change can therefore take up to one refresh interval to appear in the routing configuration.
What is the difference between VIP and task-based routing in Traefik's Swarm discovery?
VIP-based routing occurs when a service uses ResolutionModeVIP and the traefik.swarm.LBSwarm label is set to true, causing Traefik to route directly to the service's Virtual IP managed by Docker's internal IPVS load balancer. Task-based routing, the default behavior, occurs when LBSwarm is disabled or omitted. In that case Traefik discovers individual tasks via parseTasks, extracts their node IPs and network attachments, and balances traffic across the actual container instances itself.
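The choice between the two modes comes down to which addresses end up as backend servers. A minimal sketch, assuming a hypothetical routingTargets helper and that the LBSwarm flag has already been parsed from the labels:

```go
package main

import "fmt"

// routingTargets returns the addresses the proxy would send traffic to.
// With lbSwarm set, the single service VIP is the target and Swarm's own
// load balancer spreads requests; otherwise each task address is a target.
func routingTargets(lbSwarm bool, serviceVIP string, taskIPs []string) []string {
	if lbSwarm {
		return []string{serviceVIP}
	}
	return taskIPs
}

func main() {
	tasks := []string{"10.0.1.6", "10.0.1.7", "10.0.1.8"}
	fmt.Println(routingTargets(true, "10.0.1.5", tasks))  // one VIP target
	fmt.Println(routingTargets(false, "10.0.1.5", tasks)) // one target per task
}
```

Routing to the VIP hides individual tasks from the proxy, so features that depend on seeing each backend, such as per-server health checks or sticky sessions, are more naturally served by task-based routing.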
How does Traefik secure the connection to the Docker Swarm manager nodes?
The Swarm provider inherits ClientConfig from the shared Docker provider infrastructure defined in pkg/provider/docker/shared.go, supporting TLS configuration parameters including TLS.CA, TLS.Cert, and TLS.Key. When createClient initializes the Docker Engine API client, these credentials are passed to establish a TLS-encrypted connection, ensuring that Traefik can securely authenticate to remote Swarm managers or encrypted local sockets without exposing plaintext API access.
What happens if the Docker Swarm API becomes temporarily unavailable?
If the Docker client encounters connectivity issues during the polling cycle in listServices, the provider returns the error upstream, causing that specific refresh iteration to fail without updating the dynamic configuration. Traefik retains the last known good configuration state, ensuring that existing routing rules continue to function and traffic flows to previously discovered backends while the provider attempts to reconnect during subsequent polling intervals defined by RefreshSeconds.