Production-Ready Docker Swarm: Deployment, Scaling, and Operations

Docker Swarm
2025-12-06

Docker Swarm

Docker Swarm is Docker's built-in orchestration tool that allows you to manage multiple containers across multiple servers as a single cluster. While Docker itself runs containers on one machine, Swarm transforms several Docker hosts into a unified, scalable, and resilient environment. This makes it suitable for production workloads, distributed systems, and highly available applications. Swarm offers automatic load balancing, rolling updates, service scaling, and robust networking, all while using simple Docker commands. In this article, we will explore how Swarm works, how it differs from Docker Compose, and how to deploy services step by step.

1. What is Docker Swarm?

​Docker Swarm is a clustering and orchestration engine built directly into the Docker ecosystem. It enables you to group multiple servers (nodes) and manage them as a single virtual Docker host. Swarm ​automates tasks such as scheduling containers, replacing failed containers, distributing workloads, rolling updates, and maintaining the desired state of your applications.

​When you deploy a service with 5 replicas and one node goes down, Swarm automatically recreates the lost replicas on healthy nodes. This demonstrates Swarm's self-healing capability, which is essential ​for maintaining high availability in production environments. The orchestrator continuously monitors the cluster state and takes corrective action whenever the actual state diverges from the desired state, ​ensuring your applications remain available without manual intervention.
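A quick way to watch this self-healing in action is to drain a node and observe Swarm reschedule its tasks. This is a sketch to run on a manager; "app" and "node2" are placeholder names, not anything defined elsewhere in this article:

```shell
# Create a service with 5 replicas, then simulate a node going offline.
docker service create --name app --replicas 5 nginx:alpine
docker service ps app                            # note which nodes run the 5 tasks
docker node update --availability drain node2    # take "node2" out of scheduling
docker service ps app                            # node2's replicas reappear on healthy nodes
```

Draining is gentler than an actual crash, but the reconciliation loop you observe is the same one that fires when a node fails unexpectedly.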

2. Swarm vs Docker Compose

​Docker Compose is designed for local development and runs containers only on a single machine. Docker Swarm is designed for production and manages containers across multiple machines.

| Feature | Docker Compose | Docker Swarm |
| --- | --- | --- |
| Scope | Single machine | Multiple machines |
| Purpose | Development | Production |
| Scaling | Manual, limited | Automatic, cluster-wide |
| Load balancing | None | Built-in routing mesh |
| High availability | No | Yes |
| Deployment | docker compose up | docker stack deploy |
| Health checks | Basic | Advanced with auto-recovery |
| Secrets management | Environment files | Encrypted secrets |

This comparison shows why you'd use Compose for development but Swarm for production. Compose is perfect when you're building and testing on your laptop, but Swarm provides the production-grade features needed when deploying to real servers serving real users.

services:
  web:
    image: nginx

​This simple Compose file defines a single web service using nginx. It works great locally but has no scaling, no load balancing, and no high availability. If you run this with docker compose up, you get one ​nginx container on your local machine.

services:
  web:
    image: nginx
    deploy:
      replicas: 5

​This Swarm stack file looks nearly identical to the Compose file, demonstrating Swarm's ease of adoption. However, the deploy section adds orchestration capabilities. The replicas: 5 directive tells Swarm ​to maintain exactly 5 running copies of the nginx container at all times. Swarm distributes these replicas across available nodes in your cluster, provides automatic load balancing between them, and ensures ​that if any replica fails, it's immediately replaced. This same file works in both development with Compose and production with Swarm, but Swarm adds all the production features automatically when ​deployed with docker stack deploy.

3. Initializing Swarm — docker swarm init

​To create your first cluster, you initialize Swarm on the first node that will become your manager:

docker swarm init

​This simple command transforms a standalone Docker host into a Swarm manager. It performs several critical operations in the background. First, it makes your machine the first manager node, giving it ​authority to control the cluster. Second, it starts the Raft consensus algorithm, which managers use to maintain a consistent view of the cluster state. Third, it generates secure join tokens that other nodes ​will use to join the cluster. Finally, it creates the default networks needed for Swarm operation, including the ingress network for load balancing and the docker_gwbridge network for container ​connectivity.
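You can verify each of these effects immediately after initializing. These checks run on the new manager node:

```shell
# Confirm the node is now part of an active swarm.
docker info --format '{{ .Swarm.LocalNodeState }}'   # expect: active

# The local node should appear with a manager status of "Leader".
docker node ls

# The default "ingress" overlay network should now exist.
docker network ls --filter driver=overlay
```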

docker swarm init --advertise-addr 192.168.1.100

In production environments, servers often have multiple network interfaces. The --advertise-addr flag tells Swarm which IP address to use for cluster communication. This is particularly important when your server has both a public-facing IP and a private internal network IP. You typically want Swarm to use the private network for inter-node communication for security and performance reasons. Specifying the advertise address ensures all nodes communicate over the correct network interface.

4. Manager Node vs Worker Node

  • Manager Node

​Manager nodes are the control plane of your Swarm cluster. They maintain the cluster state using the Raft consensus algorithm, which ensures all managers have a consistent view of services, networks, ​secrets, and nodes. Managers schedule tasks by deciding which worker nodes should run which containers based on resource availability, placement constraints, and current workload distribution. They ​approve node membership by validating join tokens and adding new nodes to the cluster. Managers ensure consistency by continuously monitoring the cluster and reconciling any differences between ​desired and actual state.

​While managers can also run workloads, this is not recommended in production environments. Running application containers on managers can consume resources needed for orchestration tasks and ​potentially impact cluster stability during high load. Instead, production clusters should have dedicated manager nodes that only handle orchestration responsibilities.
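The usual way to keep a manager orchestration-only is to set its availability to drain, which moves its workloads elsewhere and prevents new tasks from being scheduled on it. "manager1" here is a placeholder hostname:

```shell
# Run on any manager: stop scheduling application tasks on manager1.
docker node update --availability drain manager1
docker node ls   # the AVAILABILITY column for manager1 now reads "Drain"
```

A drained manager still participates fully in Raft consensus and scheduling decisions; it simply refuses to run application containers itself.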

  • ​Worker Node​

​Worker nodes are the workhorses of your Swarm cluster. They run tasks (containers) as assigned by managers but don't make scheduling decisions themselves. Workers don't participate in Raft consensus ​and don't maintain cluster state. They execute what managers assign by pulling container images, starting containers, and monitoring their health. Workers report status back to managers through regular ​heartbeat messages, informing them about available resources, running containers, and node health.

  • Promotion and demotion
docker node promote worker-node1

​This command promotes a worker node to a manager, adding it to the Raft consensus group. You might do this when expanding your cluster or before performing maintenance on an existing manager. The ​promotion is immediate, and the node begins participating in cluster management decisions. This is useful when you need additional manager capacity or want to increase fault tolerance from 1 manager to ​3 managers.

docker node demote manager-node2

​Demoting a manager removes it from the Raft consensus group and converts it to a worker. You might demote a manager before decommissioning it or when reducing cluster size. After demotion, the node ​continues running but only executes workloads rather than participating in cluster management. This is helpful when you need to perform major maintenance on a manager node or reduce the number of ​managers in an over-provisioned cluster.

5. Creating and Joining Nodes to a Cluster

​Get worker token from manager:

docker swarm join-token worker

​This command retrieves the join token specifically for adding worker nodes to your cluster. The token is a secure credential that includes encrypted information about your cluster. When executed on a ​manager, it displays the complete join command that you can copy and run on any machine you want to add as a worker. The token doesn't expire by default, so you can reuse it to add multiple workers over ​time. However, if a token is compromised, you can rotate it using docker swarm join-token --rotate worker.

  • On a new machine (join as worker):
docker swarm join --token <token> <manager-ip>:2377

​When you run this command on a new server, that machine contacts the manager at the specified IP address on port 2377, presents the join token for validation, and if accepted, becomes part of the cluster ​as a worker node. The manager immediately begins sending work assignments to the new worker based on scheduled services. The new worker starts reporting its resources and status back to the ​managers. This process typically completes in seconds, and the new node becomes immediately available for scheduling workloads.

  • To join as a manager:
docker swarm join-token manager
docker swarm join --token <manager-token> <manager-ip>:2377

​Adding manager nodes requires a different token for security reasons. Manager tokens grant much more authority since managers control the entire cluster. When a new manager joins, it downloads the ​current Raft log, synchronizes the cluster state, and begins participating in consensus. New managers can immediately accept worker joins, schedule tasks, and participate in leader elections. This is essential ​for building high-availability clusters where manager failure won't cause service disruption.

  • Leave the swarm:
docker swarm leave

​This command makes a worker node leave the cluster gracefully. Before leaving, the node stops accepting new work. Managers detect the departure and reschedule any tasks that were running on that ​node to other available nodes. This is useful during decommissioning or when moving a node to a different cluster.

docker swarm leave --force

Force leave is necessary for manager nodes or when a node can't contact the cluster. For managers, you typically want to demote them first, then leave. Force leave is also useful when a node has network issues and can't perform a graceful departure. After a forced leave, you should run docker node rm from another manager to clean up the node's cluster membership records.
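Putting those steps together, a clean decommissioning of a manager looks like this sketch ("manager-node2" is a placeholder hostname):

```shell
# On another manager: remove manager-node2 from the Raft consensus group.
docker node demote manager-node2

# On manager-node2 itself (now a worker): leave the cluster gracefully.
docker swarm leave

# Back on a remaining manager: purge the stale membership record.
docker node rm manager-node2
```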

6. Services vs Containers in Swarm

​A container is a single running instance of an image, executing on one specific node. A service is a higher-level abstraction that defines how containers should run across the cluster.

​A service definition includes the image to use, which specifies what software runs in your containers. The replica count determines how many copies of the container should run simultaneously across the ​cluster. The update strategy defines how Swarm rolls out new versions without downtime. Resource limits prevent containers from consuming excessive CPU or memory. Network attachments specify ​which overlay networks the service's containers should join. Secrets and configs provide secure configuration data to containers.
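Each of those elements maps to a docker service create flag. This sketch assumes an overlay network named mynet and a secret named db_password already exist in your cluster:

```shell
# Image, replicas, update strategy, resource limits, network, and secret
# expressed as flags on a single service creation.
docker service create \
  --name api \
  --replicas 4 \
  --update-parallelism 1 --update-delay 10s \
  --limit-cpu 0.5 --limit-memory 512M \
  --reserve-cpu 0.25 --reserve-memory 256M \
  --network mynet \
  --secret db_password \
  node:20-alpine
```

In practice most teams keep these settings in a stack file (covered in section 7) rather than on the command line, so the configuration lives in version control.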

  • Example - Create a service:
docker service create --name web --replicas 3 nginx

This command creates a service named "web" that maintains 3 running nginx containers across your cluster. Swarm immediately schedules these 3 replicas onto available nodes, distributing them for optimal resource utilization and fault tolerance; with 5 worker nodes, it will typically place each replica on a different node. The service continuously monitors these replicas, and if any container crashes, Swarm automatically starts a replacement. If a node fails, Swarm reschedules all containers that were running on that node to healthy nodes.

​Swarm automatically manages several critical aspects for you. Replication ensures the specified number of containers always run. If you specified 3 replicas, Swarm maintains exactly 3 running containers at ​all times. Restarting failed containers happens automatically when a container crashes due to application errors. Rescheduling on node failure means that if an entire node goes offline, all its workloads ​move to healthy nodes within seconds. Load balancing distributes incoming traffic across all 3 nginx replicas, so users get a responsive experience regardless of which replica handles their request.

  • Service Modes:

​The replicated mode (default) runs a specific number of task replicas across the cluster. This is what you use for most services where you want N copies running. For example, replicas: 5 means 5 containers ​spread across your cluster, placed optimally based on available resources.

docker service create --name monitoring --mode global prom/prometheus

​Global mode runs exactly one replica on every node in the cluster. This is perfect for monitoring agents, log collectors, or any service that needs to run on every machine. When you add a new node to the ​cluster, Swarm automatically starts a replica of all global services on that node. If you remove a node, its global service replicas are automatically cleaned up. Global services are essential for cluster-wide ​functionality like collecting metrics from every node or providing local caching on each machine.

7. Deploying Stacks — docker stack deploy

​Stacks allow multi-service deployments using a Compose-like YAML file. A stack is essentially a collection of services that make up a complete application, defined in a single declarative file.

  • Deploy:
docker stack deploy -c stack.yml mystack

This command reads the stack.yml file, creates or updates all services defined in it, creates networks if they don't exist, creates secrets and configs as needed, and begins deploying containers across the cluster. The -c flag specifies the compose file to use. The mystack argument names your stack, allowing you to manage multiple applications on the same cluster. Stack deployment is idempotent, meaning you can run it repeatedly and Swarm will only make necessary changes, making it safe for continuous deployment workflows.

  • Check stack:
docker stack services mystack

​This shows all services that belong to the stack, their replica counts, which images they're running, and which ports are published. It's a quick way to verify your entire application deployed successfully.

docker stack ps mystack

​This displays every individual container (task) running as part of the stack, showing which node each task runs on, its current state, when it started, and any error messages if containers failed. This is ​invaluable for troubleshooting deployment issues.

  • Remove stack:
docker stack rm mystack

​Removing a stack cleanly tears down all services, removes all containers, but preserves networks and volumes by default. This allows you to redeploy without losing persistent data. It's a clean way to ​decommission an entire application in one command.
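Because stack resources are prefixed with the stack name, the volume from the example below survives stack removal under a predictable name. This sketch assumes the stack was deployed as "mystack" with a volume named db-data; only remove the volume if you truly want to discard the data:

```shell
# List volumes left behind by the stack, then delete one explicitly.
docker volume ls --filter name=mystack_
docker volume rm mystack_db-data
```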

  • Example stack (full application):
version: '3.8'

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
        max_attempts: 3
    networks:
      - frontend

  api:
    image: node:20-alpine
    deploy:
      replicas: 4
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
    secrets:
      - db_password
    networks:
      - frontend
      - backend

  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager
    secrets:
      - db_password
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay

secrets:
  db_password:
    external: true

volumes:
  db-data:

​This stack file defines a complete three-tier application. The web service runs 3 nginx replicas on port 80, serving as a front-end load balancer or web server. The update configuration ensures rolling ​updates happen one container at a time with a 10-second delay between updates, preventing downtime. The restart policy tells Swarm to restart containers that crash, but give up after 3 failed attempts to ​prevent infinite restart loops.

8. Scaling Services — docker service scale

  • Scale up:
docker service scale web=10

​This command changes the web service from its current replica count to 10. Swarm immediately begins creating additional containers to reach the target count. If the service had 3 replicas, Swarm creates 7 ​new containers and distributes them across available nodes. The scaling happens in parallel, with all new containers starting simultaneously. This is useful when you anticipate increased traffic and need ​more capacity quickly.

  • Scale down:
docker service scale web=3

​Scaling down removes excess containers gracefully. Swarm selects which specific containers to stop, typically preferring to balance the cluster by removing containers from nodes with the most replicas. ​Containers are stopped gracefully, giving them time to finish processing requests. This is essential during low-traffic periods when you want to conserve resources.

  • Scale multiple services:
docker service scale web=5 api=8 worker=12

​You can scale multiple services simultaneously in a single command. This is particularly useful when you need to scale an entire tier of your application. For example, if you're expecting a traffic surge, you ​might scale web, API, and background worker services all at once. Swarm processes these scales in parallel, quickly adjusting your entire application capacity.
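Under the hood, docker service scale is shorthand for a service update, so the following two commands are equivalent:

```shell
# Shorthand form:
docker service scale web=5

# Explicit form, useful when combining with other update flags:
docker service update --replicas 5 web
```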

9. Checking Cluster Status

  • Check nodes:
docker node ls

This command shows all nodes in your cluster, displaying each node's ID, hostname, whether it's a manager or worker, its availability status (active, drain, or pause), manager status (reachable, leader, or unreachable), and the Docker Engine version running on it. The leader designation shows which manager is currently making scheduling decisions, though all managers can accept commands. This is your first stop when diagnosing cluster issues or verifying new nodes joined successfully.

  • Check services:
docker service ls

​This displays all services deployed across your cluster, showing their names, mode (replicated or global), current replica count versus desired count, the image being used, and published ports. The replica ​count is particularly important because 3/3 means healthy, but 2/3 indicates one replica is missing, possibly due to scheduling issues or node failures.

  • Check tasks for a service:
docker service ps web

​Tasks are the individual containers that make up a service. This command shows every task's ID, the name including replica number, the image, the node it's running on, the desired state, the current state, ​error messages if any, and ports. Historical failed tasks are also shown, which is invaluable for diagnosing why containers keep crashing. You might see tasks in states like Running, Failed, Shutdown, or ​Starting.

  • Inspect service details:
docker service inspect web --pretty

​This provides comprehensive information about the service configuration, including the full image name with digest, replica count and mode, update configuration, restart policies, network attachments, ​secrets and configs, resource limits, placement constraints, and more. The --pretty flag formats this as human-readable text rather than JSON, making it easier to review quickly. Without --pretty, you get ​JSON output suitable for parsing with tools like jq.
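When you need a single field from the JSON output, jq queries against the inspect structure work well. These field paths are standard for a replicated service:

```shell
# Current replica target for the service.
docker service inspect web | jq '.[0].Spec.Mode.Replicated.Replicas'

# Exact image (including digest) the service is pinned to.
docker service inspect web | jq '.[0].Spec.TaskTemplate.ContainerSpec.Image'
```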

  • View service logs:
docker service logs -f web

​This aggregates logs from all replicas of the web service into a single stream. The -f flag follows the logs, showing new entries in real-time. Each log line is prefixed with the task name, helping you identify ​which replica generated which message. This is extremely useful for debugging distributed services where you need to see what's happening across all replicas simultaneously.

  • Node details:
docker node inspect worker1 --pretty

​This shows detailed information about a specific node, including its hostname, IP addresses, resources (total and available CPU/memory), labels applied to the node, its role and availability, TLS certificate ​information, and Docker Engine version. You can add or modify labels with docker node update --label-add key=value worker1, which is useful for placement constraints.

10. Basic Load Balancing

​Swarm provides two types of load balancing automatically without requiring external load balancers.

​Routing Mesh is Swarm's ingress load balancing mechanism. When you publish a port on a service, that port becomes available on every node in the cluster, regardless of which nodes actually run containers ​for that service. When traffic arrives at any node on the published port, the routing mesh forwards it to one of the service's containers, using IPVS for efficient load balancing. This means users can connect ​to any node's IP address and automatically reach your service.

​Internal LB uses Virtual IP (VIP) for service discovery within the cluster. When containers on the same network need to communicate, they use service names as hostnames. Swarm resolves each service ​name to a single virtual IP, which then load balances requests across all replicas of that service. This provides transparent load balancing without requiring applications to know about individual container ​IPs.
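VIP is the default endpoint mode, but Swarm also supports DNS round-robin, where the service name resolves directly to every task's IP instead of a single virtual IP. This sketch assumes an overlay network named mynet already exists:

```shell
# Default: "api" resolves to one stable virtual IP that load balances internally.
docker service create --name api --network mynet nginx:alpine

# DNS round-robin: "api-dnsrr" resolves to all task IPs, for applications or
# external balancers that want to pick individual backends themselves.
docker service create --name api-dnsrr --endpoint-mode dnsrr --network mynet nginx:alpine
```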

​For example, if you have an nginx service running on worker1 and worker3, but a user connects to worker2 on the published port, Swarm's routing mesh automatically forwards that connection to either ​worker1 or worker3 (whichever nginx container is selected by the load balancer). The user doesn't need to know which nodes run nginx—any node works. This dramatically simplifies client configuration and ​provides fault tolerance because if one node fails, users can connect to any other node.

  • Publishing ports:
docker service create --name web --publish 8080:80 nginx

​This publishes the container's port 80 on port 8080 of every node in the cluster. Users can access your nginx service by connecting to any-node-ip:8080, and they'll be automatically routed to one of the ​nginx containers. The routing mesh distributes connections across all nginx replicas using round-robin by default.

  • Port publishing modes:

​Ingress mode (default) is what we just described—load balanced across all nodes. This is what you want for most services because it provides high availability and easy client access. If any node fails, clients ​can simply connect to another node.

docker service create --name web --publish mode=host,target=80 nginx

​Host mode bypasses the routing mesh and binds the port only on nodes actually running the service. This is useful for services that need direct network access or very high performance without the routing ​mesh overhead. However, clients must know specifically which nodes run the service. Host mode is common for monitoring exporters or services that need low latency.

11. Overlay Networks

​Overlay networks enable containers running on different nodes to communicate with each other as if they were on the same local network. They use VXLAN (Virtual Extensible LAN) encapsulation to create ​layer 2 network segments over existing layer 3 infrastructure. This means containers can use simple IP communication even though they're actually on different physical networks separated by routers.

  • Create network:
docker network create -d overlay mynet

This creates a new overlay network named mynet using the overlay driver. By default, the network is not encrypted, which offers better performance. Containers attached to this network can communicate using their container names or service names as hostnames. Swarm automatically handles all the DNS resolution and routing. Services can join this network by specifying it in their stack file or with the --network flag.

  • Create with encryption:
docker network create -d overlay --opt encrypted mynet

​Adding the encrypted option enables IPsec encryption for all traffic on this network. Every packet sent between containers on different nodes is encrypted transparently. This provides defense-in-depth ​security, protecting your data even if the underlying network is compromised. There's a small performance overhead for encryption, but it's typically negligible compared to application processing time. ​Encrypted networks are essential when your nodes communicate over untrusted networks like the public internet.

  • Use in stack:
networks:
  backend:
    driver: overlay
    driver_opts:
      encrypted: "true"
    attachable: true

​This stack file definition creates an encrypted overlay network named backend. The attachable: true option allows standalone containers (not managed by Swarm) to attach to this network using docker run ​--network backend. Without attachable, only Swarm services can use the network. This is useful during development when you want to attach debugging containers to production networks.

  • Inspect overlay network:
docker network inspect mynet

​This shows detailed information about the network, including its subnet and gateway, which containers are connected, VXLAN identifiers used for encapsulation, whether encryption is enabled, and driver ​options. You can see every container currently attached to the network with their IP addresses. This is helpful when troubleshooting connectivity issues between services.

12. Basic Rolling Updates

​Rolling updates allow you to update services without downtime by gradually replacing containers with new versions.

  • Update service image:
docker service update --image nginx:latest web

​This updates the web service to use the nginx:latest image. Swarm pulls the new image to relevant nodes, then starts replacing containers one at a time by default. Each new container starts and passes ​health checks before Swarm stops the old container, ensuring continuous availability. The update continues until all replicas run the new version. If any new container fails to start or fails health checks, ​Swarm can automatically roll back.

  • Update with zero downtime:
docker service update \
  --image nginx:1.21 \
  --update-parallelism 2 \
  --update-delay 10s \
  web

​This update demonstrates fine-grained control over the update process. The --update-parallelism 2 means Swarm updates 2 containers simultaneously rather than one at a time, speeding up the rollout ​while still maintaining most of your capacity. The --update-delay 10s adds a 10-second pause between batches, giving you time to observe the new version's behavior before proceeding. This is crucial for ​catching issues early before the entire service is updated.

​For a service with 10 replicas, this would update 2 containers, wait 10 seconds, update 2 more, wait 10 seconds, and so on. If users are connected to the 2 being updated, they're automatically routed to the ​8 still running the old version, maintaining service availability.

  • Rollback to previous version:
docker service rollback web

​If you discover issues after an update, rollback immediately restores the previous configuration. Swarm remembers the previous service definition and reverses all changes made during the last update. This ​includes the image version, environment variables, configuration changes, and resource limits. Rollback uses the same rolling update mechanism, so it happens gradually without downtime.

  • Configure update strategy in stack:
deploy:
  update_config:
    parallelism: 2
    delay: 10s
    failure_action: rollback
    monitor: 30s
    max_failure_ratio: 0.3

​This update configuration provides comprehensive control over update behavior. The parallelism: 2 updates 2 containers at a time. The delay: 10s pauses between batches. The failure_action: rollback tells ​Swarm to automatically roll back if too many containers fail. The monitor: 30s means Swarm watches each new container for 30 seconds after starting to ensure it remains healthy. The max_failure_ratio: 0.3 ​allows up to 30% of updates to fail before triggering a rollback.

13. Swarm Internal Architecture (Raft, Managers, Workers)

​Understanding Swarm's internal architecture helps you design reliable clusters and troubleshoot issues effectively.

  • Raft Consensus:

​Raft is a consensus algorithm that ensures all manager nodes agree on the cluster state. It solves the distributed systems problem of multiple servers needing to agree on data despite network partitions ​and node failures. Raft elects one manager as the leader, which makes all scheduling decisions and processes updates. Other managers are followers that replicate the leader's decisions.

​If the leader fails, Raft automatically elects a new leader within seconds through a voting process. All managers participate in elections, and the manager with the most up-to-date log wins. This ensures ​continuous cluster operation even during manager failures. The leader election typically completes in under 10 seconds.

​Swarm stores all cluster state in the Raft log, including service definitions with all their configuration, network configurations and IP address allocations, secrets encrypted with the cluster's encryption keys, ​node membership and roles, and task assignments mapping containers to nodes. This replicated log ensures every manager has a complete copy of cluster state and can become leader if needed.

Raft requires a quorum (majority) of managers to be reachable for the cluster to function. With 3 managers, you need 2 available. With 5 managers, you need 3 available. This is why odd numbers are recommended: 4 managers tolerate only 1 failure, the same as 3, while adding consensus overhead.
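The arithmetic is simple: N managers need floor(N/2) + 1 reachable to hold quorum, and therefore tolerate floor((N-1)/2) failures. A quick shell loop makes the pattern visible:

```shell
# Quorum size and fault tolerance for common manager counts.
for n in 1 3 4 5 7; do
  echo "$n managers: quorum $(( n / 2 + 1 )), tolerates $(( (n - 1) / 2 )) failure(s)"
done
```

Note how 3 and 4 managers both tolerate exactly one failure, which is why the even count buys nothing.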

  • Communication:

​Managers use port 2377 for cluster management traffic, including Raft consensus messages, task scheduling, and cluster state updates. This port must be open between all manager nodes and from ​workers to managers. It uses mutual TLS automatically, encrypting and authenticating all traffic.

​Workers contact managers using heartbeat messages every few seconds, reporting their health, available resources, and running containers. If a worker misses several heartbeats, managers mark it as down ​and reschedule its workloads.

​Port 7946 (both TCP and UDP) handles container network discovery and overlay network management. Containers use this for peer discovery when joining overlay networks. Port 4789 (UDP) carries actual ​overlay network traffic using VXLAN encapsulation. All application data between containers on overlay networks flows through this port.
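On hosts with a firewall, all three ports must be opened between cluster nodes. As an illustrative example using ufw (adjust to whatever firewall your distribution uses):

```shell
sudo ufw allow 2377/tcp   # cluster management traffic (to managers)
sudo ufw allow 7946/tcp   # node and container network discovery
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp   # VXLAN overlay network data plane
```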

  • Data stored in Raft:

​Every service definition is stored in Raft, including the image, replica count, networks, secrets, configs, update strategies, and all other configuration. When you run docker service create, the manager ​writes this to the Raft log, which replicates to all managers. Network configurations store subnet allocations, VXLAN IDs, and which services attach to which networks. Secrets are stored encrypted in the ​Raft log, with the encryption key derived from the cluster unlock key. Node membership tracks which nodes are in the cluster, their roles, availability, and labels. Task assignments record which containers ​should run on which nodes.

14. Advanced Service Scheduling (Constraints, Preferences)

​Swarm's scheduler decides where to place containers based on resources, constraints, and preferences you specify.

  • Node Constraints

​Constraints are hard requirements that must be met for a container to run on a node. If no nodes match all constraints, the task remains in a pending state until a suitable node becomes available.

deploy:
  placement:
    constraints:
      - "node.labels.storage == ssd"
      - "node.role == worker"

​This constraint configuration ensures the service only runs on worker nodes that have been labeled with storage=ssd. The first constraint checks for a custom label you've applied to nodes with SSD storage, ​while the second ensures the service avoids manager nodes. This is useful for database services or applications that need high-performance storage. If you have 5 nodes but only 2 have SSD storage and are ​workers, Swarm only places containers on those 2 nodes.

  • Label a node:
docker node update --label-add storage=ssd worker1

​This command adds a custom label to worker1. Labels are arbitrary key-value pairs you assign to categorize nodes. You might label nodes based on hardware (storage=ssd, gpu=nvidia), location ​(datacenter=east, zone=us-east-1a), or purpose (environment=production, tier=frontend). Once labeled, you can use these labels in placement constraints to control exactly where services run.
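Labels can be verified, and removed, before services come to depend on them; a quick sketch:

```shell
# Show all labels currently set on worker1 (prints an empty map if none)
docker node inspect --format '{{ .Spec.Labels }}' worker1

# Remove a label that is no longer accurate
docker node update --label-rm storage worker1
```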

  • Placement Preferences

​While constraints are hard requirements, preferences are soft guidelines that influence scheduling decisions without preventing placement.

deploy:
  placement:
    preferences:
      - spread: node.labels.zone

​This preference tells Swarm to spread replicas evenly across different availability zones. If you have 6 replicas and nodes in 3 zones, Swarm tries to place 2 replicas in each zone. This provides fault tolerance ​against zone-level failures. If an entire zone goes offline, you still have replicas in the other zones. Preferences don't prevent scheduling—if one zone is full, Swarm places additional replicas in zones with ​capacity.
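The same preference can be expressed at service creation time on the command line; a sketch with a hypothetical web service:

```shell
docker service create \
  --name web \
  --replicas 6 \
  --placement-pref 'spread=node.labels.zone' \
  nginx
```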

  • Example with multiple constraints:
deploy:
  replicas: 6
  placement:
    constraints:
      - "node.role == worker"
      - "node.labels.environment == production"
      - "node.labels.storage == ssd"
    preferences:
      - spread: node.labels.datacenter

​This configuration combines multiple constraints and a preference for a production database service. All three constraints must be satisfied: the node must be a worker, labeled as production environment, ​and have SSD storage. Only nodes meeting all three criteria are considered. Among those qualifying nodes, Swarm spreads the 6 replicas across different datacenters according to the preference. This ​ensures high performance (SSD), production isolation (environment label), and geographic distribution (datacenter spread).

15. Overlay Networking Deep-Dive (VXLAN, Routing Mesh, Ingress)

  • VXLAN (Virtual Extensible LAN):

​VXLAN creates layer 2 overlay networks on top of existing layer 3 infrastructure. It encapsulates ethernet frames inside UDP packets, allowing containers on different physical networks to communicate as if ​they were on the same local network. Each overlay network gets a unique VXLAN ID (VNI), and Swarm handles all the encapsulation transparently.

​When a container on node1 sends a packet to a container on node2, Swarm's VXLAN driver wraps the original ethernet frame in a UDP packet addressed to node2. Node2 receives the UDP packet, unwraps ​it, and delivers the original frame to the destination container. This happens entirely transparently to the containers, which see normal ethernet connectivity.
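You can observe this encapsulation on the wire. A sketch assuming tcpdump is installed and eth0 is the interface the nodes use to reach each other:

```shell
# Overlay traffic between nodes appears as UDP datagrams on the VXLAN port
tcpdump -n -i eth0 udp port 4789
```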

  • Ingress Network:

​The ingress network is a special overlay network created automatically when you initialize a Swarm. It handles all published service ports and implements the routing mesh. When you publish port 8080 on a ​service, the ingress network makes that port available on every node, forwarding incoming connections to appropriate containers regardless of which node they're running on.

​The ingress network uses IPVS (IP Virtual Server) for high-performance load balancing. IPVS operates in the Linux kernel, providing much better performance than userspace proxies. It supports multiple ​load balancing algorithms, with round-robin being the default.

  • Routing Mesh:

​The routing mesh is Swarm's ingenious solution to external load balancing. Traditional load balancers require knowledge of which specific servers run your application. Swarm's routing mesh eliminates this ​requirement by making your published ports available on every node. Users can connect to any node's IP address, and the routing mesh forwards their connection to a healthy container, regardless of that ​container's location.

​This works through a combination of IPVS load balancing and overlay networking. When traffic arrives at a node's published port, the ingress network's IPVS load balancer selects one of the service's ​containers. If the selected container runs on a different node, the traffic is forwarded through the VXLAN overlay to that node. This forwarding is fast because it happens at the kernel level.
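A quick way to see the mesh in action, using a hypothetical web service (replace the placeholder address with any node's IP):

```shell
# Publish port 8080 on a 3-replica service
docker service create --name web --replicas 3 -p 8080:80 nginx

# Any node's address answers, even nodes running no web replica
curl http://<any-node-ip>:8080/
```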

  • Custom ingress network:
docker network create \
  --driver overlay \
  --ingress \
  --subnet=10.11.0.0/16 \
  --gateway=10.11.0.1 \
  my-ingress

​You can replace the default ingress network with a custom one to control its IP addressing. This is useful when the default subnet conflicts with your existing network infrastructure. Before creating a ​custom ingress network, you must first remove the default one by ensuring no services are published, then running docker network rm ingress. The --ingress flag designates this network as the special ​ingress network that handles published ports.

16. Swarm Secrets & Configs

Secrets

​Secrets provide secure storage for sensitive data like passwords, API keys, and certificates. Unlike environment variables that appear in container inspections and potentially in logs, secrets are never stored ​in images, never exposed in plain text except inside the container's memory filesystem, and are encrypted both in transit and at rest.

  • Add a secret:
echo "mypassword" | docker secret create db-pass -

​This creates a secret named db-pass from stdin. The password is immediately encrypted and stored in the Raft log on managers. The - tells Docker to read from stdin rather than a file. This is convenient for ​one-off secrets but means the password might appear in your shell history. For production, using files is more secure.

  • From file:
docker secret create db-pass ./password.txt

​Creating secrets from files is more secure for production workflows. The file's contents are read and encrypted, then the file can be securely deleted. This prevents the secret from appearing in shell history ​or command logs. You might generate the file programmatically as part of your deployment pipeline, use it to create the secret, then immediately remove it.

  • Use in service:
services:
  api:
    image: myapi
    secrets:
      - db-pass
      - api-key

secrets:
  db-pass:
    external: true
  api-key:
    file: ./api-key.txt

​When a service uses secrets, Swarm mounts them as files inside the container at /run/secrets/<secret-name>. The api service would have files /run/secrets/db-pass and /run/secrets/api-key available. ​Applications read these files at startup to retrieve credentials. The external: true flag indicates the secret already exists in Swarm and should not be created from a file. The api-key secret will be created ​from api-key.txt if it doesn't exist.

​Secrets provide several security benefits. They're encrypted in transit using Swarm's mutual TLS between nodes. They're encrypted at rest in the Raft log using the cluster's unlock key. They're only ever ​decrypted and made available to containers that explicitly request them. They're never written to disk in plain text—they exist only in memory inside containers. When a container stops, the secret is ​immediately removed from that node's memory.
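You can verify this from a node running one of the service's tasks; a sketch assuming the service is named api:

```shell
# Find a local container belonging to the service
CID=$(docker ps --filter 'name=api' -q | head -n 1)

# Secrets appear as plain files on an in-memory tmpfs inside the container
docker exec "$CID" ls -l /run/secrets/
docker exec "$CID" cat /run/secrets/db-pass
```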

  • Rotate secrets:
echo "newpassword" | docker secret create db-pass-v2 -
docker service update --secret-rm db-pass --secret-add db-pass-v2 myservice
docker secret rm db-pass

​Secret rotation is critical for security compliance. This process creates a new secret with the updated password, updates the service to use the new secret while removing the old one, then deletes the old ​secret. The service update triggers a rolling restart of containers, so they pick up the new secret. Applications that monitor the secret file for changes can reload without restart. This entire process can ​happen without downtime if your application handles secret rotation gracefully.

Configs

​Configs are similar to secrets but designed for non-sensitive configuration data like application config files, nginx configurations, or other settings that don't need encryption.

  • Create config:
docker config create nginx-config nginx.conf

​This creates a config from the nginx.conf file. Configs are stored in the Raft log like secrets, but they're not encrypted since they contain non-sensitive data. They're versioned and immutable—once created, ​a config never changes. To update configuration, you create a new config version and update services to use it.

  • Use in service:
configs:
  - source: nginx-config
    target: /etc/nginx/nginx.conf
    mode: 0440

​Configs are mounted as files inside containers at the path specified by target. The mode parameter sets file permissions (in this case, readable by owner and group, not writable). When the nginx container ​starts, it finds its configuration file already in place at /etc/nginx/nginx.conf, populated from the Swarm config. This eliminates the need to build custom images for each configuration variant.
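Because configs are immutable, updating one follows the same rotate-and-swap pattern as secrets; a sketch with hypothetical names nginx-config-v2 and web:

```shell
docker config create nginx-config-v2 nginx.conf
docker service update \
  --config-rm nginx-config \
  --config-add source=nginx-config-v2,target=/etc/nginx/nginx.conf,mode=0440 \
  web
docker config rm nginx-config
```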

17. Zero-Downtime Rolling Updates

  • Enable automatic rollback:
deploy:
  update_config:
    parallelism: 2
    delay: 10s
    failure_action: rollback
    monitor: 60s
    max_failure_ratio: 0.3
    order: stop-first

​This comprehensive update configuration ensures zero downtime deployments with automatic failure recovery. The parallelism: 2 updates 2 containers simultaneously, balancing speed with safety. The ​delay: 10s provides observation time between batches to detect issues early.

​The failure_action: rollback is critical for automated recovery. If too many containers fail during the update, Swarm automatically reverses all changes, restoring the previous working version. The monitor: ​60s means Swarm observes each new container for 60 seconds after starting. If the container crashes or fails health checks during this window, it's counted as a failure.

​The max_failure_ratio: 0.3 allows up to 30% of updates to fail before triggering rollback. With 10 replicas, 3 failures are tolerated. The 4th failure triggers immediate rollback of all changes. This prevents ​bad deployments from affecting your entire service.

​The order: stop-first means old containers stop before new ones start, which uses less resources during updates. The alternative start-first starts new containers before stopping old ones, ensuring extra ​capacity during updates but requiring more cluster resources.

  • Health checks ensure zero downtime:
services:
  web:
    image: nginx
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

​Health checks are essential for zero-downtime updates because they prevent Swarm from routing traffic to containers that aren't ready or are malfunctioning. The test command runs inside the container to ​verify it's healthy. This example uses curl to check an HTTP health endpoint.

The interval: 30s means the health check runs every 30 seconds. The timeout: 10s limits how long each check may run before it counts as failed. The retries: 3 means 3 consecutive failures are required before the container is marked unhealthy. The start_period: 40s gives the container 40 seconds to start up before failed checks count toward the retry limit, accommodating slow application startup.

​During rolling updates, Swarm waits for each new container to pass health checks before considering it ready. Only after a new container is healthy does Swarm stop an old container and move to the next ​batch. This ensures your service always has healthy containers serving traffic.
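You can follow an update's progress while it runs; a sketch assuming the service is named web:

```shell
# Watch tasks being replaced batch by batch
docker service ps web

# Inspect the outcome of the most recent update
# (the field exists once an update has been started)
docker service inspect --format '{{ .UpdateStatus.State }}' web
```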

  • Manual rollback:
docker service rollback web

​If you discover issues after an update completes, manual rollback provides immediate recovery. Swarm stores the previous service specification, including the image tag, environment variables, secrets, ​configs, and all other settings. Rollback restores everything to the pre-update state using the same rolling update mechanism, ensuring continuous availability during rollback.

18. Scaling Strategies (Metric-Based Autoscaling)

​Docker Swarm doesn't include built-in autoscaling, but you can implement it using external tools that monitor metrics and adjust replica counts.

  • Example with Prometheus:
# prometheus.yml
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

rule_files:
  - 'alert.rules.yml'

# alert.rules.yml (alert rules live in a separate file referenced above)
groups:
  - name: container_alerts
    rules:
      - alert: HighCPU
        expr: rate(container_cpu_usage_seconds_total[1m]) > 0.8
        for: 2m
​This Prometheus configuration scrapes metrics from cAdvisor, which exposes container CPU, memory, and network statistics. The alert rule triggers when CPU usage exceeds 80% for 2 minutes. Prometheus ​can send these alerts to Alertmanager, which can trigger webhooks that call scaling scripts. This provides metric-based autoscaling where your cluster automatically scales services based on actual resource ​utilization rather than time-based schedules.

  • Simple script-based scaling:
#!/bin/bash
SERVICE=api
CURRENT=$(docker service inspect --format='{{.Spec.Mode.Replicated.Replicas}}' "$SERVICE")

# docker stats prints one line per container; average CPU over this service's
# tasks only (their container names look like api.1.<task-id>)
CPU=$(docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' \
  | awk -v svc="$SERVICE" 'index($1, svc".") == 1 {gsub(/%/, "", $2); sum += $2; n++}
                           END {if (n) print sum / n; else print 0}')

if (( $(echo "$CPU > 80" | bc -l) )); then
  docker service scale "$SERVICE=$((CURRENT + 2))"
fi

This bash script demonstrates basic autoscaling logic: it reads the current replica count for the api service, averages CPU usage across that service's containers on the local node, and scales up by 2 replicas when the average exceeds 80%. You would run it periodically via cron or a similar scheduler. Note that docker stats only sees containers on the node where it runs; for cluster-wide decisions, aggregate metrics from a monitoring system instead. More sophisticated implementations add scale-down logic, rate limiting to prevent scaling storms, and integration with Prometheus and the Docker API so scaling decisions rest on comprehensive metrics.
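To run such a script on a schedule, a minimal cron entry suffices (the path /usr/local/bin/autoscale.sh is an assumption; run it on a manager node, since docker service commands require one):

```shell
# /etc/cron.d/swarm-autoscale: evaluate scaling every 2 minutes
*/2 * * * * root /usr/local/bin/autoscale.sh >> /var/log/autoscale.log 2>&1
```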

19. High Availability (Manager Quorum, Anti-Affinity)

  • Manager quorum rule:

High availability in Swarm depends on maintaining manager quorum. With 1 manager, any failure halts orchestration: existing containers keep running, but nothing can be scheduled, scaled, or recovered until the manager returns. With 3 managers, 1 can fail and the remaining 2 maintain quorum, keeping the cluster operational. With 5 managers, 2 can fail simultaneously and the remaining 3 maintain quorum. With 7 managers, 3 can fail and the cluster survives.

The formula for quorum is floor(N/2) + 1, where N is the total number of managers. With 5 managers, floor(5/2) + 1 = 2 + 1 = 3 managers are needed for quorum. If you lose 3 of those managers, only 2 remain, fewer than the required 3, so the cluster loses quorum: running tasks continue serving traffic, but no scheduling, scaling, or other state changes are possible until quorum is restored.
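The arithmetic is easy to script when planning manager counts; a small helper:

```shell
# Managers required for quorum, and simultaneous failures tolerated, for N managers
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerance() { echo $(( ($1 - 1) / 2 )); }

quorum 5      # prints 3
tolerance 5   # prints 2
```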

​Odd numbers are recommended because even numbers don't improve fault tolerance. 4 managers tolerate 1 failure (same as 3 managers) but have more overhead. 6 managers tolerate 2 failures (same as ​5) but add complexity. The extra manager in even-numbered configurations provides no additional fault tolerance but increases the network and CPU overhead of Raft consensus.

​Don't exceed 7 managers because Raft's performance degrades with more participants. Every scheduling decision, every cluster state change must be replicated to all managers. With 9 or 11 managers, the ​time to reach consensus increases significantly, and network bandwidth consumption grows. For very large clusters, 7 managers provides the best balance of fault tolerance (3 failures) and performance.

  • Best practices:

Distribute managers across availability zones or datacenters to survive zone-level failures. If all 3 managers run in one datacenter and that datacenter loses power or network connectivity, cluster orchestration stops even though workers in other datacenters keep running their current tasks. Spreading managers across 3 different zones means any single zone can fail without losing quorum.

As noted above, keep the manager count at or below 7: each additional manager lengthens the time required for consensus, uses more network bandwidth, and adds CPU overhead for maintaining the Raft log, while any fault-tolerance gain is vastly outweighed by the performance cost.

  • Anti-affinity example:
deploy:
  replicas: 3
  placement:
    constraints:
      - "node.role == worker"
    preferences:
      - spread: node.id
    max_replicas_per_node: 1

​This configuration implements anti-affinity to spread replicas across different nodes. The spread: node.id preference distributes replicas across unique nodes as much as possible. The ​max_replicas_per_node: 1 hard limit prevents multiple replicas from ever running on the same node, ensuring maximum fault tolerance. If you have 3 replicas and 3 workers, each worker gets exactly 1 ​replica. If a worker fails, only 1 replica is lost, and the other 2 continue serving traffic. This is critical for highly available services where you want to survive individual node failures.
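The same anti-affinity can be expressed at service creation time; a sketch with a hypothetical api service (--replicas-max-per-node requires a reasonably recent Docker Engine):

```shell
docker service create \
  --name api \
  --replicas 3 \
  --constraint 'node.role == worker' \
  --placement-pref 'spread=node.id' \
  --replicas-max-per-node 1 \
  nginx
```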

  • Spread across zones:
deploy:
  placement:
    preferences:
      - spread: node.labels.zone

​Geographic distribution protects against zone-level failures. If you label nodes with their availability zone (zone=us-east-1a, zone=us-east-1b, zone=us-east-1c) and use this spread preference, Swarm ​distributes replicas evenly across zones. With 9 replicas across 3 zones, you get 3 replicas per zone. If an entire zone loses power or network connectivity, 6 replicas in the other 2 zones continue serving ​traffic with no interruption.
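This assumes nodes carry zone labels, which you apply once per node (node names and zone values here are illustrative):

```shell
docker node update --label-add zone=us-east-1a node1
docker node update --label-add zone=us-east-1b node2
docker node update --label-add zone=us-east-1c node3
```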

20. Persistent Storage in Swarm

​Swarm's built-in volumes are local to each node, which creates challenges for stateful applications. If a database container runs on node1 with a local volume, and node1 fails, Swarm reschedules the ​container to node2, but the data remains on node1's disk, inaccessible to the rescheduled container. This is why Swarm requires external storage solutions for production stateful applications.

  • Solutions:

Portworx is a container-native storage platform that provides highly available, persistent volumes across the cluster. It replicates data across multiple nodes, so if any node fails, data remains accessible ​from ​other nodes. Containers can be rescheduled to any node and reconnect to their persistent volumes automatically.

GlusterFS is a distributed file system that aggregates storage from multiple nodes into a single global namespace. Volumes on GlusterFS are accessible from any node in the cluster, allowing containers to ​be rescheduled freely while maintaining access to their data.

​NFS (Network File System) is a simple and widely supported network storage protocol. You set up an NFS server with persistent storage, then mount NFS shares as volumes in your containers. All nodes ​can access the same NFS share, so containers can move between nodes without losing data access.

Longhorn is a cloud-native distributed block storage system designed primarily for Kubernetes. It provides features like snapshots, backups, and disaster recovery while integrating cleanly with container orchestrators.

​REX-Ray is a storage orchestration engine that provides a common interface to various storage platforms including AWS EBS, Azure Disk, and on-premises storage arrays.

  • Example with NFS:
volumes:
  data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=192.168.1.100,rw
      device: ":/path/to/share"

​This volume definition mounts an NFS share from server 192.168.1.100. The o: addr=192.168.1.100,rw specifies the NFS server address and mount options (read-write). The device specifies the export path ​on the NFS server. Any container using this volume will have the same data regardless of which node it runs on, because all nodes mount the same NFS share. This enables database containers to be ​rescheduled to different nodes while retaining access to their data.

  • Example with Portworx:
volumes:
  db-data:
    driver: pxd
    driver_opts:
      repl: "3"
      size: "10G"

​This Portworx volume is replicated 3 times across different nodes for high availability. If any node fails, the data remains accessible from the other 2 replicas. The size parameter allocates 10GB of storage. ​Portworx handles all the complexity of data replication, ensuring writes are synchronized across replicas and containers always get consistent data regardless of which node they run on.

  • Constraint for stateful services:
deploy:
  placement:
    constraints:
      - "node.hostname == db-server"

​As an alternative to distributed storage, you can pin stateful services to specific nodes. This constraint ensures the database always runs on db-server, which has local persistent storage. This is simpler than ​distributed storage but eliminates the ability to reschedule the database if db-server fails. It's acceptable for development or when you have manual failover procedures, but production systems should use ​proper distributed storage for true high availability.

21. Monitoring & Logging

Monitoring Stack

  • Prometheus + Grafana + cAdvisor:
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - 9090:9090
    deploy:
      placement:
        constraints:
          - node.role == manager

  grafana:
    image: grafana/grafana
    ports:
      - 3000:3000
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin  # change this before exposing Grafana

  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global

​This monitoring stack provides comprehensive visibility into your Swarm cluster. Prometheus collects and stores time-series metrics from all services. It runs on a manager node to ensure it remains ​available, as managers are typically more stable. The prometheus-data volume persists metrics even if the container restarts.

Grafana provides visualization dashboards for Prometheus metrics. You access it on port 3000 with the admin password specified. Grafana can display CPU usage graphs, memory consumption trends, ​network traffic, and container counts, helping you understand cluster behavior and identify performance issues.

cAdvisor runs as a global service, meaning one instance on every node. It collects resource usage statistics for all containers on that node and exposes them for Prometheus to scrape. The volume mounts ​give cAdvisor read-only access to system information needed to collect metrics. With cAdvisor on every node, Prometheus gets complete visibility into every container in your cluster.

​This stack together provides the observability needed for production operations. You can set up alerts in Prometheus to notify you when CPU usage is high, memory is low, or containers are crashing ​frequently. Grafana dashboards give you real-time and historical views of cluster health.
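Because cAdvisor runs as a global service, Prometheus can discover every instance through Swarm's built-in DNS rather than a static target list; a sketch for prometheus.yml (assumes Prometheus and cAdvisor share an overlay network):

```yaml
scrape_configs:
  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']  # Swarm DNS returns one A record per task
        type: A
        port: 8080
```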

Logging Solutions

  • GELF logging driver:
services:
  web:
    image: nginx
    logging:
      driver: gelf
      options:
        gelf-address: "udp://10.0.0.10:12201"
        tag: "web"

​The GELF (Graylog Extended Log Format) logging driver sends container logs to a central Graylog server at 10.0.0.10:12201. Instead of logs remaining scattered across individual nodes, they're centralized ​for easy searching and analysis. The tag "web" helps identify which service generated each log entry. Centralized logging is essential for production environments where you need to troubleshoot issues ​across multiple containers and nodes without SSHing into individual servers.

  • JSON file with log rotation:
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"

​The default json-file logging driver stores logs locally on each node, but without rotation, logs grow indefinitely and can fill disks. This configuration limits each log file to 10MB and keeps only 3 files, so ​maximum disk usage per container is 30MB. When the current file reaches 10MB, it's rotated to a backup file, and the oldest backup is deleted. This prevents log disk space exhaustion while retaining recent ​logs for troubleshooting.
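Rather than repeating these options on every service, you can set a host-wide default in /etc/docker/daemon.json (it applies only to containers created after the daemon restarts):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```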

  • ELK Stack integration:
services:
  app:
    image: myapp
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: app.{{.Name}}

​Fluentd is a popular log collector that can forward logs to Elasticsearch for indexing and Kibana for visualization (the ELK stack). The tag includes the container name, making it easy to filter logs by specific ​containers. Fluentd provides powerful log processing capabilities like parsing, filtering, and enriching logs before sending them to Elasticsearch. This is valuable for complex production environments where ​you need structured logging with rich querying capabilities.

22. Security Hardening

Mutual TLS (mTLS)

​Swarm automatically implements mutual TLS for all cluster communication without any configuration required. When a node joins the cluster, it receives a certificate signed by the cluster's certificate ​authority. All subsequent communication between nodes is encrypted using these certificates, and both parties authenticate each other on every connection.

​Certificates automatically rotate before expiration, eliminating manual certificate management. The default expiration is 90 days, but you can adjust this based on your security requirements.

  • Certificate rotation:
docker swarm update --cert-expiry 48h

​This changes the certificate expiration period to 48 hours. Shorter expiration reduces the window of opportunity if a certificate is compromised, but increases the frequency of rotation. For highly secure ​environments, daily or weekly rotation is common. Swarm handles rotation automatically, obtaining new certificates before old ones expire, so services experience no interruption.

  • View certificates:
docker info | grep -A 3 'CA Configuration'

In Swarm mode, docker info includes a Swarm section showing whether Swarm is active, the cluster ID, the number of managers and nodes, Raft settings, and a CA Configuration block with the current certificate expiry duration. Grepping around "CA Configuration" surfaces the rotation settings in effect.
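To inspect the node's actual certificate, read it from Docker's state directory (the path assumes the default data root):

```shell
# Show when the current node certificate becomes invalid
openssl x509 -noout -dates \
  -in /var/lib/docker/swarm/certificates/swarm-node.crt
```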

Security Options

  • Enable seccomp profile:
services:
  web:
    image: nginx
    security_opt:
      - seccomp=/path/to/profile.json

Seccomp (Secure Computing Mode) restricts which system calls containers can make. A seccomp profile lists allowed system calls; any attempt to make an unlisted call is blocked. This limits the attack surface if a container is compromised: for example, you can block network-related system calls for services that shouldn't need network access, or block filesystem-modifying calls for read-only services. Note that security_opt is a service-level key rather than part of the deploy section, and support for it under docker stack deploy varies by Docker version, so verify it is honored in your environment.

  • AppArmor profile:
services:
  web:
    image: nginx
    security_opt:
      - apparmor=docker-default

​AppArmor is a Linux Security Module that confines programs to limited resources. The docker-default profile restricts containers from accessing sensitive system areas. You can create custom AppArmor ​profiles that further restrict what containers can do, such as preventing access to specific directories or limiting network capabilities.

  • Read-only root filesystem:
services:
  web:
    image: nginx
    read_only: true
    tmpfs:
      - /tmp
      - /var/run

​Running containers with read-only root filesystems prevents attackers from modifying binaries or installing malicious software if they compromise a container. The application can still write to designated ​tmpfs mounts (in-memory filesystems) for temporary files. This dramatically limits what an attacker can do even if they gain access to a container. Nginx needs to write temporary files and PID files, so we ​provide writable tmpfs mounts for /tmp and /var/run while keeping everything else read-only.

  • Drop capabilities:
services:
  web:
    image: nginx
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE

​Linux capabilities divide root privileges into distinct units. By default, containers have many capabilities they don't need. This configuration drops all capabilities, then adds back only NET_BIND_SERVICE, ​which allows binding to privileged ports (ports below 1024). Nginx needs this to listen on port 80, but doesn't need capabilities like SYS_ADMIN, SYS_MODULE, or CAP_NET_RAW. Dropping unnecessary ​capabilities limits what attackers can do if they compromise the container.

Network Security

  • Encrypt overlay networks:
docker network create --opt encrypted --driver overlay secure-net

​While Swarm's control plane uses mTLS automatically, overlay network data plane traffic is unencrypted by default for performance. Adding encryption protects application data in transit between ​containers. This is essential when containers communicate across untrusted networks or when compliance requires all data to be encrypted. The performance overhead is typically negligible for most ​applications.

  • Isolate networks:
networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    internal: true

​Network isolation is fundamental to defense in depth. The frontend network allows external access for web servers. The backend network is marked internal, meaning containers on it cannot communicate ​with external networks or receive traffic from outside the cluster. Your database on the backend network can only be reached by application servers on the same network, never directly from the internet. ​This limits blast radius if the frontend is compromised.
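Putting this together, each service attaches only to the networks it needs; a sketch with hypothetical web and db services:

```yaml
services:
  web:
    image: nginx
    networks: [frontend, backend]  # reachable externally, can talk to db
  db:
    image: postgres
    networks: [backend]            # internal-only; unreachable from outside

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    internal: true
```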

22. Security Hardening

Mutual TLS (mTLS)

​Swarm automatically implements mutual TLS for all cluster communication without any configuration required. When a node joins the cluster, it receives a certificate signed by the cluster's certificate ​authority. All subsequent communication between nodes is encrypted using these certificates, and both parties authenticate each other on every connection.

​Certificates automatically rotate before expiration, eliminating manual certificate management. The default expiration is 90 days, but you can adjust this based on your security requirements.

  • Certificate rotation:
docker swarm update --cert-expiry 48h

​This changes the certificate expiration period to 48 hours. Shorter expiration reduces the window of opportunity if a certificate is compromised, but increases the frequency of rotation. For highly secure ​environments, daily or weekly rotation is common. Swarm handles rotation automatically, obtaining new certificates before old ones expire, so services experience no interruption.

  • View certificates:
docker system info | grep "Swarm"

​This displays Swarm-related information including certificate details, cluster ID, number of managers and nodes, and Raft status. You can verify that certificates are being rotated and check when the next ​rotation occurs.

Security Options

  • Enable seccomp profile:
deploy:
  security_opt:
    - seccomp=/path/to/profile.json

​Seccomp (Secure Computing Mode) restricts which system calls containers can make. A seccomp profile lists allowed system calls; any attempt to make unlisted calls is blocked. This limits the attack surface ​if a container is compromised. For example, you can prevent containers from making network-related system calls if they shouldn't need network access, or block file system modifications for read-only ​services.

  • AppArmor profile:
deploy:
  security_opt:
    - apparmor=docker-default

​AppArmor is a Linux Security Module that confines programs to limited resources. The docker-default profile restricts containers from accessing sensitive system areas. You can create custom AppArmor ​profiles that further restrict what containers can do, such as preventing access to specific directories or limiting network capabilities.

  • Read-only root filesystem:
services:
  web:
    image: nginx
    read_only: true
    tmpfs:
      - /tmp
      - /var/run

​Running containers with read-only root filesystems prevents attackers from modifying binaries or installing malicious software if they compromise a container. The application can still write to designated ​tmpfs mounts (in-memory filesystems) for temporary files. This dramatically limits what an attacker can do even if they gain access to a container. Nginx needs to write temporary files and PID files, so we ​provide writable tmpfs mounts for /tmp and /var/run while keeping everything else read-only.

  • Drop capabilities:
services:
  web:
    image: nginx
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE

​Linux capabilities divide root privileges into distinct units. By default, containers have many capabilities they don't need. This configuration drops all capabilities, then adds back only NET_BIND_SERVICE, ​which allows binding to privileged ports (ports below 1024). Nginx needs this to listen on port 80, but doesn't need capabilities like SYS_ADMIN, SYS_MODULE, or CAP_NET_RAW. Dropping unnecessary ​capabilities limits what attackers can do if they compromise the container.

Network Security

  • Encrypt overlay networks:
docker network create --opt encrypted --driver overlay secure-net

While Swarm's control plane uses mTLS automatically, overlay network data plane traffic is unencrypted by default for performance. Adding encryption (IPsec tunnels between nodes) protects application data in transit between containers. This is essential when containers communicate across untrusted networks or when compliance requires all data to be encrypted. The overhead is modest for most applications, but throughput-sensitive workloads should be benchmarked with encryption enabled.
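
The same option can be set declaratively in a stack file; `driver_opts` with an empty `encrypted` key is the compose-file equivalent of `--opt encrypted`:

```yaml
networks:
  secure-net:
    driver: overlay
    driver_opts:
      encrypted: ""
```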

  • Isolate networks:
networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    internal: true

​Network isolation is fundamental to defense in depth. The frontend network allows external access for web servers. The backend network is marked internal, meaning containers on it cannot communicate ​with external networks or receive traffic from outside the cluster. Your database on the backend network can only be reached by application servers on the same network, never directly from the internet. ​This limits blast radius if the frontend is compromised.
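
A stack file tying the pattern together might look like this sketch (service names and images are illustrative). Only the application service bridges both networks; the database is reachable solely over the internal backend:

```yaml
services:
  web:
    image: nginx
    networks:
      - frontend
      - backend   # bridges public traffic to internal services

  db:
    image: postgres:16
    networks:
      - backend   # never exposed outside the cluster

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    internal: true   # no traffic to or from outside the cluster
```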

23. Node Lifecycle Management

  • Drain a node:
docker node update --availability drain worker2

​Draining prevents new tasks from being scheduled on the node and gracefully stops all existing tasks, rescheduling them to other nodes. This is essential before performing maintenance like kernel ​updates, hardware replacement, or Docker Engine upgrades. Tasks stop gracefully, respecting their stop grace periods, then Swarm schedules replacement tasks on healthy nodes. The node remains in the ​cluster but doesn't receive work.
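
The stop grace period mentioned above is configurable per service. A sketch (image name is a placeholder) giving a slow-shutdown service extra time before Swarm escalates from SIGTERM to SIGKILL:

```yaml
services:
  worker:
    image: myjob:latest        # placeholder image
    stop_grace_period: 1m30s   # default is 10s; SIGKILL only after this window
```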

  • Pause a node:
docker node update --availability pause worker2

​Pausing stops new task scheduling but leaves existing tasks running. This is useful when you need to investigate issues on a node without disrupting currently running workloads. New tasks go to other ​nodes, but whatever's already running continues normally. Pause is less disruptive than drain but still prevents the node from receiving additional work.

  • Activate a node:
docker node update --availability active worker2

Activating returns a drained or paused node to full service. The node begins accepting new task assignments immediately. Existing tasks that were rescheduled during the drain do not automatically move back; Swarm never rebalances on its own. If you want tasks to return to the activated node, rebalance manually by updating the affected services or forcing a redeployment (docker service update --force).

  • Promote/demote nodes:
docker node promote worker2
docker node demote manager3

​Promotion adds a worker to the manager Raft consensus group, increasing cluster fault tolerance. You might promote before performing maintenance on other managers to ensure quorum is maintained. ​Demotion removes a manager from Raft, converting it to a worker. You might demote before decommissioning a manager or when reducing cluster size.

  • Remove node:
docker node update --availability drain worker5
docker node ps worker5
ssh worker5 "docker swarm leave"
docker node rm worker5

This is the proper sequence for removing a node. First drain it to move all workloads to other nodes, then verify with docker node ps that its tasks have stopped and been rescheduled. Next, run docker swarm leave on the node itself so it reports as Down; a node must be down before docker node rm will succeed. Finally, remove it from cluster membership on a manager. If a node is unexpectedly lost and cannot leave cleanly, force removal with docker node rm --force.

  • Maintenance workflow:
# 1. Drain node
docker node update --availability drain worker1

# 2. Verify tasks migrated
docker node ps worker1

# 3. Perform maintenance
ssh worker1 "systemctl restart docker"

# 4. Activate node
docker node update --availability active worker1

​This workflow ensures zero-downtime maintenance. Draining moves workloads away before maintenance begins. Verifying ensures all tasks successfully rescheduled before proceeding. The actual ​maintenance (restarting Docker, applying patches, replacing hardware) happens while the node serves no traffic. Activation returns the node to service. Following this pattern, you can maintain every node ​in your cluster without service interruption.
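
The workflow above can be scripted for rolling maintenance across a cluster. This is a sketch assuming SSH access to each node and a healthy Swarm; the node names and the maintenance command are placeholders:

```shell
#!/bin/sh
set -e

for node in worker1 worker2 worker3; do
  # 1. Drain: move workloads off the node
  docker node update --availability drain "$node"

  # 2. Wait until no tasks are still running on the node
  while [ -n "$(docker node ps "$node" --filter desired-state=running -q)" ]; do
    sleep 5
  done

  # 3. Placeholder maintenance step
  ssh "$node" "sudo systemctl restart docker"

  # 4. Return the node to service before moving on
  docker node update --availability active "$node"
done
```

Processing one node at a time keeps aggregate capacity loss to a single node, so replicated services stay available throughout.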

24. Production-Grade Load Balancing Patterns

  • Internal Load Balancing (VIP Mode)

services:
  web:
    image: nginx
    deploy:
      replicas: 3
      endpoint_mode: vip

​VIP (Virtual IP) mode is Swarm's default and most common load balancing method. Swarm assigns a single virtual IP address to the service, and DNS resolves the service name to this VIP. When containers ​connect to the service name, they reach the VIP, which load balances requests across all healthy replicas using IPVS in the kernel. This provides high-performance load balancing without requiring external ​load balancers. The VIP doesn't change even as replicas are added, removed, or rescheduled, providing stable service discovery.

  • DNS Round Robin (DNSRR Mode)

services:
  web:
    image: nginx
    deploy:
      replicas: 3
      endpoint_mode: dnsrr

​DNSRR mode returns multiple IP addresses (one for each replica) when a service name is resolved via DNS. The client's DNS resolver typically picks one randomly or rotates through them. This mode is ​useful for clients that do their own load balancing or connection pooling. However, it's less robust than VIP mode because DNS results are cached, so if a replica fails, clients might still try to connect to its ​old IP until DNS caches expire. DNSRR also doesn't provide health-based load balancing—clients might connect to unhealthy replicas before discovering they're down.
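
You can observe the difference from a container attached to the same network (the overlay must be created with --attachable for standalone containers to join; names here are illustrative). The service name resolves to a single VIP in vip mode, while the special tasks.<service> name always returns every task IP:

```shell
# VIP mode: the service name resolves to one virtual IP
docker run --rm --network mynet alpine nslookup web

# All individual task IPs, regardless of endpoint mode
docker run --rm --network mynet alpine nslookup tasks.web
```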

External Load Balancers

  • Traefik (Cloud-Native):
services:
  traefik:
    image: traefik:v2.10
    command:
      - "--api.insecure=true"   # exposes the dashboard on :8080 without auth; for testing only
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager

  app:
    image: myapp
    deploy:
      replicas: 5
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.app.rule=Host(`app.example.com`)"
        - "traefik.http.services.app.loadbalancer.server.port=8080"

Traefik is a modern cloud-native edge router that integrates deeply with Swarm. It automatically discovers services by watching the Docker API and configures routes based on labels. The --providers.docker.swarmMode=true flag tells Traefik to read Swarm services rather than standalone containers.

In this configuration, Traefik runs on a manager node where it can access the Docker socket to monitor cluster state. The app service uses labels to configure routing. The traefik.enable=true label opts the service into Traefik's routing (since we disabled exposure by default). The router rule Host(`app.example.com`) means requests with that hostname are routed to this service. Traefik automatically load balances across all 5 replicas, performs health checks, and removes unhealthy replicas from its pool.

​Traefik provides features like automatic HTTPS with Let's Encrypt, request tracing, circuit breakers, rate limiting, and sophisticated routing rules. It's particularly valuable in microservices architectures ​where you have many services needing external access with different domains or paths.

  • HAProxy:
services:
  haproxy:
    image: haproxy:2.8
    ports:
      - "80:80"
      - "443:443"
    configs:
      - source: haproxy-config
        target: /usr/local/etc/haproxy/haproxy.cfg
    deploy:
      replicas: 2

​HAProxy is a battle-tested, high-performance TCP/HTTP load balancer. Running 2 replicas provides load balancer redundancy—if one fails, the other continues serving traffic. The configuration comes from ​a Swarm config, allowing you to update load balancer settings by creating new configs and updating the service.

​HAProxy configuration typically includes backend definitions for your services, health check endpoints, load balancing algorithms (round-robin, least connections, etc.), and SSL termination settings. ​HAProxy excels at very high traffic loads and provides detailed statistics about backend health and performance.
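
A minimal haproxy.cfg for this setup might use Swarm's embedded DNS (127.0.0.11) with server-template so backend tasks are discovered dynamically. This is a sketch; the service name and port are assumptions:

```
resolvers docker
    nameserver swarm 127.0.0.11:53
    resolve_retries 3
    timeout resolve 1s

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend http-in
    bind *:80
    default_backend web

backend web
    balance roundrobin
    # Discover up to 10 task IPs behind tasks.web via Swarm DNS
    server-template web- 10 tasks.web:8080 check resolvers docker init-addr none
```

Resolving tasks.web instead of the service VIP lets HAProxy see individual replicas, so its own health checks and balancing algorithm apply rather than Swarm's.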

  • Nginx:
services:
  nginx:
    image: nginx
    ports:
      - "80:80"
    configs:
      - source: nginx-conf
        target: /etc/nginx/nginx.conf
    deploy:
      replicas: 2

​Nginx as a load balancer provides reliable HTTP/HTTPS load balancing with simple configuration. Like HAProxy, running 2 replicas ensures availability if one fails. Nginx configuration specifies upstream ​servers (your backend service replicas), load balancing methods, health checks, caching rules, and SSL settings.

​Nginx is particularly strong at serving static content, caching responses, and handling SSL termination. Many deployments use Nginx for public-facing load balancing and serving static assets, while using ​Swarm's internal routing mesh for backend service-to-service communication.
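
A minimal nginx.conf for this role proxies to the backend service through Swarm's VIP. Re-resolving the name via the embedded DNS keeps the address fresh across redeployments; the upstream service name and port are assumptions:

```nginx
events {}

http {
    server {
        listen 80;

        # Swarm's embedded DNS; short TTL so redeployed services are picked up
        resolver 127.0.0.11 valid=10s;

        location / {
            # Using a variable forces nginx to honor the resolver TTL
            # instead of resolving once at startup
            set $backend "app:8080";
            proxy_pass http://$backend;
        }
    }
}
```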

25. Debugging Swarm (Task Churn, Networking Issues)

  • View Service Logs
docker service logs -f --tail 100 web

​This command aggregates logs from all replicas of the web service into a single stream. The -f flag follows logs in real-time, showing new entries as they're written. The --tail 100 limits output to the most ​recent 100 lines per replica, preventing overwhelming output for long-running services.

​Each log line includes the task name (e.g., web.1, web.2) identifying which replica generated it, along with timestamps. This aggregated view is invaluable for debugging distributed services where you need ​to see what's happening across all replicas simultaneously. You might see that requests are failing on web.2 but succeeding on web.1 and web.3, indicating an issue specific to that replica's node or ​configuration.

  • Inspect Tasks
docker service ps web --no-trunc
docker inspect <task_id>

​The docker service ps command shows all tasks (current and historical) for a service. Each task represents one replica at one point in time. The --no-trunc flag shows complete information without truncating ​long IDs or error messages.

​Task states reveal what's happening: Running means healthy and serving traffic, Failed indicates crashes or health check failures, Shutdown means gracefully stopped, Rejected means scheduling failed due ​to constraints or resource unavailability. Error messages explain failures: "no suitable node" means no nodes meet placement constraints, "task: non-zero exit (137)" indicates the container was killed ​(usually by the OOM killer), "failed to resolve network" suggests networking issues.

​Inspecting a specific task with docker inspect provides exhaustive details: which node it's assigned to, its container ID, network attachments, when it started, exit codes and errors, resource reservations, ​and full task history. This level of detail is essential for understanding why tasks are failing or behaving unexpectedly.

  • Debug Networking
docker network inspect mynet

​This displays comprehensive network information including the subnet and gateway addresses, which containers are attached with their IP addresses, network driver options like encryption status, VXLAN ​identifiers used for overlay encapsulation, and whether the network is internal or attachable.

​If containers can't communicate, check that they're on the same network. Verify both services list the network in their docker service inspect output. Check that subnet ranges don't conflict with other ​networks or your physical infrastructure.

docker run --rm --network mynet alpine ping -c 3 service-name

This creates a temporary container on the same network and attempts to ping a service (note that standalone containers can only join an overlay network created with --attachable). Success confirms DNS resolution and basic connectivity work. Failure indicates either DNS issues (the service name doesn't resolve), routing problems (packets can't reach the destination), or firewall rules blocking traffic.

sudo iptables -t nat -L -n -v

​Swarm uses iptables extensively for routing mesh and load balancing. This command shows NAT table rules that implement port publishing and load balancing. You might see rules forwarding published ​ports to the ingress network, or IPVS-related rules distributing traffic to replicas. Understanding these rules helps diagnose why published ports aren't accessible or why load balancing isn't working ​correctly.

Common Issues and Solutions

  • Task Churn (Containers constantly restarting):

​Task churn indicates containers are repeatedly starting and failing. Check resource limits with docker service inspect web. If limits are too low, containers might be OOM-killed, causing restarts. Review logs ​with docker service logs to see application errors explaining crashes. Verify health checks aren't too strict—if health checks fail before the application finishes starting, containers restart prematurely. Check ​placement constraints—if no nodes satisfy constraints, tasks remain pending forever or repeatedly fail to start.
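
Two settings address the most common churn causes. This sketch (image, health endpoint, and limit values are placeholders) gives the application time to start before health checks count, and enough memory headroom to avoid the OOM killer:

```yaml
services:
  web:
    image: myapp:latest   # placeholder
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 30s   # failures during startup don't trigger restarts
    deploy:
      resources:
        limits:
          memory: 512M    # raise this if tasks exit with code 137
```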

  • Network connectivity issues:
docker network ls | grep overlay

​Verify overlay networks exist. Missing networks suggest they were deleted or never created. Check if services are actually attached with docker service inspect web --format '{{.Endpoint.VirtualIPs}}'. Empty ​results mean the service isn't on any network. Test DNS resolution with docker run --rm --network mynet alpine nslookup service-name. If DNS fails, the Swarm embedded DNS server might have issues.

​Common network problems include firewall rules blocking required ports (2377 for management, 7946 for discovery, 4789 for VXLAN), MTU mismatches causing fragmentation and packet loss, network ​interface selection issues when nodes have multiple interfaces, and VXLAN encapsulation problems when underlying networks don't support it.
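
MTU mismatches in particular are worth ruling out: VXLAN encapsulation adds roughly 50 bytes of overhead, so if the underlay can't carry full-size frames, recreating the overlay with a smaller MTU is a common fix (the right value depends on your infrastructure):

```shell
docker network create \
  --driver overlay \
  --opt com.docker.network.driver.mtu=1400 \
  mynet-lowmtu
```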

  • Swarm not forming:
sudo ufw allow 2377/tcp
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp

​These firewall rules open the ports Swarm requires. Port 2377 is for cluster management (workers joining, task assignments). Port 7946 TCP and UDP is for container network discovery. Port 4789 UDP ​carries VXLAN overlay network traffic. Without these ports open, nodes can't communicate properly.

docker swarm init --advertise-addr <correct-ip>

​If docker swarm init fails or nodes can't join, verify you're advertising the correct IP address. Servers with multiple network interfaces might default to an IP that other nodes can't reach. Explicitly specifying ​the advertise address ensures Swarm uses the correct interface for cluster communication.

  • Manager quorum lost:
docker swarm init --force-new-cluster

This is a disaster recovery command that should only be used as a last resort when all managers except one have permanently failed. It forces the surviving manager to form a new cluster, discarding the previous Raft log and starting fresh. This recovers cluster operation but you lose all historical task state. Any workers that were connected to the old cluster must be removed and rejoined. Use this only when there's no possibility of recovering lost managers; premature use can cause split-brain scenarios where multiple clusters claim to be authoritative.

Conclusion

​Docker Swarm provides a powerful, secure, and straightforward orchestration solution for running distributed applications. Its native integration with Docker, built-in load balancing, encrypted overlay ​networks, high availability through Raft, rolling updates, and simple commands make it ideal for teams who want production-grade features without the complexity of Kubernetes.

Start simple by initializing a Swarm on one node with docker swarm init, join additional nodes with the provided tokens, and deploy services with docker service create. This minimal setup already provides orchestration, scheduling, and self-healing. Use stacks for multi-service applications, defining your entire application architecture in YAML files deployed with docker stack deploy. Stacks provide version control for infrastructure and enable consistent deployments across environments.

​Implement health checks for zero-downtime updates. Well-designed health checks allow Swarm to detect failures and prevent routing traffic to unhealthy containers. Combined with rolling update ​configurations, health checks enable deployments without service interruption. Secure with secrets for sensitive data, mTLS for node communication, and network encryption for application traffic. Swarm ​provides security by default but offers controls for organizations with strict compliance requirements.

​Monitor with Prometheus for metrics collection and alerting, combined with Grafana for visualization. Centralized logging with Fluentd, Graylog, or ELK stack provides the observability needed to ​troubleshoot issues across distributed applications. Scale based on metrics and demand rather than arbitrary schedules. While Swarm doesn't provide built-in autoscaling, integration with monitoring ​systems allows automated scaling based on actual resource utilization.

​Plan for high availability with odd-numbered manager quorums. Three managers survive one failure, five survive two failures. Distribute managers across availability zones to survive zone-level outages. Use ​external storage solutions like Portworx, GlusterFS, or NFS for stateful applications, ensuring data survives container and node failures.
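
The fault-tolerance rule is plain arithmetic: a Raft group of N managers needs a majority of floor(N/2)+1 votes, so it tolerates floor((N-1)/2) failures. That is why even-sized groups buy nothing over the next smaller odd size:

```shell
# Raft fault tolerance per manager count
for managers in 1 3 4 5 7; do
  tolerated=$(( (managers - 1) / 2 ))
  echo "$managers managers tolerate $tolerated failure(s)"
done
```

Note that 4 managers tolerate the same single failure as 3, while adding another node whose loss can threaten quorum.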

​By mastering service scheduling, secrets management, scaling strategies, monitoring, and advanced networking, you can confidently deploy scalable, resilient, zero-downtime applications using Docker ​Swarm. It remains one of the easiest and most efficient ways to orchestrate containerized workloads at scale, particularly valuable for teams that want production-grade orchestration without the ​operational complexity of more sophisticated platforms.