
# Docker Swarm Cluster Overview

> [!info] Cluster Status
> **State:** ✅ Healthy - All 5 nodes Active/Ready
> **Docker Version:** 29.1.3 (all nodes)
> **Host OS:** Ubuntu 24.04.3 LTS
> **Architecture:** x86_64

## Cluster Topology

```mermaid
graph LR
    MGR[Node 201<br/>MANAGER]
    W1[Node 202<br/>WORKER]
    W2[Node 203<br/>WORKER]
    W3[Node 204<br/>WORKER]
    W4[Node 205<br/>WORKER]

    MGR -.Swarm.-> W1
    MGR -.Swarm.-> W2
    MGR -.Swarm.-> W3
    MGR -.Swarm.-> W4
```

## Nodes Summary

| Node | IP | Role | Resources | Status | Primary Function |
| --- | --- | --- | --- | --- | --- |
| [[Node-201-Manager\|homenet-ubuntu1]] | 100.1.100.201 | Manager (Leader) | 8 CPU, 10GB RAM | ✅ Active | Critical Infrastructure |
| [[Node-202-Worker\|homenet-ubuntu2]] | 100.1.100.202 | Worker | 12 CPU, 16GB RAM | ✅ Active | Media & Photos |
| [[Node-203-Worker\|homenet-ubuntu3]] | 100.1.100.203 | Worker | 4 CPU, 3GB RAM | ✅ Active | Surveillance |
| [[Node-204-Worker\|homenet-ubuntu4]] | 100.1.100.204 | Worker | 4 CPU, 4GB RAM | ✅ Active | Dashboards & Automation |
| [[Node-205-Worker\|homenet-ubuntu5]] | 100.1.100.205 | Worker | 8 CPU, 8GB RAM | ✅ Active | General Workloads |

### Node IDs

```
# Manager
homenet-ubuntu1: y8yu1d46pv8gh8w4v7cyzi4cj

# Workers
homenet-ubuntu2: jjbyr8m4xffdgzypbsz8nzqua
homenet-ubuntu3: ytxjh4ba2wrxfp7vk3ohxicbk
homenet-ubuntu4: k05ar70cavs4wkc4846axyj2y
homenet-ubuntu5: gytpo6oaql553za0wfkxt2jsu
```

## Resource Distribution

### Total Cluster Resources

- **Total CPUs:** 36 cores
- **Total RAM:** 41GB
- **Network:** 10Gbps internal (Docker overlay)
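The totals can be re-derived from the per-node figures in the Nodes Summary table; a quick shell check (the triples below are copied from that table as `node cpu ram`):

```shell
# Sum per-node CPU and RAM figures from the Nodes Summary table
printf '%s\n' \
  "201 8 10" \
  "202 12 16" \
  "203 4 3" \
  "204 4 4" \
  "205 8 8" |
awk '{cpu += $2; ram += $3} END {printf "CPUs: %d cores, RAM: %dGB\n", cpu, ram}'
# → CPUs: 36 cores, RAM: 41GB
```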

### Resource Allocation by Node

```mermaid
pie title CPU Distribution
    "Node 201 (Manager)" : 8
    "Node 202 (Worker)" : 12
    "Node 203 (Worker)" : 4
    "Node 204 (Worker)" : 4
    "Node 205 (Worker)" : 8
```

```mermaid
pie title RAM Distribution
    "Node 201 (Manager)" : 10
    "Node 202 (Worker)" : 16
    "Node 203 (Worker)" : 3
    "Node 204 (Worker)" : 4
    "Node 205 (Worker)" : 8
```

## Service Placement Strategy

> [!note] Node Label System
> Services are pinned to specific nodes using Docker node labels. Labels are managed via the `sh-label-nodes.sh` script.

### Placement Rules

| Service Type | Preferred Node(s) | Reason |
| --- | --- | --- |
| Databases | Node 201 | Manager node, high availability |
| Media Services | Node 202 | NVIDIA GPU for transcoding |
| Photo Services | Node 202 | Large storage needs, GPU for ML |
| Cameras | Node 203 | Dedicated node for video processing |
| Dashboards | Node 204 | User-facing services |
| General Apps | Node 205 | Overflow and general workloads |

### Label Configuration

See individual node pages for complete label assignments:

- [[Node-201-Manager#Node Labels|Node 201 Labels]]
- [[Node-202-Worker#Node Labels|Node 202 Labels]]
- [[Node-203-Worker#Node Labels|Node 203 Labels]]
- [[Node-204-Worker#Node Labels|Node 204 Labels]]
- [[Node-205-Worker#Node Labels|Node 205 Labels]]
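As a sketch of the label workflow: a label is added on the manager, then referenced as a placement constraint. The label name (`gpu`), service name, and image below are illustrative examples, not necessarily values used by `sh-label-nodes.sh`:

```shell
# Add an example label to a node (run on the manager)
docker node update --label-add gpu=true homenet-ubuntu2

# Pin a hypothetical service to nodes carrying that label
docker service create \
  --name transcode-example \
  --constraint 'node.labels.gpu == true' \
  jellyfin/jellyfin
```

In a stack file the same constraint goes under `deploy.placement.constraints`.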

## Infrastructure Dependencies

### Hypervisor Layer

Two Proxmox hosts run the VMs for all Docker nodes (among other guests):

| Host | IP | Nodes Hosted | Management URL |
| --- | --- | --- | --- |
| proxmox-1 | 100.1.100.10 | Nodes 201, 202 | https://100.1.100.10:8006 |
| proxmox-2 | 100.1.100.15 | Nodes 203, 204, 205 | https://100.1.100.15:8006 |

### Network Services

**Pi DNS Server (100.1.100.11)**

- Primary DNS for the entire network
- AdGuard/Pi-hole DNS filtering
- Critical for domain resolution
- All nodes use it as their primary nameserver

### Storage Layer

**OpenMediaVault NFS Server (100.1.100.199)**

- Provides all persistent storage
- 6 NFS shares mounted on each node
- Single point of failure for storage
- See [[05-Storage/NFS-Architecture|NFS Architecture]]

## Overlay Networks

Docker Swarm uses encrypted overlay networks for inter-service communication:

| Network | Driver | Purpose | Services |
| --- | --- | --- | --- |
| homenet | overlay | Primary service network | Most services |
| traefik-public | overlay | Reverse proxy | Public-facing services |
| elastic | overlay | ELK stack | Elasticsearch cluster |
| logs-network | overlay | Log aggregation | Logstash, collectors |
| swarmpit_net | overlay | Cluster management | Swarmpit services |

See [[Network-Architecture|Network Architecture]] for details.
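For reference, an encrypted, attachable overlay network like those above can be created from the manager. The networks in this cluster already exist, so the name here is a placeholder:

```shell
# Create an encrypted overlay network for swarm services
docker network create \
  --driver overlay \
  --opt encrypted \
  --attachable \
  example-overlay

# Verify driver and scope
docker network inspect example-overlay --format '{{.Driver}} {{.Scope}}'
```

`--opt encrypted` enables IPSec encryption of the overlay's data-plane traffic; `--attachable` additionally lets standalone containers join the network.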

## Stack Distribution

**14 stacks** deployed across the cluster:

| Stack | Primary Node | Services | Status |
| --- | --- | --- | --- |
| [[02-Services/Stack-Homenet1\|homenet1]] | 201 | 7 | ⚠️ 3/7 running |
| [[02-Services/Stack-Homenet2\|homenet2]] | 204 | 6 | ✅ 6/6 running |
| [[02-Services/Stack-Homenet3\|homenet3]] | 203 | 1 | ✅ 1/1 running |
| [[02-Services/Stack-Homenet4\|homenet4]] | Mixed | 15 | ⚠️ 11/15 running |
| [[02-Services/Stack-Traefik\|traefik]] | 201 | 2 | ✅ 2/2 running |
| [[02-Services/Monitoring-Stack\|monitoring]] | All | 12 | ⚠️ 8/12 running |
| swarmpit | 201 | 4 | ⚠️ 3/4 running |
| immich | 202 | 3 | ✅ 3/3 running |
| librephotos | 202 | 5 | ⚠️ 4/5 running |
| photoprism | 202 | 2 | ✅ 2/2 running |
| paperless | 201 | 5 | ✅ 5/5 running |
| rxresume | 205 | 3 | ✅ 3/3 running |
| crm | 201 | 5 | ✅ 5/5 running |
| backup | Mixed | 1 | ❌ 0/1 running |

## High Availability Considerations

> [!warning] Single Manager Node
> The cluster has only one manager node. For production HA, add two more managers (three total) so the control plane keeps quorum when one manager fails.

### Current HA Status

- ✅ Service replicas can reschedule to healthy workers
- ✅ Global services (cAdvisor, node-exporter) run on all nodes
- ❌ Single manager = single point of control-plane failure
- ❌ No automatic manager failover
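If manager HA is pursued, existing workers can be promoted from the current manager; the choice of nodes below is illustrative only:

```shell
# Promote two workers to managers for a 3-node quorum
docker node promote homenet-ubuntu2 homenet-ubuntu5

# Confirm: MANAGER STATUS column shows Leader/Reachable
docker node ls
```

With three managers, the Raft quorum tolerates the loss of any one manager.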

### Failure Scenarios

**Manager Node (201) Failure:**

- ❌ Cannot deploy new services or stacks
- ❌ Cannot update existing services
- ✅ Existing services continue running
- ✅ Workers remain operational

**Worker Node Failure:**

- ✅ Services reschedule to healthy nodes
- ⚠️ May cause service disruption if the node had unique labels

## Useful Commands

### Cluster Status

```bash
# View all nodes
docker node ls

# Detailed node info
docker node inspect homenet-ubuntu1

# View node labels
docker node inspect homenet-ubuntu1 --format '{{.Spec.Labels}}'
```

### Service Distribution

```bash
# List all services and their placement
docker service ls

# See which node runs a service
docker service ps homenet1_elasticsearch

# View services on a specific node
docker node ps homenet-ubuntu2
```

### Maintenance Operations

```bash
# Drain a node for maintenance
docker node update --availability drain homenet-ubuntu2

# Return the node to active
docker node update --availability active homenet-ubuntu2

# Apply node labels
./sh-label-nodes.sh
```
## Related Pages

- [[Node-201-Manager|Node 201 Details]]
- [[Node-202-Worker|Node 202 Details]]
- [[Node-203-Worker|Node 203 Details]]
- [[Node-204-Worker|Node 204 Details]]
- [[Node-205-Worker|Node 205 Details]]
- [[Network-Architecture|Network Architecture]]
- [[02-Services/Service-Catalog|Service Catalog]]
- [[03-Operations/Stack-Deployment|Stack Deployment]]

**Last Updated:** 2026-01-11
**Health Status:** ✅ Healthy (all nodes active)
**Next Review:** Monitor storage capacity and consider manager HA