
# Docker Swarm Cluster Overview

> [!info] Cluster Status
> **State:** ✅ Healthy - All 5 nodes Active/Ready
> **Docker Version:** 29.1.3 (all nodes)
> **Host OS:** Ubuntu 24.04.3 LTS
> **Architecture:** x86_64

## Cluster Topology

```mermaid
graph LR
    MGR[Node 201<br/>MANAGER]
    W1[Node 202<br/>WORKER]
    W2[Node 203<br/>WORKER]
    W3[Node 204<br/>WORKER]
    W4[Node 205<br/>WORKER]

    MGR -.Swarm.-> W1
    MGR -.Swarm.-> W2
    MGR -.Swarm.-> W3
    MGR -.Swarm.-> W4
```

## Nodes Summary

| Node | IP | Role | Resources | Status | Primary Function |
| --- | --- | --- | --- | --- | --- |
| [[Node-201-Manager\|homenet-ubuntu1]] | 100.1.100.201 | Manager (Leader) | 8 CPU, 10GB RAM | ✅ Active | Critical Infrastructure |
| [[Node-202-Worker\|homenet-ubuntu2]] | 100.1.100.202 | Worker | 12 CPU, 16GB RAM | ✅ Active | Media & Photos |
| [[Node-203-Worker\|homenet-ubuntu3]] | 100.1.100.203 | Worker | 4 CPU, 3GB RAM | ✅ Active | Surveillance |
| [[Node-204-Worker\|homenet-ubuntu4]] | 100.1.100.204 | Worker | 4 CPU, 4GB RAM | ✅ Active | Dashboards & Automation |
| [[Node-205-Worker\|homenet-ubuntu5]] | 100.1.100.205 | Worker | 8 CPU, 8GB RAM | ✅ Active | General Workloads |

### Node IDs

```
# Manager
homenet-ubuntu1: y8yu1d46pv8gh8w4v7cyzi4cj

# Workers
homenet-ubuntu2: jjbyr8m4xffdgzypbsz8nzqua
homenet-ubuntu3: ytxjh4ba2wrxfp7vk3ohxicbk
homenet-ubuntu4: k05ar70cavs4wkc4846axyj2y
homenet-ubuntu5: gytpo6oaql553za0wfkxt2jsu
```

## Resource Distribution

### Total Cluster Resources

- **Total CPUs:** 36 cores
- **Total RAM:** 41GB
- **Network:** 10Gbps internal (Docker overlay)
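The totals can be re-derived from the per-node figures in the Nodes Summary table; a quick shell check (the triples below are copied from that table as `node cpu ram`):

```shell
# Sum per-node CPU and RAM figures from the Nodes Summary table
printf '%s\n' \
  "201 8 10" \
  "202 12 16" \
  "203 4 3" \
  "204 4 4" \
  "205 8 8" |
awk '{cpu += $2; ram += $3} END {printf "CPUs: %d cores, RAM: %dGB\n", cpu, ram}'
# → CPUs: 36 cores, RAM: 41GB
```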

### Resource Allocation by Node

```mermaid
pie title CPU Distribution
    "Node 201 (Manager)" : 8
    "Node 202 (Worker)" : 12
    "Node 203 (Worker)" : 4
    "Node 204 (Worker)" : 4
    "Node 205 (Worker)" : 8
```

```mermaid
pie title RAM Distribution
    "Node 201 (Manager)" : 10
    "Node 202 (Worker)" : 16
    "Node 203 (Worker)" : 3
    "Node 204 (Worker)" : 4
    "Node 205 (Worker)" : 8
```

## Service Placement Strategy

> [!note] Node Label System
> Services are pinned to specific nodes using Docker node labels. Labels are managed via the `sh-label-nodes.sh` script.

### Placement Rules

| Service Type | Preferred Node(s) | Reason |
| --- | --- | --- |
| Databases | Node 201 | Manager node, high availability |
| Media Services | Node 202 | NVIDIA GPU for transcoding |
| Photo Services | Node 202 | Large storage needs, GPU for ML |
| Cameras | Node 203 | Dedicated node for video processing |
| Dashboards | Node 204 | User-facing services |
| General Apps | Node 205 | Overflow and general workloads |

### Label Configuration

See individual node pages for complete label assignments:

- [[Node-201-Manager#Node Labels|Node 201 Labels]]
- [[Node-202-Worker#Node Labels|Node 202 Labels]]
- [[Node-203-Worker#Node Labels|Node 203 Labels]]
- [[Node-204-Worker#Node Labels|Node 204 Labels]]
- [[Node-205-Worker#Node Labels|Node 205 Labels]]
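As a sketch of the label workflow: a label is added on the manager, then referenced as a placement constraint. The label name (`gpu`), service name, and image below are illustrative examples, not necessarily values used by `sh-label-nodes.sh`:

```shell
# Add an example label to a node (run on the manager)
docker node update --label-add gpu=true homenet-ubuntu2

# Pin a hypothetical service to nodes carrying that label
docker service create \
  --name transcode-example \
  --constraint 'node.labels.gpu == true' \
  jellyfin/jellyfin
```

In a stack file the same constraint goes under `deploy.placement.constraints`.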

## Infrastructure Dependencies

### Hypervisor Layer

Two Proxmox hosts run the VMs for all Docker nodes (among other guests):

| Host | IP | Nodes Hosted | Management URL |
| --- | --- | --- | --- |
| proxmox-1 | 100.1.100.10 | Nodes 201, 202 | https://100.1.100.10:8006 |
| proxmox-2 | 100.1.100.15 | Nodes 203, 204, 205 | https://100.1.100.15:8006 |

### Network Services

**Pi DNS Server (100.1.100.11)**

- Primary DNS for the entire network
- AdGuard/Pi-hole DNS filtering
- Critical for domain resolution
- All nodes use it as their primary nameserver

### Storage Layer

**OpenMediaVault NFS Server (100.1.100.199)**

- Provides all persistent storage
- 6 NFS shares mounted on each node
- Single point of failure for storage
- See [[05-Storage/NFS-Architecture|NFS Architecture]]

## Overlay Networks

Docker Swarm uses encrypted overlay networks for inter-service communication:

| Network | Driver | Purpose | Services |
| --- | --- | --- | --- |
| homenet | overlay | Primary service network | Most services |
| traefik-public | overlay | Reverse proxy | Public-facing services |
| elastic | overlay | ELK stack | Elasticsearch cluster |
| logs-network | overlay | Log aggregation | Logstash, collectors |
| swarmpit_net | overlay | Cluster management | Swarmpit services |

See [[Network-Architecture|Network Architecture]] for details.
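For reference, an encrypted, attachable overlay network like those above can be created from the manager. The networks in this cluster already exist, so the name here is a placeholder:

```shell
# Create an encrypted overlay network for swarm services
docker network create \
  --driver overlay \
  --opt encrypted \
  --attachable \
  example-overlay

# Verify driver and scope
docker network inspect example-overlay --format '{{.Driver}} {{.Scope}}'
```

`--opt encrypted` enables IPSec encryption of the overlay's data-plane traffic; `--attachable` additionally lets standalone containers join the network.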

## Stack Distribution

**14 stacks** deployed across the cluster:

| Stack | Primary Node | Services | Status |
| --- | --- | --- | --- |
| [[02-Services/Stack-Homenet1\|homenet1]] | 201 | 7 | ⚠️ 3/7 running |
| [[02-Services/Stack-Homenet2\|homenet2]] | 204 | 6 | ✅ 6/6 running |
| [[02-Services/Stack-Homenet3\|homenet3]] | 203 | 1 | ✅ 1/1 running |
| [[02-Services/Stack-Homenet4\|homenet4]] | Mixed | 15 | ⚠️ 11/15 running |
| [[02-Services/Stack-Traefik\|traefik]] | 201 | 2 | ✅ 2/2 running |
| [[02-Services/Monitoring-Stack\|monitoring]] | All | 12 | ⚠️ 8/12 running |
| swarmpit | 201 | 4 | ⚠️ 3/4 running |
| immich | 202 | 3 | ✅ 3/3 running |
| librephotos | 202 | 5 | ⚠️ 4/5 running |
| photoprism | 202 | 2 | ✅ 2/2 running |
| paperless | 201 | 5 | ✅ 5/5 running |
| rxresume | 205 | 3 | ✅ 3/3 running |
| crm | 201 | 5 | ✅ 5/5 running |
| backup | Mixed | 1 | ❌ 0/1 running |

## High Availability Considerations

> [!warning] Single Manager Node
> The cluster has only one manager node. For production HA, add two more managers (three total) so the control plane keeps quorum when one manager fails.

### Current HA Status

- ✅ Service replicas can reschedule to healthy workers
- ✅ Global services (cAdvisor, node-exporter) run on all nodes
- ❌ Single manager = single point of control-plane failure
- ❌ No automatic manager failover
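If manager HA is pursued, existing workers can be promoted from the current manager; the choice of nodes below is illustrative only:

```shell
# Promote two workers to managers for a 3-node quorum
docker node promote homenet-ubuntu2 homenet-ubuntu5

# Confirm: MANAGER STATUS column shows Leader/Reachable
docker node ls
```

With three managers, the Raft quorum tolerates the loss of any one manager.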

### Failure Scenarios

**Manager Node (201) Failure:**

- ❌ Cannot deploy new services or stacks
- ❌ Cannot update existing services
- ✅ Existing services continue running
- ✅ Workers remain operational

**Worker Node Failure:**

- ✅ Services reschedule to healthy nodes
- ⚠️ May cause service disruption if the node had unique labels

## Useful Commands

### Cluster Status

```bash
# View all nodes
docker node ls

# Detailed node info
docker node inspect homenet-ubuntu1

# View node labels
docker node inspect homenet-ubuntu1 --format '{{.Spec.Labels}}'
```

### Service Distribution

```bash
# List all services and their placement
docker service ls

# See which node runs a service
docker service ps homenet1_elasticsearch

# View services on a specific node
docker node ps homenet-ubuntu2
```

### Maintenance Operations

```bash
# Drain a node for maintenance
docker node update --availability drain homenet-ubuntu2

# Return the node to active
docker node update --availability active homenet-ubuntu2

# Apply node labels
./sh-label-nodes.sh
```
## Related Pages

- [[Node-201-Manager|Node 201 Details]]
- [[Node-202-Worker|Node 202 Details]]
- [[Node-203-Worker|Node 203 Details]]
- [[Node-204-Worker|Node 204 Details]]
- [[Node-205-Worker|Node 205 Details]]
- [[Network-Architecture|Network Architecture]]
- [[02-Services/Service-Catalog|Service Catalog]]
- [[03-Operations/Stack-Deployment|Stack Deployment]]

**Last Updated:** 2026-01-11
**Health Status:** ✅ Healthy (all nodes active)
**Next Review:** Monitor storage capacity and consider manager HA