# Node 201 (Manager) - homenet-ubuntu1

> [!danger] Critical Infrastructure Node
> This is the Swarm Manager and hosts all critical data-layer services. Failure of this node affects cluster management and database availability.

## Node Information
| Property | Value |
|---|---|
| Hostname | homenet-ubuntu1 |
| IP Address | 100.1.100.201 |
| Role | Manager (Leader) |
| Node ID | y8yu1d46pv8gh8w4v7cyzi4cj |
| Status | ✅ Active / Ready |
| Availability | Active |
## Hardware Resources
| Resource | Allocation |
|---|---|
| CPUs | 8 cores |
| Memory | 10GB RAM |
| Hypervisor | Proxmox-1 (100.1.100.10) |
| OS | Ubuntu 24.04.3 LTS |
| Docker | 29.1.3 |
| Architecture | x86_64 |
## Node Labels

This node hosts critical infrastructure services via the following labels:

```yaml
homenet.elastic_stack: "true"   # Elasticsearch & Logstash
homenet.influx: "true"          # InfluxDB time-series DB
homenet.mariadb: "true"         # Primary SQL database
homenet.redis: "true"           # Cache layer
homenet.crm: "true"             # EspoCRM
swarmpit.db-data: "true"        # Swarmpit persistence
swarmpit.influx-data: "true"    # Swarmpit metrics
traefik-public.traefik-public-certificates: "true"  # SSL certificates
```
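Labels like these are applied with `docker node update --label-add`. A minimal sketch that assembles the full command for this node (the real workflow uses `./sh-label-nodes.sh`; this only shows the shape):

```shell
# Sketch: build the docker node update invocation that applies these labels.
labels="homenet.elastic_stack homenet.influx homenet.mariadb homenet.redis homenet.crm"
labels="$labels swarmpit.db-data swarmpit.influx-data traefik-public.traefik-public-certificates"

args=""
for l in $labels; do
  args="$args --label-add ${l}=true"
done

# Print rather than run, so this is safe to try outside the cluster
echo "docker node update$args homenet-ubuntu1"
```

Services then target the node via placement constraints such as `node.labels.homenet.mariadb == true` in their stack files.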
## Hosted Services

### Critical Data Layer (Stack: homenet1)

> [!warning] ELK Stack Offline
> Elasticsearch and Logstash are currently scaled to 0/0. See [[06-Troubleshooting/ELK-Stack-Offline|ELK Stack Investigation]].
| Service | Status | Port(s) | Purpose |
|---|---|---|---|
| Elasticsearch | ❌ 0/0 | 9200 | Log storage |
| Logstash | ❌ 0/0 | 6514, 10514 | Log processing |
| Logstash-cacher | ❌ 0/0 | 12201 (GELF) | Log aggregation |
| InfluxDB | ✅ 1/1 | 8086 | Time-series metrics |
| MariaDB | ✅ 1/1 | 3306 | SQL database |
| PostgreSQL 13 | ✅ 1/1 | - | PostgreSQL database |
| Redis | ✅ 1/1 | - | Cache/queue |
### Access & Management Services
| Service | Status | Port(s) | Purpose |
|---|---|---|---|
| Traefik | ✅ 1/1 | 80, 443, 8080 | Reverse proxy & SSL |
| Prometheus | ✅ 1/1 | 9090 | Metrics collection |
| Grafana | ✅ 1/1 | 3010 | Metrics visualization |
### CRM Stack
| Service | Status | Purpose |
|---|---|---|
| EspoCRM | ✅ 5/5 | Customer relationship management |
| CRM MariaDB | ✅ 1/1 | CRM database |
### Cluster Management (Swarmpit)
| Service | Status | Purpose |
|---|---|---|
| Swarmpit UI | ✅ 1/1 | Cluster management interface |
| CouchDB | ✅ 1/1 | Swarmpit config storage |
| Swarmpit InfluxDB | ✅ 1/1 | Swarmpit metrics |
## Service Placement Strategy

This node hosts services requiring:
- High availability (manager node, less likely to be drained)
- Persistent data (databases, metrics stores)
- Centralized logging (ELK stack - when restored)
- SSL certificates (Traefik Let's Encrypt storage)
## Storage Mounts

### NFS Mounts (from 100.1.100.199)

| Mount Point | NFS Share | Size | Usage | Purpose |
|---|---|---|---|---|
| /nfs_data | /HomeNetServices | 3.0T | 92% | Service data |
| /nfs_media | /SharedStuff | 3.0T | 92% | Media library |
| /nfs_cams | /CameraFootage | 69G | 40% | Camera recordings |
| /nfs_personal | /PersonalData | 503G | 76% | Personal files |
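Because the data layer depends on these shares, a quick sanity check that each path is a real mount (not just an empty directory left behind after an NFS failure) is useful before starting databases. A sketch using `mountpoint` from util-linux:

```shell
# Verify each NFS path is an actual mount point, not an empty local directory.
check_mounts() {
  local failed=0
  for m in "$@"; do
    if mountpoint -q "$m"; then
      echo "mounted: $m"
    else
      echo "MISSING: $m"
      failed=1
    fi
  done
  return $failed
}

# On this node:
# check_mounts /nfs_data /nfs_media /nfs_cams /nfs_personal
```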
### Local Volumes
Critical data stored in /homenet_data/ (symlinked to /nfs_data):
- influx-2.1.1/ - InfluxDB data
- mariadb/ - MariaDB databases
- postgresql/ - PostgreSQL data
- elasticsearch/ - Elasticsearch indices (when active)
- swarmpit/ - Swarmpit persistence
## Network Configuration

### Overlay Networks

- `homenet` - Primary service network
- `traefik-public` - Reverse proxy network
- `elastic` - Elasticsearch cluster
- `logs-network` - Log aggregation
- `swarmpit_net` - Cluster management
### DNS Configuration

- **Primary DNS**: 100.1.100.11 (Pi DNS)
- **Current**: systemd-resolved stub resolver (127.0.0.53)

> [!note] DNS Configuration
> Currently using the systemd-resolved stub resolver. Consider pointing directly at the Pi DNS for network-wide filtering.
## Critical Responsibilities

### 1. Swarm Manager

- **Orchestration**: Schedules services across workers
- **State Management**: Maintains cluster state via Raft consensus
- **API Endpoint**: All `docker service` commands route here
### 2. Data Persistence
- All SQL databases (MariaDB, PostgreSQL)
- Time-series metrics (InfluxDB)
- Log storage (Elasticsearch - offline)
- Cluster config (Swarmpit CouchDB)
### 3. SSL/TLS Certificates
- Let's Encrypt certificates stored here
- Traefik certificate resolver persistence
- Renewal managed by Traefik
### 4. Metrics & Monitoring
- Prometheus scrapes all exporters
- Grafana dashboards
- InfluxDB for historical metrics
## Failure Impact Analysis

> [!danger] Manager Failure Scenarios
>
> If Node 201 goes down:
> - ❌ Cannot deploy new services (no manager available)
> - ❌ Cannot update existing services (no cluster orchestration)
> - ❌ Database services offline (MariaDB, PostgreSQL, InfluxDB)
> - ❌ No metrics collection (Prometheus down)
> - ❌ No monitoring dashboards (Grafana down)
> - ✅ Existing worker services continue (Plex, ARR stack, cameras)

**Recovery Priority:**
1. Restore Proxmox VM
2. Verify NFS mounts
3. Restart databases
4. Restore Swarm manager
5. Validate service health
## Maintenance Procedures

### Draining for Maintenance

> [!caution] Never Drain Manager
> Draining the only manager node will prevent all cluster operations. Plan maintenance windows carefully.

```shell
# DO NOT drain the manager unless you have 3+ managers
# docker node update --availability drain homenet-ubuntu1

# Instead, update services individually or schedule downtime
```
### Backup Critical Data

```shell
# Database backups (automated via cron)
./sh-backup-databases.sh

# Manual Prometheus snapshot
# (requires Prometheus to be started with --web.enable-admin-api)
curl -XPOST http://100.1.100.201:9090/api/v1/admin/tsdb/snapshot

# Grafana backup (dashboards via provisioning)
tar -czf grafana-backup.tar.gz /homenet_config/grafana/
```
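After a backup run, it is worth confirming the archive is actually readable. A hypothetical helper (not one of the cluster's existing scripts) that checks a tarball is non-empty and lists cleanly:

```shell
# Hypothetical check: a backup tarball should be non-empty and list without errors.
verify_backup() {
  local f="$1"
  if [ -s "$f" ] && tar -tzf "$f" > /dev/null 2>&1; then
    echo "OK: $f"
  else
    echo "FAIL: $f"
    return 1
  fi
}

# Example: verify_backup grafana-backup.tar.gz
```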
### Resource Monitoring

```shell
# Check service resource usage
docker stats

# View running services on this node
docker node ps homenet-ubuntu1

# Check disk usage (critical for databases)
df -h /nfs_data
```
## Known Issues

### 1. ELK Stack Offline

- **Status**: ❌ Critical
- **Impact**: No centralized logging
- **Services Affected**: Elasticsearch, Logstash, Logstash-cacher, Kibana
- **Investigation**: [[06-Troubleshooting/ELK-Stack-Offline|ELK Stack Investigation]]
### 2. Storage Near Capacity

- **Status**: ⚠️ Warning
- **Current**: 92% used (254GB remaining)
- **Risk**: Database write failures, service disruption
- **Action Required**: [[05-Storage/Storage-Critical-Warning|Storage Capacity Planning]]
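With the share at 92%, a simple threshold check can catch the problem before database writes start failing. A sketch (the 90% threshold is illustrative, not an existing alert):

```shell
# Warn when a filesystem's usage crosses a threshold percentage.
usage_pct() {
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

alert_if_full() {
  local mount="$1" threshold="$2" pct
  pct=$(usage_pct "$mount")
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN: $mount at ${pct}% (>= ${threshold}%)"
    return 1
  fi
  echo "ok: $mount at ${pct}%"
}

# On this node: alert_if_full /nfs_data 90
```

To find the biggest consumers on the share, `du -sh /nfs_data/* | sort -h` is a quick starting point.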
### 3. Swarmpit Agent Missing

- **Status**: ⚠️ Warning
- **Current**: 4/5 global instances
- **Impact**: One node not reporting to Swarmpit
- **Investigation**: Check which worker node is missing
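One way to find the missing node is to diff the full node list against the nodes actually running an agent task. The sketch below uses sample hostnames standing in for the live `docker node ls` / `docker service ps` output, and assumes the agent service is named `swarmpit_agent` (verify the real name with `docker service ls`):

```shell
# Sample data stands in for:
#   docker node ls --format '{{.Hostname}}'                               -> all_nodes
#   docker service ps swarmpit_agent --filter desired-state=running \
#     --format '{{.Node}}'                                                -> agent_nodes
all_nodes="homenet-ubuntu1
homenet-ubuntu2
homenet-ubuntu3
homenet-ubuntu4
homenet-ubuntu5"
agent_nodes="homenet-ubuntu1
homenet-ubuntu2
homenet-ubuntu4
homenet-ubuntu5"

# Nodes present in the first list but absent from the second
missing=$(comm -23 <(printf '%s\n' "$all_nodes" | sort) \
                   <(printf '%s\n' "$agent_nodes" | sort))
echo "missing agent on: $missing"
```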
## Performance Optimization

### Current Resource Usage
- CPU: Moderate (databases, Prometheus, Grafana)
- Memory: ~8GB used (InfluxDB, MariaDB, Prometheus)
- Disk I/O: High (database writes, Prometheus TSDB)
### Optimization Opportunities
- Restore ELK Stack - Or permanently remove to reclaim resources
- Prometheus Retention - Review retention period (default 15d)
- InfluxDB Compaction - Monitor BBolt file size
- Database Tuning - Optimize MariaDB buffer pool
## Access & Management

### SSH Access

### Service Logs

```shell
# View manager logs
journalctl -u docker.service -f

# Service-specific logs
docker service logs -f homenet1_influxdb
docker service logs -f homenet1_mariadb
docker service logs -f monitoring_prometheus
```
### Grafana Access
### Prometheus Access
## Related Documentation
- [[Cluster-Overview|Cluster Overview]]
- [[02-Services/Stack-Homenet1|Stack Homenet1 (Data Layer)]]
- [[02-Services/Stack-Traefik|Traefik Reverse Proxy]]
- [[04-Monitoring/Prometheus-Setup|Prometheus Configuration]]
- [[06-Troubleshooting/ELK-Stack-Offline|ELK Stack Troubleshooting]]
- [[05-Storage/NFS-Architecture|NFS Storage]]
## Useful Commands

```shell
# Apply node labels
./sh-label-nodes.sh

# Check node status
docker node inspect homenet-ubuntu1

# View services on this node
docker node ps homenet-ubuntu1

# Force service update
docker service update --force homenet1_influxdb

# Check NFS mounts
df -h | grep nfs
./sh-correct-mounts.sh
```
**Last Updated**: 2026-01-11
**Health Status**: ✅ Manager Active, ⚠️ ELK Stack Offline
**Next Review**: Investigate ELK stack failure, monitor storage capacity