# Node 201 (Manager) - homenet-ubuntu1

> [!danger] Critical Infrastructure Node
> This is the Swarm Manager and hosts all critical data-layer services. Failure of this node affects cluster management and database availability.

## Node Information
| Property | Value |
|---|---|
| Hostname | homenet-ubuntu1 |
| IP Address | 100.1.100.201 |
| Role | Manager (Leader) |
| Node ID | y8yu1d46pv8gh8w4v7cyzi4cj |
| Status | ✅ Active / Ready |
| Availability | Active |
## Hardware Resources
| Resource | Allocation |
|---|---|
| CPUs | 8 cores |
| Memory | 10GB RAM |
| Hypervisor | Proxmox-1 (100.1.100.10) |
| OS | Ubuntu 24.04.3 LTS |
| Docker | 29.1.3 |
| Architecture | x86_64 |
## Node Labels

This node hosts critical infrastructure services via the following labels:

```yaml
homenet.elastic_stack: "true"   # Elasticsearch & Logstash
homenet.influx: "true"          # InfluxDB time-series DB
homenet.mariadb: "true"         # Primary SQL database
homenet.redis: "true"           # Cache layer
homenet.crm: "true"             # EspoCRM
swarmpit.db-data: "true"        # Swarmpit persistence
swarmpit.influx-data: "true"    # Swarmpit metrics
traefik-public.traefik-public-certificates: "true"  # SSL certificates
```
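Labels like these are applied with `docker node update --label-add`. A minimal sketch that assembles the full command for this node (the real workflow uses `./sh-label-nodes.sh`; this only shows the shape):

```shell
# Sketch: build the docker node update invocation that applies these labels.
labels="homenet.elastic_stack homenet.influx homenet.mariadb homenet.redis homenet.crm"
labels="$labels swarmpit.db-data swarmpit.influx-data traefik-public.traefik-public-certificates"

args=""
for l in $labels; do
  args="$args --label-add ${l}=true"
done

# Print rather than run, so this is safe to try outside the cluster
echo "docker node update$args homenet-ubuntu1"
```

Services then target the node via placement constraints such as `node.labels.homenet.mariadb == true` in their stack files.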
## Hosted Services

### Critical Data Layer (Stack: homenet1)

> [!warning] ELK Stack Offline
> Elasticsearch and Logstash are currently scaled to 0/0. See [[06-Troubleshooting/ELK-Stack-Offline|ELK Stack Investigation]].
| Service | Status | Port(s) | Purpose |
|---|---|---|---|
| Elasticsearch | ❌ 0/0 | 9200 | Log storage |
| Logstash | ❌ 0/0 | 6514, 10514 | Log processing |
| Logstash-cacher | ❌ 0/0 | 12201 (GELF) | Log aggregation |
| InfluxDB | ✅ 1/1 | 8086 | Time-series metrics |
| MariaDB | ✅ 1/1 | 3306 | SQL database |
| PostgreSQL 13 | ✅ 1/1 | - | PostgreSQL database |
| Redis | ✅ 1/1 | - | Cache/queue |
### Access & Management Services
| Service | Status | Port(s) | Purpose |
|---|---|---|---|
| Traefik | ✅ 1/1 | 80, 443, 8080 | Reverse proxy & SSL |
| Prometheus | ✅ 1/1 | 9090 | Metrics collection |
| Grafana | ✅ 1/1 | 3010 | Metrics visualization |
### CRM Stack
| Service | Status | Purpose |
|---|---|---|
| EspoCRM | ✅ 5/5 | Customer relationship management |
| CRM MariaDB | ✅ 1/1 | CRM database |
### Cluster Management (Swarmpit)
| Service | Status | Purpose |
|---|---|---|
| Swarmpit UI | ✅ 1/1 | Cluster management interface |
| CouchDB | ✅ 1/1 | Swarmpit config storage |
| Swarmpit InfluxDB | ✅ 1/1 | Swarmpit metrics |
## Service Placement Strategy

This node hosts services requiring:
- High availability (manager node, less likely to be drained)
- Persistent data (databases, metrics stores)
- Centralized logging (ELK stack - when restored)
- SSL certificates (Traefik Let's Encrypt storage)
## Storage Mounts

### NFS Mounts (from 100.1.100.199)

| Mount Point | NFS Share | Size | Usage | Purpose |
|---|---|---|---|---|
| /nfs_data | /HomeNetServices | 3.0T | 92% | Service data |
| /nfs_media | /SharedStuff | 3.0T | 92% | Media library |
| /nfs_cams | /CameraFootage | 69G | 40% | Camera recordings |
| /nfs_personal | /PersonalData | 503G | 76% | Personal files |
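Because the data layer depends on these shares, a quick sanity check that each path is a real mount (not just an empty directory left behind after an NFS failure) is useful before starting databases. A sketch using `mountpoint` from util-linux:

```shell
# Verify each NFS path is an actual mount point, not an empty local directory.
check_mounts() {
  local failed=0
  for m in "$@"; do
    if mountpoint -q "$m"; then
      echo "mounted: $m"
    else
      echo "MISSING: $m"
      failed=1
    fi
  done
  return $failed
}

# On this node:
# check_mounts /nfs_data /nfs_media /nfs_cams /nfs_personal
```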
### Local Volumes
Critical data stored in /homenet_data/ (symlinked to /nfs_data):
- influx-2.1.1/ - InfluxDB data
- mariadb/ - MariaDB databases
- postgresql/ - PostgreSQL data
- elasticsearch/ - Elasticsearch indices (when active)
- swarmpit/ - Swarmpit persistence
## Network Configuration

### Overlay Networks

- `homenet` - Primary service network
- `traefik-public` - Reverse proxy network
- `elastic` - Elasticsearch cluster
- `logs-network` - Log aggregation
- `swarmpit_net` - Cluster management
### DNS Configuration

- **Primary DNS**: 100.1.100.11 (Pi DNS)
- **Current**: systemd-resolved stub resolver (127.0.0.53)

> [!note] DNS Configuration
> Currently using the systemd-resolved stub resolver. Consider pointing directly at the Pi DNS for network-wide filtering.
## Critical Responsibilities

### 1. Swarm Manager

- **Orchestration**: Schedules services across workers
- **State Management**: Maintains cluster state via Raft consensus
- **API Endpoint**: All `docker service` commands route here
### 2. Data Persistence
- All SQL databases (MariaDB, PostgreSQL)
- Time-series metrics (InfluxDB)
- Log storage (Elasticsearch - offline)
- Cluster config (Swarmpit CouchDB)
### 3. SSL/TLS Certificates
- Let's Encrypt certificates stored here
- Traefik certificate resolver persistence
- Renewal managed by Traefik
### 4. Metrics & Monitoring
- Prometheus scrapes all exporters
- Grafana dashboards
- InfluxDB for historical metrics
## Failure Impact Analysis

> [!danger] Manager Failure Scenarios
>
> If Node 201 goes down:
> - ❌ Cannot deploy new services (no manager available)
> - ❌ Cannot update existing services (no cluster orchestration)
> - ❌ Database services offline (MariaDB, PostgreSQL, InfluxDB)
> - ❌ No metrics collection (Prometheus down)
> - ❌ No monitoring dashboards (Grafana down)
> - ✅ Existing worker services continue (Plex, ARR stack, cameras)

**Recovery Priority:**
1. Restore Proxmox VM
2. Verify NFS mounts
3. Restart databases
4. Restore Swarm manager
5. Validate service health
## Maintenance Procedures

### Draining for Maintenance

> [!caution] Never Drain Manager
> Draining the only manager node will prevent all cluster operations. Plan maintenance windows carefully.

```shell
# DO NOT drain the manager unless you have 3+ managers
# docker node update --availability drain homenet-ubuntu1

# Instead, update services individually or schedule downtime
```
### Backup Critical Data

```shell
# Database backups (automated via cron)
./sh-backup-databases.sh

# Manual Prometheus snapshot
# (requires Prometheus to be started with --web.enable-admin-api)
curl -XPOST http://100.1.100.201:9090/api/v1/admin/tsdb/snapshot

# Grafana backup (dashboards via provisioning)
tar -czf grafana-backup.tar.gz /homenet_config/grafana/
```
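After a backup run, it is worth confirming the archive is actually readable. A hypothetical helper (not one of the cluster's existing scripts) that checks a tarball is non-empty and lists cleanly:

```shell
# Hypothetical check: a backup tarball should be non-empty and list without errors.
verify_backup() {
  local f="$1"
  if [ -s "$f" ] && tar -tzf "$f" > /dev/null 2>&1; then
    echo "OK: $f"
  else
    echo "FAIL: $f"
    return 1
  fi
}

# Example: verify_backup grafana-backup.tar.gz
```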
### Resource Monitoring

```shell
# Check service resource usage
docker stats

# View running services on this node
docker node ps homenet-ubuntu1

# Check disk usage (critical for databases)
df -h /nfs_data
```
## Known Issues

### 1. ELK Stack Offline

- **Status**: ❌ Critical
- **Impact**: No centralized logging
- **Services Affected**: Elasticsearch, Logstash, Logstash-cacher, Kibana
- **Investigation**: [[06-Troubleshooting/ELK-Stack-Offline|ELK Stack Investigation]]
### 2. Storage Near Capacity

- **Status**: ⚠️ Warning
- **Current**: 92% used (254GB remaining)
- **Risk**: Database write failures, service disruption
- **Action Required**: [[05-Storage/Storage-Critical-Warning|Storage Capacity Planning]]
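With the share at 92%, a simple threshold check can catch the problem before database writes start failing. A sketch (the 90% threshold is illustrative, not an existing alert):

```shell
# Warn when a filesystem's usage crosses a threshold percentage.
usage_pct() {
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

alert_if_full() {
  local mount="$1" threshold="$2" pct
  pct=$(usage_pct "$mount")
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN: $mount at ${pct}% (>= ${threshold}%)"
    return 1
  fi
  echo "ok: $mount at ${pct}%"
}

# On this node: alert_if_full /nfs_data 90
```

To find the biggest consumers on the share, `du -sh /nfs_data/* | sort -h` is a quick starting point.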
### 3. Swarmpit Agent Missing

- **Status**: ⚠️ Warning
- **Current**: 4/5 global instances
- **Impact**: One node not reporting to Swarmpit
- **Investigation**: Check which worker node is missing
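One way to find the missing node is to diff the full node list against the nodes actually running an agent task. The sketch below uses sample hostnames standing in for the live `docker node ls` / `docker service ps` output, and assumes the agent service is named `swarmpit_agent` (verify the real name with `docker service ls`):

```shell
# Sample data stands in for:
#   docker node ls --format '{{.Hostname}}'                               -> all_nodes
#   docker service ps swarmpit_agent --filter desired-state=running \
#     --format '{{.Node}}'                                                -> agent_nodes
all_nodes="homenet-ubuntu1
homenet-ubuntu2
homenet-ubuntu3
homenet-ubuntu4
homenet-ubuntu5"
agent_nodes="homenet-ubuntu1
homenet-ubuntu2
homenet-ubuntu4
homenet-ubuntu5"

# Nodes present in the first list but absent from the second
missing=$(comm -23 <(printf '%s\n' "$all_nodes" | sort) \
                   <(printf '%s\n' "$agent_nodes" | sort))
echo "missing agent on: $missing"
```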
## Performance Optimization

### Current Resource Usage
- CPU: Moderate (databases, Prometheus, Grafana)
- Memory: ~8GB used (InfluxDB, MariaDB, Prometheus)
- Disk I/O: High (database writes, Prometheus TSDB)
### Optimization Opportunities
- Restore ELK Stack - Or permanently remove to reclaim resources
- Prometheus Retention - Review retention period (default 15d)
- InfluxDB Compaction - Monitor BBolt file size
- Database Tuning - Optimize MariaDB buffer pool
## Access & Management

### SSH Access

### Service Logs

```shell
# View manager logs
journalctl -u docker.service -f

# Service-specific logs
docker service logs -f homenet1_influxdb
docker service logs -f homenet1_mariadb
docker service logs -f monitoring_prometheus
```
### Grafana Access
### Prometheus Access
## Related Documentation
- [[Cluster-Overview|Cluster Overview]]
- [[02-Services/Stack-Homenet1|Stack Homenet1 (Data Layer)]]
- [[02-Services/Stack-Traefik|Traefik Reverse Proxy]]
- [[04-Monitoring/Prometheus-Setup|Prometheus Configuration]]
- [[06-Troubleshooting/ELK-Stack-Offline|ELK Stack Troubleshooting]]
- [[05-Storage/NFS-Architecture|NFS Storage]]
## Useful Commands

```shell
# Apply node labels
./sh-label-nodes.sh

# Check node status
docker node inspect homenet-ubuntu1

# View services on this node
docker node ps homenet-ubuntu1

# Force service update
docker service update --force homenet1_influxdb

# Check NFS mounts
df -h | grep nfs
./sh-correct-mounts.sh
```
**Last Updated**: 2026-01-11
**Health Status**: ✅ Manager Active, ⚠️ ELK Stack Offline
**Next Review**: Investigate ELK stack failure, monitor storage capacity