
# Node 201 (Manager) - homenet-ubuntu1

> [!danger] Critical Infrastructure Node
> This is the Swarm manager and it hosts all critical data-layer services. Failure of this node affects cluster management and database availability.

## Node Information

| Property | Value |
| --- | --- |
| Hostname | homenet-ubuntu1 |
| IP Address | 100.1.100.201 |
| Role | Manager (Leader) |
| Node ID | y8yu1d46pv8gh8w4v7cyzi4cj |
| Status | ✅ Active / Ready |
| Availability | Active |

## Hardware Resources

| Resource | Allocation |
| --- | --- |
| CPUs | 8 cores |
| Memory | 10GB RAM |
| Hypervisor | Proxmox-1 (100.1.100.10) |
| OS | Ubuntu 24.04.3 LTS |
| Docker | 29.1.3 |
| Architecture | x86_64 |

## Node Labels

This node hosts critical infrastructure services via the following labels:

```yaml
homenet.elastic_stack: "true"         # Elasticsearch & Logstash
homenet.influx: "true"                # InfluxDB time-series DB
homenet.mariadb: "true"               # Primary SQL database
homenet.redis: "true"                 # Cache layer
homenet.crm: "true"                   # EspoCRM
swarmpit.db-data: "true"              # Swarmpit persistence
swarmpit.influx-data: "true"          # Swarmpit metrics
traefik-public.traefik-public-certificates: "true"  # SSL certificates
```
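These label keys are what placement constraints match against. A hedged sketch of how a service gets pinned to this node (the service name `example-db` and image tag are illustrative, not part of this stack):

```shell
# Illustrative only: a service constrained to nodes carrying the
# homenet.mariadb label can only be scheduled on Node 201.
docker service create \
  --name example-db \
  --constraint 'node.labels.homenet.mariadb == true' \
  mariadb:10.11
```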

## Hosted Services

### Critical Data Layer (Stack: homenet1)

> [!warning] ELK Stack Offline
> Elasticsearch and Logstash are currently scaled to 0/0. See [[06-Troubleshooting/ELK-Stack-Offline|ELK Stack Investigation]].

| Service | Status | Port(s) | Purpose |
| --- | --- | --- | --- |
| Elasticsearch | ❌ 0/0 | 9200 | Log storage |
| Logstash | ❌ 0/0 | 6514, 10514 | Log processing |
| Logstash-cacher | ❌ 0/0 | 12201 (GELF) | Log aggregation |
| InfluxDB | ✅ 1/1 | 8086 | Time-series metrics |
| MariaDB | ✅ 1/1 | 3006 | SQL database |
| PostgreSQL 13 | ✅ 1/1 | - | PostgreSQL database |
| Redis | ✅ 1/1 | - | Cache/queue |

### Access & Management Services

| Service | Status | Port(s) | Purpose |
| --- | --- | --- | --- |
| Traefik | ✅ 1/1 | 80, 443, 8080 | Reverse proxy & SSL |
| Prometheus | ✅ 1/1 | 9090 | Metrics collection |
| Grafana | ✅ 1/1 | 3010 | Metrics visualization |

### CRM Stack

| Service | Status | Purpose |
| --- | --- | --- |
| EspoCRM | ✅ 5/5 | Customer relationship management |
| CRM MariaDB | ✅ 1/1 | CRM database |

### Cluster Management (Swarmpit)

| Service | Status | Purpose |
| --- | --- | --- |
| Swarmpit UI | ✅ 1/1 | Cluster management interface |
| CouchDB | ✅ 1/1 | Swarmpit config storage |
| Swarmpit InfluxDB | ✅ 1/1 | Swarmpit metrics |

## Service Placement Strategy

This node hosts services requiring:

- High availability (manager node, less likely to be drained)
- Persistent data (databases, metrics stores)
- Centralized logging (ELK stack - when restored)
- SSL certificates (Traefik Let's Encrypt storage)

## Storage Mounts

### NFS Mounts (from 100.1.100.199)

| Mount Point | NFS Share | Size | Usage | Purpose |
| --- | --- | --- | --- | --- |
| /nfs_data | /HomeNetServices | 3.0T | 92% | Service data |
| /nfs_media | /SharedStuff | 3.0T | 92% | Media library |
| /nfs_cams | /CameraFootage | 69G | 40% | Camera recordings |
| /nfs_personal | /PersonalData | 503G | 76% | Personal files |

### Local Volumes

Critical data is stored in /homenet_data/ (symlinked to /nfs_data):

- influx-2.1.1/ - InfluxDB data
- mariadb/ - MariaDB databases
- postgresql/ - PostgreSQL data
- elasticsearch/ - Elasticsearch indices (when active)
- swarmpit/ - Swarmpit persistence

## Network Configuration

### Overlay Networks

- homenet - Primary service network
- traefik-public - Reverse proxy network
- elastic - Elasticsearch cluster
- logs-network - Log aggregation
- swarmpit_net - Cluster management
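To see which containers on this node are attached to one of these overlays (network names taken from the list above), something like the following should work:

```shell
# List containers attached to the homenet overlay, as seen from this node.
docker network inspect homenet \
  --format '{{range .Containers}}{{.Name}}{{"\n"}}{{end}}'
```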

### DNS Configuration

- Primary DNS: 100.1.100.11 (Pi DNS)
- Current: systemd-resolved stub resolver (127.0.0.53)

> [!note] DNS Configuration
> Currently using the systemd-resolved stub resolver. Consider pointing directly at the Pi DNS server for network-wide filtering.

## Critical Responsibilities

### 1. Swarm Manager

- Orchestration: Schedules services across workers
- State Management: Maintains cluster state via Raft consensus
- API Endpoint: All docker service commands route here
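Because this is the only manager, losing it loses the entire Raft quorum: an N-manager swarm needs a majority of managers reachable and tolerates floor((N-1)/2) failures. The arithmetic, as a small shell sketch:

```shell
# Raft fault tolerance: an N-manager swarm needs a majority (more than
# N/2) of managers reachable, so it tolerates floor((N-1)/2) failures.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

tolerated_failures 1   # a single manager tolerates 0 failures
tolerated_failures 3   # three managers tolerate 1 failure
```

With one manager, any failure of Node 201 halts all cluster orchestration, which is why the failure analysis below is so severe.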

### 2. Data Persistence

- All SQL databases (MariaDB, PostgreSQL)
- Time-series metrics (InfluxDB)
- Log storage (Elasticsearch - offline)
- Cluster config (Swarmpit CouchDB)

### 3. SSL/TLS Certificates

- Let's Encrypt certificates stored here
- Traefik certificate resolver persistence
- Renewal managed by Traefik
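Since the certificates live only on this node, they are worth backing up alongside the databases. A hedged sketch; the volume name `traefik-public-certificates` is inferred from the node label above and may differ in the actual stack:

```shell
# Hypothetical: archive the Traefik Let's Encrypt store from its named
# volume (volume name is an assumption based on the node label).
docker run --rm \
  -v traefik-public-certificates:/certs:ro \
  -v "$PWD":/backup \
  alpine tar -czf "/backup/traefik-certs-$(date +%F).tar.gz" -C /certs .
```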

### 4. Metrics & Monitoring

- Prometheus scrapes all exporters
- Grafana dashboards
- InfluxDB for historical metrics

## Failure Impact Analysis

> [!danger] Manager Failure Scenarios
> If Node 201 goes down:
>
> - ❌ Cannot deploy new services (no manager available)
> - ❌ Cannot update existing services (no cluster orchestration)
> - ❌ Database services offline (MariaDB, PostgreSQL, InfluxDB)
> - ❌ No metrics collection (Prometheus down)
> - ❌ No monitoring dashboards (Grafana down)
> - ✅ Existing worker services continue (Plex, ARR stack, cameras)

Recovery priority:

1. Restore the Proxmox VM
2. Verify NFS mounts
3. Restart databases
4. Restore the Swarm manager
5. Validate service health
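The middle steps of the recovery priority can be sanity-checked mechanically. A sketch, not a tested runbook script; it assumes the mounts and service names documented on this page:

```shell
#!/bin/sh
# Hedged post-restore check following the recovery priority above.
set -e

# Step 2: verify NFS is mounted before any database starts writing locally
mountpoint -q /nfs_data || { echo "FATAL: /nfs_data not mounted" >&2; exit 1; }

# Step 4: confirm this node is back as an active manager
docker info --format '{{.Swarm.ControlAvailable}}' | grep -qx true

# Step 5: eyeball replica counts against their desired state
docker service ls
```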

## Maintenance Procedures

### Draining for Maintenance

> [!caution] Never Drain Manager
> Draining the only manager node will prevent all cluster operations. Plan maintenance windows carefully.

```shell
# DO NOT drain manager unless you have 3+ managers
# docker node update --availability drain homenet-ubuntu1

# Instead, update services individually or schedule downtime
```
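The "3+ managers" rule can be made mechanical with a small guard. A sketch: the manager count would be fed in from `docker node ls --filter role=manager -q | wc -l` on a live cluster; `MANAGER_COUNT` here is a placeholder:

```shell
# Refuse to drain unless the swarm has at least three managers,
# so a majority survives the drained node going quiet.
safe_to_drain() {
  [ "$1" -ge 3 ]
}

if safe_to_drain "${MANAGER_COUNT:-1}"; then
  echo "OK to drain"
else
  echo "Refusing to drain: not enough managers" >&2
fi
```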

### Backup Critical Data

```shell
# Database backups (automated via cron)
./sh-backup-databases.sh

# Manual Prometheus snapshot (requires --web.enable-admin-api on the server)
curl -XPOST http://100.1.100.201:9090/api/v1/admin/tsdb/snapshot

# Grafana backup (dashboards via provisioning)
tar -czf grafana-backup.tar.gz /homenet_config/grafana/
```
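The fixed `grafana-backup.tar.gz` name above means each run overwrites the last. A small helper (illustrative, not part of the existing scripts) that stamps archives with the date instead:

```shell
# Produce a date-stamped archive name so repeated backup runs don't
# overwrite each other, e.g. grafana-backup-2026-01-11.tar.gz
backup_name() {
  printf '%s-backup-%s.tar.gz' "$1" "$(date +%F)"
}

backup_name grafana
```

Usage would then be `tar -czf "$(backup_name grafana)" /homenet_config/grafana/`.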

### Resource Monitoring

```shell
# Check service resource usage
docker stats

# View running services on this node
docker node ps homenet-ubuntu1

# Check disk usage (critical for databases)
df -h /nfs_data
```

## Known Issues

### 1. ELK Stack Offline

- Status: ❌ Critical
- Impact: No centralized logging
- Services Affected: Elasticsearch, Logstash, Logstash-cacher, Kibana
- Investigation: [[06-Troubleshooting/ELK-Stack-Offline|ELK Stack Investigation]]

### 2. Storage Near Capacity

- Status: ⚠️ Warning
- Current: 92% used (254GB remaining)
- Risk: Database write failures, service disruption
- Action Required: [[05-Storage/Storage-Critical-Warning|Storage Capacity Planning]]
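A capacity check along these lines could run from cron and flag the problem before writes start failing. A sketch, assuming POSIX `df` output; the 90% threshold and the `/` path in the example are illustrative (the real concern is /nfs_data at 92%):

```shell
# Report a filesystem's used percentage by parsing POSIX `df -P` output
# (column 5 is Capacity, e.g. "92%").
usage_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# True when usage meets or exceeds the alert threshold.
over_threshold() {
  [ "$1" -ge "$2" ]
}

pct=$(usage_pct /)
if over_threshold "$pct" 90; then
  echo "WARNING: ${pct}% used"
fi
```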

### 3. Swarmpit Agent Missing

- Status: ⚠️ Warning
- Current: 4/5 global instances
- Impact: One node not reporting to Swarmpit
- Investigation: Check which worker node is missing

## Performance Optimization

### Current Resource Usage

- CPU: Moderate (databases, Prometheus, Grafana)
- Memory: ~8GB used (InfluxDB, MariaDB, Prometheus)
- Disk I/O: High (database writes, Prometheus TSDB)

### Optimization Opportunities

1. Restore ELK Stack - or permanently remove it to reclaim resources
2. Prometheus Retention - review the retention period (default 15d)
3. InfluxDB Compaction - monitor BBolt file size
4. Database Tuning - optimize the MariaDB buffer pool
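For item 2, Prometheus retention is set via a startup flag rather than a config file. A hedged sketch of what the change might look like here; the service name comes from the logs section below, and the args string assumes the stock config path:

```shell
# Hypothetical: shorten Prometheus TSDB retention from the 15d default.
# Note --args replaces the whole argument string, so existing flags
# (like --config.file) must be repeated.
docker service update \
  --args '--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.retention.time=7d' \
  monitoring_prometheus
```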

## Access & Management

### SSH Access

```shell
ssh 100.1.100.201
# or
ssh homenet-ubuntu1
```

### Service Logs

```shell
# View manager logs
journalctl -u docker.service -f

# Service-specific logs
docker service logs -f homenet1_influxdb
docker service logs -f homenet1_mariadb
docker service logs -f monitoring_prometheus
```

### Grafana Access

- URL: http://100.1.100.201:3010
- Dashboard: Metrics visualization

### Prometheus Access

- URL: http://100.1.100.201:9090
- Targets: /targets
- Alerts: /alerts
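Quick liveness probes against the endpoints above (Prometheus exposes `/-/healthy` for exactly this purpose):

```shell
# Expect "Prometheus Server is Healthy." from a live instance.
curl -fsS http://100.1.100.201:9090/-/healthy

# Peek at the active scrape targets via the HTTP API.
curl -fsS 'http://100.1.100.201:9090/api/v1/targets?state=active' | head -c 200
```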
## Related Documentation

- [[Cluster-Overview|Cluster Overview]]
- [[02-Services/Stack-Homenet1|Stack Homenet1 (Data Layer)]]
- [[02-Services/Stack-Traefik|Traefik Reverse Proxy]]
- [[04-Monitoring/Prometheus-Setup|Prometheus Configuration]]
- [[06-Troubleshooting/ELK-Stack-Offline|ELK Stack Troubleshooting]]
- [[05-Storage/NFS-Architecture|NFS Storage]]

## Useful Commands

```shell
# Apply node labels
./sh-label-nodes.sh

# Check node status
docker node inspect homenet-ubuntu1

# View services on this node
docker node ps homenet-ubuntu1

# Force a service update (rolling restart)
docker service update --force homenet1_influxdb

# Check NFS mounts
df -h | grep nfs
./sh-correct-mounts.sh
```

Last Updated: 2026-01-11
Health Status: ✅ Manager Active, ⚠️ ELK Stack Offline
Next Review: Investigate ELK stack failure, monitor storage capacity