# Storage Critical Warning - 92% Capacity
> [!danger] Critical Storage Alert
> - **Status:** 🔴 CRITICAL
> - **Severity:** P0 - Immediate attention required
> - **Impact:** Risk of service failures, database corruption, data loss
> - **Available Space:** 254GB of 3TB (8% remaining)
## Current Status

- **Discovery Date:** 2026-01-11
- **Current Utilization:** 92% across primary NFS mounts
- **Time to Critical (95%):** ~2-4 weeks at the current growth rate
- **Time to Full (100%):** ~2-10 weeks depending on usage
## Affected Mounts

| Mount Point | Total | Used | Available | Use% | Risk Level |
|---|---|---|---|---|---|
| `/nfs_data` | 3.0TB | 2.8TB | 254GB | 92% | 🔴 Critical |
| `/nfs_media` | 3.0TB | 2.8TB | 254GB | 92% | 🔴 Critical |
| `/nfs_media_lib` | 3.0TB | 2.8TB | 254GB | 92% | 🔴 Critical |
| `/nfs_service` | 3.0TB | 2.8TB | 254GB | 92% | 🔴 Critical |
| `/nfs_personal` | 503GB | 379GB | 124GB | 76% | 🟡 Warning |
| `/nfs_cams` | 69GB | 27GB | 42GB | 40% | 🟢 Healthy |
> [!warning] Shared Storage Pool
> The four critical mounts (`/nfs_data`, `/nfs_media`, `/nfs_media_lib`, `/nfs_service`) share the same underlying 3TB volume. Effective available space is ~254GB **total**, not 254GB per mount.
## Impact Analysis

### Immediate Risks

**When storage reaches 95%:**

- ⚠️ Database write failures (MariaDB, PostgreSQL, InfluxDB)
- ⚠️ Service crash loops due to failed writes
- ⚠️ Docker volume creation failures
- ⚠️ Log rotation failures
- ⚠️ Photo upload rejections
- ⚠️ Media download failures

**When storage reaches 98%:**

- 🔴 Plex metadata corruption
- 🔴 Database corruption risk
- 🔴 Service data loss
- 🔴 Stack deployment failures
- 🔴 Container OOM kills

**When storage reaches 100%:**

- 💀 Complete service failure
- 💀 Data corruption across services
- 💀 Potential permanent data loss
- 💀 Extended recovery time
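For reference, these thresholds translate into roughly the following free space on the shared 3TB pool (approximating 3TB as 3000GB for round numbers):

```bash
# Approximate free space remaining at each utilization threshold
TOTAL_GB=3000
for pct in 95 98 100; do
  echo "${pct}% used -> $(( TOTAL_GB * (100 - pct) / 100 ))GB free"
done
```

At 95% only ~150GB remains across all four critical mounts combined, which a single large media download or database compaction can consume.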
### Affected Services (by priority)

**P0 - Critical Data Services:**

- MariaDB databases (CRM, PhotoPrism, general)
- PostgreSQL databases (Immich, Paperless, RxResume, LibrePhotos)
- InfluxDB time-series data
- Elasticsearch indices (when restored)
- Redis persistence

**P1 - High Priority Services:**

- Plex metadata and cache
- Prometheus TSDB
- Grafana dashboards
- Photo libraries (Immich, PhotoPrism, LibrePhotos)
- Document storage (Paperless)

**P2 - Medium Priority:**

- Media downloads (Transmission, ARR)
- Camera recordings (iSpy)
- Service logs
- Configuration backups
## Immediate Actions (Next 24-48 Hours)

### 1. Identify Space Hogs

```bash
# Find largest directories in /nfs_data
du -h --max-depth=1 /nfs_data | sort -hr | head -20

# Find largest files
find /nfs_data -type f -size +1G -exec ls -lh {} \; | sort -k5 -hr

# Check specific service directories
du -sh /nfs_data/plex/*
du -sh /nfs_data/prometheus/*
du -sh /nfs_data/mariadb/*
```
### 2. Quick Cleanup Opportunities

Low-hanging fruit (review before deleting):

```bash
# Clean Docker system (images, stopped containers, networks)
# WARNING: --volumes also removes ALL unused volumes - confirm no stopped
# service still needs its data before running this
docker system prune -af --volumes

# Delete temporary files
./sh-delete-temp.sh

# Plex cache cleanup
rm -rf /nfs_data/plex/Library/Application\ Support/Plex\ Media\ Server/Cache/*

# Old log files
find /nfs_data -name "*.log" -mtime +30 -delete
find /nfs_data -name "*.log.*" -mtime +30 -delete

# Prometheus old data: edit the Prometheus config to reduce
# retention from 15d to 7d (see the Prometheus Retention section)

# Transmission completed downloads (if already imported)
# Review /nfs_data/transmission/downloads/complete/
```
### 3. Stop Non-Critical Services

Free up write operations and potential space:

```bash
# Game servers (if not in use)
docker service scale homenet4_palworld=0
docker service scale homenet4_satisfactory=0

# Backup service (already offline)
# docker service scale backup_duplicati=0

# Consider temporarily scaling down photo services if not actively used
# docker service scale librephotos_backend=0
# docker service scale albumviewer_backend=0
```
## Short-Term Solutions (Next 1-2 Weeks)

### 1. Media Library Cleanup

Review and archive:

- Old/unwatched movies and TV shows
- Duplicate media files
- Low-quality media (upgrade to remux later)
- Temporary downloads folder

```bash
# Find duplicate files
fdupes -r /nfs_media/Movies/
fdupes -r /nfs_media/TV\ Shows/

# Find files not accessed in 6 months
# (only meaningful if the mount tracks access times; noatime mounts
#  will not update atime)
find /nfs_media -type f -atime +180 -exec ls -lh {} \;
```
### 2. Photo Library Optimization

Compress and deduplicate:

- Remove duplicate photos
- Compress original photos (lossless)
- Delete thumbnails/cache (regenerate as needed)

```bash
# Check photo library sizes
du -sh /nfs_data/immich/*
du -sh /nfs_data/photoprism/*
du -sh /nfs_data/librephotos/*

# Consider consolidating to a single photo service
```
### 3. Database Optimization

Clean up old data:

```bash
# InfluxDB - compact and downsample old data
# Access the InfluxDB UI: http://100.1.100.201:8086

# MariaDB - optimize tables (run the statement at the mysql> prompt)
docker exec -it $(docker ps -qf "name=mariadb") mysql -p
# > OPTIMIZE TABLE <table_name>;

# PostgreSQL - vacuum (run the statement at the psql prompt)
docker exec -it $(docker ps -qf "name=postgres") psql -U <user>
# > VACUUM FULL;
# Note: VACUUM FULL rewrites tables and needs temporary free space;
# prefer plain VACUUM when the disk is nearly full

# Elasticsearch - delete old indices (when restored); the pattern below
# is a placeholder and must match actual index names
curl -X DELETE http://100.1.100.201:9200/logstash-*-older-than-30-days
```
### 4. Prometheus Retention

Reduce the retention period in the Prometheus configuration (e.g. from 15d to 7d), then redeploy the service so the change takes effect.
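A sketch of what that looks like for a Swarm deployment (the stack file name and stack name are assumptions; the retention flag itself is the standard Prometheus option):

```bash
# In the Prometheus service definition in the stack/compose file, set:
#   command:
#     - --storage.tsdb.retention.time=7d
#
# Then redeploy the stack so the new flag takes effect
# (file and stack names are placeholders - use the real ones):
docker stack deploy -c monitoring-stack.yml monitoring
```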
## Medium-Term Solutions (Next 1-3 Months)

### 1. Storage Expansion

**Option A: Expand OMV Storage Pool**

Pros:

- No configuration changes needed
- Transparent to services
- Quick implementation

Cons:

- Requires hardware purchase
- May have physical limits

Implementation:

1. Add physical disks to the OMV server
2. Extend the existing storage pool
3. Resize the filesystem
4. Verify the expansion: `df -h`
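A minimal sketch of steps 2-3, assuming the OMV pool is LVM-backed with an ext4 filesystem (the volume group and logical volume names here are hypothetical):

```bash
# After the new disk (e.g. /dev/sdX) is installed and visible:
pvcreate /dev/sdX                              # initialize the disk for LVM
vgextend vg_storage /dev/sdX                   # add it to the volume group
lvextend -l +100%FREE /dev/vg_storage/lv_nfs   # grow the logical volume
resize2fs /dev/vg_storage/lv_nfs               # grow ext4 online
df -h                                          # verify the new capacity
```

For an XFS filesystem, replace `resize2fs` with `xfs_growfs` on the mount point.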
**Option B: Add New Storage Pool**

Pros:

- Separate hot/cold data
- Better performance tuning
- Storage tiering

Cons:

- Requires service reconfiguration
- Data migration effort
- More complex management

Implementation:

1. Create a new NFS export on OMV
2. Mount the new export on the Docker nodes: `/nfs_cold_storage`
3. Migrate cold data (old media, archives)
4. Update service volume mounts
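Step 2 might look like this on each Docker node (the server hostname and export path are placeholders; substitute the real OMV address and export):

```bash
# Create the mount point and mount the new export
mkdir -p /nfs_cold_storage
mount -t nfs <omv-server>:/export/cold_storage /nfs_cold_storage

# Persist the mount across reboots
echo "<omv-server>:/export/cold_storage /nfs_cold_storage nfs defaults,_netdev 0 0" >> /etc/fstab
```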
### 2. Storage Tiering Strategy

**Tier 1 (SSD/Fast Storage):**

- Databases (MariaDB, PostgreSQL, InfluxDB)
- Prometheus TSDB
- Plex metadata
- Photo library metadata

**Tier 2 (HDD/Standard Storage):**

- Media files (movies, TV, music)
- Photo originals
- Camera recordings
- Backups

**Tier 3 (Cold Storage/Archive):**

- Old media not accessed in 6+ months
- Historical logs/metrics
- Old photo archives
- Database backups >30 days old
### 3. Automated Cleanup Policies

Implement lifecycle policies:

```bash
#!/bin/bash
# /etc/cron.daily/storage-cleanup.sh - daily storage cleanup job

# Delete old logs
find /nfs_data/*/logs -name "*.log" -mtime +30 -delete

# Delete old Prometheus snapshots
find /nfs_data/prometheus -name "*.snapshot" -mtime +7 -delete

# Clean Docker system (only objects older than a week)
docker system prune -af --filter "until=168h"

# Alert if storage >85%
USAGE=$(df /nfs_data | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$USAGE" -gt 85 ]; then
  echo "Storage at ${USAGE}% - cleanup required" | mail -s "Storage Alert" admin@domain.com
fi
```
## Long-Term Solutions (3-12 Months)

### 1. Offload to Cloud Storage

Data to offload:

- Old media to S3/Glacier
- Historical metrics to a cloud TSDB
- Photo archives to cloud storage
- Long-term backups

Implementation:

- Configure Duplicati for cloud backups
- Use rclone for media archival
- Lifecycle policies for automatic migration
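The rclone archival step could be sketched as follows (the remote name `s3-archive` and paths are assumptions; the remote must first be set up with `rclone config`):

```bash
# Move media not modified in 6+ months to the cloud remote,
# preserving the directory structure; run with --dry-run first to preview
rclone move /nfs_media/Movies s3-archive:media-archive/Movies \
  --min-age 180d --dry-run
```

Drop `--dry-run` once the preview looks right; `rclone move` deletes the local copy after a successful transfer, which is what frees the space.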
### 2. Implement Dedicated Storage Nodes

Architecture changes:

- Add dedicated storage VMs/nodes
- Separate hot/cold storage tiers
- NVMe for databases, HDD for media
- Geographic replication

### 3. Containerized Storage Solutions

Options to evaluate:

- Ceph for distributed storage
- GlusterFS for scalability
- MinIO for S3-compatible object storage
- Longhorn for Kubernetes-style volumes
## Monitoring & Alerts

### Prometheus Alerts

Create alert rules (the mountpoint filter is applied to both sides of the division so the ratio only covers the NFS mounts):

```yaml
groups:
  - name: storage_alerts
    rules:
      - alert: StorageCritical
        expr: node_filesystem_avail_bytes{mountpoint=~"/nfs_.*"} / node_filesystem_size_bytes{mountpoint=~"/nfs_.*"} < 0.10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Storage critically low on {{ $labels.instance }}"
      - alert: StorageWarning
        expr: node_filesystem_avail_bytes{mountpoint=~"/nfs_.*"} / node_filesystem_size_bytes{mountpoint=~"/nfs_.*"} < 0.15
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Storage warning on {{ $labels.instance }}"
```
### Grafana Dashboard

Create a storage dashboard with panels for:

- Current capacity by mount
- Growth rate trends
- Time-to-full projections
- Per-service storage consumption
- I/O utilization
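For the time-to-full panel, one option is a `predict_linear` query over the node-exporter metrics (a sketch; the label filter mirrors the alert rules):

```promql
# Predicted free bytes in 14 days, extrapolated from the last 7 days;
# values below zero mean the mount is projected to fill within 2 weeks
predict_linear(node_filesystem_avail_bytes{mountpoint=~"/nfs_.*"}[7d], 14 * 24 * 3600)
```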
### Daily Checks

```bash
#!/bin/bash
# /usr/local/bin/storage-check.sh
# Add to cron: 0 8 * * * /usr/local/bin/storage-check.sh

echo "=== Storage Status $(date) ===" >> /var/log/storage-status.log
df -h | grep nfs >> /var/log/storage-status.log

# Mail an alert only if at least one NFS mount is above 90%
OVER=$(df -h | grep nfs | awk '{if (int($5) > 90) print $0}')
if [ -n "$OVER" ]; then
  echo "$OVER" | mail -s "Storage >90%" admin@domain.com
fi
```
## Capacity Planning

### Current Growth Rate

Estimate growth based on usage:

```bash
# Compare storage usage over time
# (requires historical data or manual tracking)
# Week 1: 2.70TB used
# Week 2: 2.75TB used (+50GB/week)
# Week 3: 2.80TB used (+50GB/week)

# Projected (3.0TB pool, +50GB/week):
# Week 4: 2.85TB (150GB free)  # 95% - critical threshold
# Week 5: 2.90TB (100GB free)  # URGENT
# Week 6: 2.95TB (50GB free)
# Week 7: 3.00TB (0GB free)    # FULL
```
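The projection reduces to simple arithmetic; a quick sanity check on the shell, using the current free space and the observed weekly growth:

```bash
# Rough weeks-until-full given current free space and observed growth
FREE_GB=254
GROWTH_GB_PER_WEEK=50
echo "~$(( FREE_GB / GROWTH_GB_PER_WEEK )) weeks until full at ${GROWTH_GB_PER_WEEK}GB/week"
# -> ~5 weeks until full at 50GB/week
```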
### Target Utilization

Recommended thresholds:

- **Green:** <75% - normal operations
- **Yellow:** 75-85% - plan expansion
- **Orange:** 85-90% - active cleanup
- **Red:** 90-95% - critical, immediate action
- **Critical:** >95% - emergency procedures

**Target state:** <70% utilization with headroom for growth
## Decision Matrix
| Available Space | Action Required | Timeline |
|---|---|---|
| <100GB | 🚨 Emergency cleanup + halt non-critical writes | Immediate |
| 100-200GB | 🔴 Aggressive cleanup + expansion planning | 24-48 hours |
| 200-400GB | 🟠 Moderate cleanup + purchase storage hardware | 1-2 weeks |
| 400-600GB | 🟡 Review and optimize + schedule expansion | 1 month |
| >600GB | 🟢 Normal monitoring + capacity planning | Ongoing |
Current state: 254GB remaining = 🔴 Aggressive action required
## Related Documentation
- [[NFS-Architecture|NFS Storage Architecture]]
- [[Volume-Management|Docker Volume Management]]
- [[03-Operations/Backup-Procedures|Backup Procedures]]
- [[06-Troubleshooting/Known-Issues|Known Issues]]
## Action Items Checklist

### Immediate (Today)

- [ ] Run `du -h --max-depth=1 /nfs_data | sort -hr` to identify space hogs
- [ ] Execute `docker system prune -af --volumes` on all nodes
- [ ] Run `./sh-delete-temp.sh` for temp file cleanup
- [ ] Clear the Plex cache: `rm -rf /nfs_data/plex/.../Cache/*`
- [ ] Review the Transmission downloads folder
### Short-Term (This Week)

- [ ] Archive old media files (>6 months unwatched)
- [ ] Reduce Prometheus retention to 7 days
- [ ] Optimize databases (VACUUM, OPTIMIZE TABLE)
- [ ] Delete old log files (>30 days)
- [ ] Review and consolidate photo services
### Medium-Term (This Month)

- [ ] Purchase additional storage hardware
- [ ] Plan the storage expansion implementation
- [ ] Implement automated cleanup policies
- [ ] Create Prometheus storage alerts
- [ ] Build the Grafana storage dashboard
### Long-Term (Next Quarter)

- [ ] Implement the storage tiering strategy
- [ ] Configure cloud storage offload
- [ ] Evaluate distributed storage solutions
- [ ] Document the storage capacity planning process
---

- **Last Updated:** 2026-01-11
- **Status:** 🔴 CRITICAL - 254GB remaining (8%)
- **Next Review:** Daily until utilization drops below 85%
- **Action Required:** Immediate cleanup and expansion planning