# Storage Critical Warning - 92% Capacity
> [!danger] Critical Storage Alert
> - **Status:** 🔴 CRITICAL
> - **Severity:** P0 - Immediate attention required
> - **Impact:** Risk of service failures, database corruption, data loss
> - **Available Space:** 254GB of 3TB (8% remaining)
## Current Status

- **Discovery Date:** 2026-01-11
- **Current Utilization:** 92% across primary NFS mounts
- **Time to Critical (95%):** ~2-4 weeks at the current growth rate
- **Time to Full (100%):** ~2-10 weeks depending on usage
## Affected Mounts

| Mount Point | Total | Used | Available | Use% | Risk Level |
|---|---|---|---|---|---|
| `/nfs_data` | 3.0TB | 2.8TB | 254GB | 92% | 🔴 Critical |
| `/nfs_media` | 3.0TB | 2.8TB | 254GB | 92% | 🔴 Critical |
| `/nfs_media_lib` | 3.0TB | 2.8TB | 254GB | 92% | 🔴 Critical |
| `/nfs_service` | 3.0TB | 2.8TB | 254GB | 92% | 🔴 Critical |
| `/nfs_personal` | 503GB | 379GB | 124GB | 76% | 🟡 Warning |
| `/nfs_cams` | 69GB | 27GB | 42GB | 40% | 🟢 Healthy |
> [!warning] Shared Storage Pool
> The four critical mounts (`/nfs_data`, `/nfs_media`, `/nfs_media_lib`, `/nfs_service`) share the same underlying 3TB volume. Effective available space is ~254GB **total**, not 254GB per mount.
## Impact Analysis

### Immediate Risks

**When storage reaches 95%:**

- ⚠️ Database write failures (MariaDB, PostgreSQL, InfluxDB)
- ⚠️ Service crash loops due to failed writes
- ⚠️ Docker volume creation failures
- ⚠️ Log rotation failures
- ⚠️ Photo upload rejections
- ⚠️ Media download failures

**When storage reaches 98%:**

- 🔴 Plex metadata corruption
- 🔴 Database corruption risk
- 🔴 Service data loss
- 🔴 Stack deployment failures
- 🔴 Container OOM kills

**When storage reaches 100%:**

- 💀 Complete service failure
- 💀 Data corruption across services
- 💀 Potential permanent data loss
- 💀 Extended recovery time
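For reference, these thresholds translate into roughly the following free space on the shared 3TB pool (approximating 3TB as 3000GB for round numbers):

```bash
# Approximate free space remaining at each utilization threshold
TOTAL_GB=3000
for pct in 95 98 100; do
  echo "${pct}% used -> $(( TOTAL_GB * (100 - pct) / 100 ))GB free"
done
```

At 95% only ~150GB remains across all four critical mounts combined, which a single large media download or database compaction can consume.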
### Affected Services (by priority)

**P0 - Critical Data Services:**

- MariaDB databases (CRM, PhotoPrism, general)
- PostgreSQL databases (Immich, Paperless, RxResume, LibrePhotos)
- InfluxDB time-series data
- Elasticsearch indices (when restored)
- Redis persistence

**P1 - High Priority Services:**

- Plex metadata and cache
- Prometheus TSDB
- Grafana dashboards
- Photo libraries (Immich, PhotoPrism, LibrePhotos)
- Document storage (Paperless)

**P2 - Medium Priority:**

- Media downloads (Transmission, ARR)
- Camera recordings (iSpy)
- Service logs
- Configuration backups
## Immediate Actions (Next 24-48 Hours)

### 1. Identify Space Hogs

```bash
# Find largest directories in /nfs_data
du -h --max-depth=1 /nfs_data | sort -hr | head -20

# Find largest files
find /nfs_data -type f -size +1G -exec ls -lh {} \; | sort -k5 -hr

# Check specific service directories
du -sh /nfs_data/plex/*
du -sh /nfs_data/prometheus/*
du -sh /nfs_data/mariadb/*
```
### 2. Quick Cleanup Opportunities

Low-hanging fruit (review before deleting):

```bash
# Clean Docker system (images, stopped containers, networks)
# WARNING: --volumes also removes ALL unused volumes - confirm no stopped
# service still needs its data before running this
docker system prune -af --volumes

# Delete temporary files
./sh-delete-temp.sh

# Plex cache cleanup
rm -rf /nfs_data/plex/Library/Application\ Support/Plex\ Media\ Server/Cache/*

# Old log files
find /nfs_data -name "*.log" -mtime +30 -delete
find /nfs_data -name "*.log.*" -mtime +30 -delete

# Prometheus old data: edit the Prometheus config to reduce
# retention from 15d to 7d (see the Prometheus Retention section)

# Transmission completed downloads (if already imported)
# Review /nfs_data/transmission/downloads/complete/
```
### 3. Stop Non-Critical Services

Free up write operations and potential space:

```bash
# Game servers (if not in use)
docker service scale homenet4_palworld=0
docker service scale homenet4_satisfactory=0

# Backup service (already offline)
# docker service scale backup_duplicati=0

# Consider temporarily scaling down photo services if not actively used
# docker service scale librephotos_backend=0
# docker service scale albumviewer_backend=0
```
## Short-Term Solutions (Next 1-2 Weeks)

### 1. Media Library Cleanup

Review and archive:

- Old/unwatched movies and TV shows
- Duplicate media files
- Low-quality media (upgrade to remux later)
- Temporary downloads folder

```bash
# Find duplicate files
fdupes -r /nfs_media/Movies/
fdupes -r /nfs_media/TV\ Shows/

# Find files not accessed in 6 months
# (only meaningful if the mount tracks access times; noatime mounts
#  will not update atime)
find /nfs_media -type f -atime +180 -exec ls -lh {} \;
```
### 2. Photo Library Optimization

Compress and deduplicate:

- Remove duplicate photos
- Compress original photos (lossless)
- Delete thumbnails/cache (regenerate as needed)

```bash
# Check photo library sizes
du -sh /nfs_data/immich/*
du -sh /nfs_data/photoprism/*
du -sh /nfs_data/librephotos/*

# Consider consolidating to a single photo service
```
### 3. Database Optimization

Clean up old data:

```bash
# InfluxDB - compact and downsample old data
# Access the InfluxDB UI: http://100.1.100.201:8086

# MariaDB - optimize tables (run the statement at the mysql> prompt)
docker exec -it $(docker ps -qf "name=mariadb") mysql -p
# > OPTIMIZE TABLE <table_name>;

# PostgreSQL - vacuum (run the statement at the psql prompt)
docker exec -it $(docker ps -qf "name=postgres") psql -U <user>
# > VACUUM FULL;
# Note: VACUUM FULL rewrites tables and needs temporary free space;
# prefer plain VACUUM when the disk is nearly full

# Elasticsearch - delete old indices (when restored); the pattern below
# is a placeholder and must match actual index names
curl -X DELETE http://100.1.100.201:9200/logstash-*-older-than-30-days
```
### 4. Prometheus Retention

Reduce the retention period in the Prometheus configuration (e.g. from 15d to 7d), then redeploy the service so the change takes effect.
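A sketch of what that looks like for a Swarm deployment (the stack file name and stack name are assumptions; the retention flag itself is the standard Prometheus option):

```bash
# In the Prometheus service definition in the stack/compose file, set:
#   command:
#     - --storage.tsdb.retention.time=7d
#
# Then redeploy the stack so the new flag takes effect
# (file and stack names are placeholders - use the real ones):
docker stack deploy -c monitoring-stack.yml monitoring
```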
## Medium-Term Solutions (Next 1-3 Months)

### 1. Storage Expansion

**Option A: Expand OMV Storage Pool**

Pros:

- No configuration changes needed
- Transparent to services
- Quick implementation

Cons:

- Requires hardware purchase
- May have physical limits

Implementation:

1. Add physical disks to the OMV server
2. Extend the existing storage pool
3. Resize the filesystem
4. Verify the expansion: `df -h`
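A minimal sketch of steps 2-3, assuming the OMV pool is LVM-backed with an ext4 filesystem (the volume group and logical volume names here are hypothetical):

```bash
# After the new disk (e.g. /dev/sdX) is installed and visible:
pvcreate /dev/sdX                              # initialize the disk for LVM
vgextend vg_storage /dev/sdX                   # add it to the volume group
lvextend -l +100%FREE /dev/vg_storage/lv_nfs   # grow the logical volume
resize2fs /dev/vg_storage/lv_nfs               # grow ext4 online
df -h                                          # verify the new capacity
```

For an XFS filesystem, replace `resize2fs` with `xfs_growfs` on the mount point.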
**Option B: Add New Storage Pool**

Pros:

- Separate hot/cold data
- Better performance tuning
- Storage tiering

Cons:

- Requires service reconfiguration
- Data migration effort
- More complex management

Implementation:

1. Create a new NFS export on OMV
2. Mount the new export on the Docker nodes: `/nfs_cold_storage`
3. Migrate cold data (old media, archives)
4. Update service volume mounts
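Step 2 might look like this on each Docker node (the server hostname and export path are placeholders; substitute the real OMV address and export):

```bash
# Create the mount point and mount the new export
mkdir -p /nfs_cold_storage
mount -t nfs <omv-server>:/export/cold_storage /nfs_cold_storage

# Persist the mount across reboots
echo "<omv-server>:/export/cold_storage /nfs_cold_storage nfs defaults,_netdev 0 0" >> /etc/fstab
```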
### 2. Storage Tiering Strategy

**Tier 1 (SSD/Fast Storage):**

- Databases (MariaDB, PostgreSQL, InfluxDB)
- Prometheus TSDB
- Plex metadata
- Photo library metadata

**Tier 2 (HDD/Standard Storage):**

- Media files (movies, TV, music)
- Photo originals
- Camera recordings
- Backups

**Tier 3 (Cold Storage/Archive):**

- Old media not accessed in 6+ months
- Historical logs/metrics
- Old photo archives
- Database backups >30 days old
### 3. Automated Cleanup Policies

Implement lifecycle policies:

```bash
#!/bin/bash
# /etc/cron.daily/storage-cleanup.sh - daily storage cleanup job

# Delete old logs
find /nfs_data/*/logs -name "*.log" -mtime +30 -delete

# Delete old Prometheus snapshots
find /nfs_data/prometheus -name "*.snapshot" -mtime +7 -delete

# Clean Docker system (only objects older than a week)
docker system prune -af --filter "until=168h"

# Alert if storage >85%
USAGE=$(df /nfs_data | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$USAGE" -gt 85 ]; then
  echo "Storage at ${USAGE}% - cleanup required" | mail -s "Storage Alert" admin@domain.com
fi
```
## Long-Term Solutions (3-12 Months)

### 1. Offload to Cloud Storage

Data to offload:

- Old media to S3/Glacier
- Historical metrics to a cloud TSDB
- Photo archives to cloud storage
- Long-term backups

Implementation:

- Configure Duplicati for cloud backups
- Use rclone for media archival
- Lifecycle policies for automatic migration
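The rclone archival step could be sketched as follows (the remote name `s3-archive` and paths are assumptions; the remote must first be set up with `rclone config`):

```bash
# Move media not modified in 6+ months to the cloud remote,
# preserving the directory structure; run with --dry-run first to preview
rclone move /nfs_media/Movies s3-archive:media-archive/Movies \
  --min-age 180d --dry-run
```

Drop `--dry-run` once the preview looks right; `rclone move` deletes the local copy after a successful transfer, which is what frees the space.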
### 2. Implement Dedicated Storage Nodes

Architecture changes:

- Add dedicated storage VMs/nodes
- Separate hot/cold storage tiers
- NVMe for databases, HDD for media
- Geographic replication

### 3. Containerized Storage Solutions

Options to evaluate:

- Ceph for distributed storage
- GlusterFS for scalability
- MinIO for S3-compatible object storage
- Longhorn for Kubernetes-style volumes
## Monitoring & Alerts

### Prometheus Alerts

Create alert rules (the mountpoint filter is applied to both sides of the division so the ratio only covers the NFS mounts):

```yaml
groups:
  - name: storage_alerts
    rules:
      - alert: StorageCritical
        expr: node_filesystem_avail_bytes{mountpoint=~"/nfs_.*"} / node_filesystem_size_bytes{mountpoint=~"/nfs_.*"} < 0.10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Storage critically low on {{ $labels.instance }}"
      - alert: StorageWarning
        expr: node_filesystem_avail_bytes{mountpoint=~"/nfs_.*"} / node_filesystem_size_bytes{mountpoint=~"/nfs_.*"} < 0.15
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Storage warning on {{ $labels.instance }}"
```
### Grafana Dashboard

Create a storage dashboard with panels for:

- Current capacity by mount
- Growth rate trends
- Time-to-full projections
- Per-service storage consumption
- I/O utilization
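For the time-to-full panel, one option is a `predict_linear` query over the node-exporter metrics (a sketch; the label filter mirrors the alert rules):

```promql
# Predicted free bytes in 14 days, extrapolated from the last 7 days;
# values below zero mean the mount is projected to fill within 2 weeks
predict_linear(node_filesystem_avail_bytes{mountpoint=~"/nfs_.*"}[7d], 14 * 24 * 3600)
```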
### Daily Checks

```bash
#!/bin/bash
# /usr/local/bin/storage-check.sh
# Add to cron: 0 8 * * * /usr/local/bin/storage-check.sh

echo "=== Storage Status $(date) ===" >> /var/log/storage-status.log
df -h | grep nfs >> /var/log/storage-status.log

# Mail an alert only if at least one NFS mount is above 90%
OVER=$(df -h | grep nfs | awk '{if (int($5) > 90) print $0}')
if [ -n "$OVER" ]; then
  echo "$OVER" | mail -s "Storage >90%" admin@domain.com
fi
```
## Capacity Planning

### Current Growth Rate

Estimate growth based on usage:

```bash
# Compare storage usage over time
# (requires historical data or manual tracking)
# Week 1: 2.70TB used
# Week 2: 2.75TB used (+50GB/week)
# Week 3: 2.80TB used (+50GB/week)

# Projected (3.0TB pool, +50GB/week):
# Week 4: 2.85TB (150GB free)  # 95% - critical threshold
# Week 5: 2.90TB (100GB free)  # URGENT
# Week 6: 2.95TB (50GB free)
# Week 7: 3.00TB (0GB free)    # FULL
```
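The projection reduces to simple arithmetic; a quick sanity check on the shell, using the current free space and the observed weekly growth:

```bash
# Rough weeks-until-full given current free space and observed growth
FREE_GB=254
GROWTH_GB_PER_WEEK=50
echo "~$(( FREE_GB / GROWTH_GB_PER_WEEK )) weeks until full at ${GROWTH_GB_PER_WEEK}GB/week"
# -> ~5 weeks until full at 50GB/week
```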
### Target Utilization

Recommended thresholds:

- **Green:** <75% - normal operations
- **Yellow:** 75-85% - plan expansion
- **Orange:** 85-90% - active cleanup
- **Red:** 90-95% - critical, immediate action
- **Critical:** >95% - emergency procedures

**Target state:** <70% utilization with headroom for growth
## Decision Matrix
| Available Space | Action Required | Timeline |
|---|---|---|
| <100GB | 🚨 Emergency cleanup + halt non-critical writes | Immediate |
| 100-200GB | 🔴 Aggressive cleanup + expansion planning | 24-48 hours |
| 200-400GB | 🟠 Moderate cleanup + purchase storage hardware | 1-2 weeks |
| 400-600GB | 🟡 Review and optimize + schedule expansion | 1 month |
| >600GB | 🟢 Normal monitoring + capacity planning | Ongoing |
Current state: 254GB remaining = 🔴 Aggressive action required
## Related Documentation
- [[NFS-Architecture|NFS Storage Architecture]]
- [[Volume-Management|Docker Volume Management]]
- [[03-Operations/Backup-Procedures|Backup Procedures]]
- [[06-Troubleshooting/Known-Issues|Known Issues]]
## Action Items Checklist

### Immediate (Today)

- [ ] Run `du -h --max-depth=1 /nfs_data | sort -hr` to identify space hogs
- [ ] Execute `docker system prune -af --volumes` on all nodes
- [ ] Run `./sh-delete-temp.sh` for temp file cleanup
- [ ] Clear the Plex cache: `rm -rf /nfs_data/plex/.../Cache/*`
- [ ] Review the Transmission downloads folder
### Short-Term (This Week)

- [ ] Archive old media files (>6 months unwatched)
- [ ] Reduce Prometheus retention to 7 days
- [ ] Optimize databases (VACUUM, OPTIMIZE TABLE)
- [ ] Delete old log files (>30 days)
- [ ] Review and consolidate photo services
### Medium-Term (This Month)

- [ ] Purchase additional storage hardware
- [ ] Plan the storage expansion implementation
- [ ] Implement automated cleanup policies
- [ ] Create Prometheus storage alerts
- [ ] Build the Grafana storage dashboard
### Long-Term (Next Quarter)

- [ ] Implement the storage tiering strategy
- [ ] Configure cloud storage offload
- [ ] Evaluate distributed storage solutions
- [ ] Document the storage capacity planning process
---

- **Last Updated:** 2026-01-11
- **Status:** 🔴 CRITICAL - 254GB remaining (8%)
- **Next Review:** Daily until utilization drops below 85%
- **Action Required:** Immediate cleanup and expansion planning