add shares av system

2026-01-04 14:11:38 +07:00
parent 70d25e13b8
commit 0c8a9efecc
49 changed files with 405 additions and 1 deletions
--- a/docs/on-progress/healthcheck-script.md
+++ b/docs/on-progress/healthcheck-script.md
@@ -0,0 +1,344 @@
+# Calypso Appliance Health Check Script
+
+## Overview
+Comprehensive health check script for all Calypso Appliance components. Performs automated checks across system resources, services, network, storage, and backup infrastructure.
+
+## Installation
+Script location: `/usr/local/bin/calypso-healthcheck`
+
+## Usage
+
+### Basic Usage
+```bash
+# Run health check (requires root)
+calypso-healthcheck
+
+# Run and save to specific location
+calypso-healthcheck 2>&1 | tee /root/healthcheck-$(date +%Y%m%d).log
+```
+
+### Exit Codes
+- `0` - All checks passed (100% healthy)
+- `1` - Healthy with warnings (some non-critical issues)
+- `2` - Degraded (80%+ checks passed, some failures)
+- `3` - Critical (less than 80% checks passed)
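+
+The mapping from check counts to these exit codes is not shown in this document. A minimal sketch, assuming the script tracks counters like those in the custom check example (`TOTAL_CHECKS`, `PASSED_CHECKS`, `FAILED_CHECKS`) plus a hypothetical warning counter, and that the percentage is rounded down:
+
+```bash
+# Hypothetical helper, not taken from the script: derive the exit code
+# from check counts following the list above.
+# Usage: health_exit_code TOTAL PASSED WARNINGS FAILED
+health_exit_code() {
+    local total=$1 passed=$2 warnings=$3 failed=$4
+    local percent=0
+    if [ "$total" -gt 0 ]; then
+        percent=$(( passed * 100 / total ))   # integer percentage, rounded down
+    fi
+
+    if [ "$failed" -eq 0 ] && [ "$warnings" -eq 0 ]; then
+        echo 0      # all checks passed
+    elif [ "$failed" -eq 0 ]; then
+        echo 1      # warnings only
+    elif [ "$percent" -ge 80 ]; then
+        echo 2      # degraded
+    else
+        echo 3      # critical
+    fi
+}
+```
+
+With 39 total checks, 35 passed, 0 warnings and 4 failed, `health_exit_code 39 35 0 4` prints `2`.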
+
+### Automated Checks
+
+#### System Resources (4 checks)
+- Root filesystem usage (threshold: 80%)
+- /var filesystem usage (threshold: 80%)
+- Memory usage (threshold: 90%)
+- CPU load average
+
+#### Database Services (2 checks)
+- PostgreSQL service status
+- Database presence (calypso, bacula)
+
+#### Calypso Application (7 checks)
+- calypso-api service
+- calypso-frontend service
+- calypso-logger service
+- API port 8443
+- Frontend port 3000
+- API health endpoint
+- Frontend health endpoint
+
+#### Backup Services - Bacula (8 checks)
+- bacula-director service
+- bacula-fd service
+- bacula-sd service
+- Director bconsole connectivity
+- Storage (Scalar-i500) accessibility
+- Director port 9101
+- FD port 9102
+- SD port 9103
+
+#### Virtual Tape Library - mhVTL (5 checks)
+- mhvtl.target status
+- vtllibrary@10 (Scalar i500)
+- vtllibrary@30 (Scalar i40)
+- VTL device count (2 changers, 8 tape drives)
+- Scalar i500 slots detection
+
+#### Storage Protocols (9 checks)
+- NFS server service
+- Samba (smbd) service
+- NetBIOS (nmbd) service
+- SCST service
+- iSCSI target service
+- NFS port 2049
+- SMB port 445
+- NetBIOS port 139
+- iSCSI port 3260
+
+#### Monitoring & Management (2 checks)
+- SNMP daemon
+- SNMP port 161
+
+#### Network Connectivity (2 checks)
+- Internet connectivity (ping 8.8.8.8)
+- Network manager status
+
+**Total: 39 automated checks**
+
+## Output Format
+
+### Console Output
+- Color-coded status indicators:
+  - ✓ Green = Passed
+  - ⚠ Yellow = Warning
+  - ✗ Red = Failed
+
+### Example Output
+```
+==========================================
+  CALYPSO APPLIANCE HEALTH CHECK
+==========================================
+Date: 2025-12-31 01:46:27
+Hostname: calypso
+Uptime: up 6 days, 2 hours, 50 minutes
+Log file: /var/log/calypso-healthcheck-20251231-014627.log
+
+========================================
+SYSTEM RESOURCES
+========================================
+✓ Root filesystem (18% used)
+✓ Var filesystem (18% used)
+✓ Memory usage (49% used, 8206MB available)
+✓ CPU load average (2.18, 8 cores)
+
+...
+
+========================================
+HEALTH CHECK SUMMARY
+========================================
+
+Total Checks:    39
+Passed:          35
+Warnings:        0
+Failed:          4
+
+⚠ OVERALL STATUS: DEGRADED (89%)
+```
+
+### Log Files
+All checks are logged to: `/var/log/calypso-healthcheck-YYYYMMDD-HHMMSS.log`
+
+Logs include:
+- Timestamp and system information
+- Detailed check results
+- Summary statistics
+- Overall health status
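+
+Because failed checks are marked with `✗`, a log can be filtered for failures after a run. A minimal sketch, assuming log lines carry the same markers (and possibly the same color codes) as the console output and that GNU `sed` is available; `failed_checks` is a hypothetical helper, not part of the script:
+
+```bash
+# Hypothetical helper: print the failed checks from a health check log,
+# with ANSI color escape sequences stripped (GNU sed syntax).
+# Usage: failed_checks /var/log/calypso-healthcheck-YYYYMMDD-HHMMSS.log
+failed_checks() {
+    grep -F '✗' "$1" | sed 's/\x1b\[[0-9;]*m//g'
+}
+```
+
+To inspect the most recent run: `failed_checks "$(ls -t /var/log/calypso-healthcheck-*.log | head -n 1)"`.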
+
+## Scheduling
+
+### Manual Execution
+```bash
+# Run on demand
+sudo calypso-healthcheck
+```
+
+### Cron Job (Recommended)
+Add to crontab for automated checks:
+
+```bash
+# Daily health check at 2 AM
+0 2 * * * /usr/local/bin/calypso-healthcheck > /dev/null 2>&1
+
+# Weekly health check on Monday at 6 AM with email notification
+0 6 * * 1 /usr/local/bin/calypso-healthcheck 2>&1 | mail -s "Calypso Health Check" admin@example.com
+```
+
+### Systemd Timer (Alternative)
+Create `/etc/systemd/system/calypso-healthcheck.timer`:
+```ini
+[Unit]
+Description=Daily Calypso Health Check
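+# Note: Requires= also starts the service whenever the timer itself is started (for example at boot).
+# Remove this line if the check should run only on the OnCalendar schedule.
+Requires=calypso-healthcheck.service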
+
+[Timer]
+OnCalendar=daily
+Persistent=true
+
+[Install]
+WantedBy=timers.target
+```
+
+Create `/etc/systemd/system/calypso-healthcheck.service`:
+```ini
+[Unit]
+Description=Calypso Appliance Health Check
+
+[Service]
+Type=oneshot
+ExecStart=/usr/local/bin/calypso-healthcheck
+```
+
+Enable:
+```bash
+systemctl enable --now calypso-healthcheck.timer
+```
+
+## Troubleshooting
+
+### Common Failures
+
+#### API/Frontend Health Endpoints Failing
+```bash
+# Check if services are running
+systemctl status calypso-api calypso-frontend
+
+# Check service logs
+journalctl -u calypso-api -n 50
+journalctl -u calypso-frontend -n 50
+
+# Test manually
+curl -k https://localhost:8443/health
+curl -k https://localhost:3000/health
+```
+
+#### Bacula Director Not Responding
+```bash
+# Check service
+systemctl status bacula-director
+
+# Test bconsole
+echo "status director" | bconsole
+
+# Check logs
+tail -50 /var/log/bacula/bacula.log
+```
+
+#### VTL Slots Not Detected
+```bash
+# Check VTL services
+systemctl status mhvtl.target
+
+# Check devices
+lsscsi | grep -E "mediumx|tape"
+
+# Test manually
+mtx -f /dev/sg7 status
+echo "update slots storage=Scalar-i500" | bconsole
+```
+
+#### Storage Protocols Port Not Listening
+```bash
+# Check service status
+systemctl status nfs-server smbd nmbd scst iscsi-scstd
+
+# Check listening ports
+ss -tuln | grep -E ":(2049|445|139|3260)[[:space:]]"
+
+# Restart services if needed
+systemctl restart nfs-server
+systemctl restart smbd nmbd
+```
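+
+The port check above can also report which of the expected ports are missing. A minimal sketch, assuming the usual `ss -tuln` column layout with the local address and port in the fifth column; `missing_ports` is a hypothetical helper, not part of the script:
+
+```bash
+# Hypothetical helper: read `ss -tuln` output on stdin and report each
+# expected port that has no listening socket.
+# Usage: ss -tuln | missing_ports 2049 445 139 3260
+missing_ports() {
+    local listing port
+    listing=$(cat)
+    for port in "$@"; do
+        # Match the port only at the end of the local address column,
+        # so that 139 does not match 1390.
+        printf '%s\n' "$listing" \
+            | awk -v p=":$port" '$5 ~ (p "$") { found = 1 } END { exit !found }' \
+            || echo "port $port is not listening"
+    done
+}
+```
+
+Reading the listing from stdin keeps the helper usable on saved output as well as on a live system.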
+
+## Customization
+
+### Modify Thresholds
+Edit `/usr/local/bin/calypso-healthcheck`:
+
+```bash
+# Disk usage threshold (default: 80%)
+check_disk "/" 80 "Root filesystem"
+
+# Memory usage threshold (default: 90%)
+if [ "$mem_percent" -lt 90 ]; then
+
+# Change expected VTL devices
+if [ "$changer_count" -ge 2 ] && [ "$tape_count" -ge 8 ]; then
+```
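+
+The body of `check_disk` is not reproduced here. As a rough sketch of how a usage percentage could be obtained and compared against a threshold, assuming POSIX `df -P` output; `disk_usage_percent` and `disk_within_threshold` are hypothetical names, not functions from the script:
+
+```bash
+# Hypothetical helpers, not taken from the script.
+# Print the usage percentage (without the % sign) of the filesystem
+# that contains the given path.
+disk_usage_percent() {
+    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
+}
+
+# Succeed when usage of PATH is below THRESHOLD percent.
+# Usage: disk_within_threshold PATH THRESHOLD
+disk_within_threshold() {
+    local used
+    used=$(disk_usage_percent "$1")
+    [ "$used" -lt "$2" ]
+}
+```
+
+For example, `disk_within_threshold / 80 && echo "Root filesystem below 80%"` mirrors the default root filesystem threshold.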
+
+### Add Custom Checks
+Add new check functions:
+
+```bash
+check_custom() {
+    TOTAL_CHECKS=$((TOTAL_CHECKS + 1))
+
+    # Replace "condition" with a real test; as written, [[ condition ]] is always true
+    if [[ condition ]]; then
+        echo -e "${GREEN}${CHECK}${NC} Custom check passed" | tee -a "$LOG_FILE"
+        PASSED_CHECKS=$((PASSED_CHECKS + 1))
+    else
+        echo -e "${RED}${CROSS}${NC} Custom check failed" | tee -a "$LOG_FILE"
+        FAILED_CHECKS=$((FAILED_CHECKS + 1))
+    fi
+}
+
+# Call in main script
+check_custom
+```
+
+## Integration
+
+### Monitoring Systems
+Export metrics for monitoring:
+
+```bash
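+# Nagios/Icinga format
+# Capture the exit code once; $? is overwritten by every command, including [ ]
+# Discard the report so the status line below is the first line of output
+calypso-healthcheck > /dev/null 2>&1
+rc=$?
+case "$rc" in
+    0)
+        echo "OK - All checks passed"
+        exit 0
+        ;;
+    1)
+        echo "WARNING - Healthy with warnings"
+        exit 1
+        ;;
+    *)
+        echo "CRITICAL - System degraded or critical (exit code $rc)"
+        exit 2
+        ;;
+esac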
+```
+
+### API Integration
+Parse JSON output. The example below assumes a `--json` option has been added to the script:
+
+```bash
+# Requires a JSON output option in the script
+calypso-healthcheck --json > /tmp/health.json
+```
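+
+The summary block of an existing log can also be converted to JSON without changing the script. A minimal sketch, assuming the summary lines are uncolored and use the labels shown in the example output (`Total Checks:`, `Passed:`, `Warnings:`, `Failed:`); `summary_to_json` is a hypothetical helper:
+
+```bash
+# Hypothetical helper: read health check output on stdin and print the
+# summary counters as a single JSON object.
+summary_to_json() {
+    awk '
+        /^Total Checks:/ { total = $3 }
+        /^Passed:/       { passed = $2 }
+        /^Warnings:/     { warnings = $2 }
+        /^Failed:/       { failed = $2 }
+        END {
+            printf "{\"total\": %d, \"passed\": %d, \"warnings\": %d, \"failed\": %d}\n",
+                total, passed, warnings, failed
+        }
+    '
+}
+```
+
+For example: `summary_to_json < /var/log/calypso-healthcheck-20251231-014627.log > /tmp/health.json`.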
+
+## Maintenance
+
+### Log Rotation
+Logs are stored in `/var/log/calypso-healthcheck-*.log`
+
+Create `/etc/logrotate.d/calypso-healthcheck`:
+```
+/var/log/calypso-healthcheck-*.log {
+    weekly
+    rotate 12
+    compress
+    delaycompress
+    missingok
+    notifempty
+}
+```
+
+### Cleanup Old Logs
+```bash
+# Remove logs older than 30 days
+find /var/log -name "calypso-healthcheck-*.log" -mtime +30 -delete
+```
+
+## Best Practices
+
+1. **Run after reboot** - Verify all services started correctly
+2. **Schedule regular checks** - Daily or weekly automated runs
+3. **Monitor exit codes** - Alert on degraded/critical status
+4. **Review logs periodically** - Identify patterns or recurring issues
+5. **Update checks** - Add new components as system evolves
+6. **Baseline health** - Establish normal operating parameters
+7. **Document exceptions** - Note known warnings that are acceptable
+
+## See Also
+- `pre-reboot-checklist.md` - Pre-reboot verification
+- `bacula-vtl-troubleshooting.md` - VTL troubleshooting guide
+- System logs: `/var/log/syslog`, `/var/log/bacula/`
+
+---
+
+*Created: 2025-12-31*  
+*Script: `/usr/local/bin/calypso-healthcheck`*