# Calypso Appliance Health Check Script ## Overview Comprehensive health check script for all Calypso Appliance components. Performs automated checks across system resources, services, network, storage, and backup infrastructure. ## Installation Script location: `/usr/local/bin/calypso-healthcheck` ## Usage ### Basic Usage ```bash # Run health check (requires root) calypso-healthcheck # Run and save to specific location calypso-healthcheck 2>&1 | tee /root/healthcheck-$(date +%Y%m%d).log ``` ### Exit Codes - `0` - All checks passed (100% healthy) - `1` - Healthy with warnings (some non-critical issues) - `2` - Degraded (80%+ checks passed, some failures) - `3` - Critical (less than 80% checks passed) ### Automated Checks #### System Resources (4 checks) - Root filesystem usage (threshold: 80%) - /var filesystem usage (threshold: 80%) - Memory usage (threshold: 90%) - CPU load average #### Database Services (2 checks) - PostgreSQL service status - Database presence (calypso, bacula) #### Calypso Application (7 checks) - calypso-api service - calypso-frontend service - calypso-logger service - API port 8443 - Frontend port 3000 - API health endpoint - Frontend health endpoint #### Backup Services - Bacula (8 checks) - bacula-director service - bacula-fd service - bacula-sd service - Director bconsole connectivity - Storage (Scalar-i500) accessibility - Director port 9101 - FD port 9102 - SD port 9103 #### Virtual Tape Library - mhVTL (4 checks) - mhvtl.target status - vtllibrary@10 (Scalar i500) - vtllibrary@30 (Scalar i40) - VTL device count (2 changers, 8 tape drives) - Scalar i500 slots detection #### Storage Protocols (9 checks) - NFS server service - Samba (smbd) service - NetBIOS (nmbd) service - SCST service - iSCSI target service - NFS port 2049 - SMB port 445 - NetBIOS port 139 - iSCSI port 3260 #### Monitoring & Management (2 checks) - SNMP daemon - SNMP port 161 #### Network Connectivity (2 checks) - Internet connectivity (ping 8.8.8.8) - Network manager status **Total: 39+ automated checks** ## Output Format ### Console Output - Color-coded status indicators: - ✓ Green = Passed - ⚠ Yellow = Warning - ✗ Red = Failed ### Example Output ``` ========================================== CALYPSO APPLIANCE HEALTH CHECK ========================================== Date: 2025-12-31 01:46:27 Hostname: calypso Uptime: up 6 days, 2 hours, 50 minutes Log file: /var/log/calypso-healthcheck-20251231-014627.log ======================================== SYSTEM RESOURCES ======================================== ✓ Root filesystem (18% used) ✓ Var filesystem (18% used) ✓ Memory usage (49% used, 8206MB available) ✓ CPU load average (2.18, 8 cores) ... ======================================== HEALTH CHECK SUMMARY ======================================== Total Checks: 39 Passed: 35 Warnings: 0 Failed: 4 ⚠ OVERALL STATUS: DEGRADED (89%) ``` ### Log Files All checks are logged to: `/var/log/calypso-healthcheck-YYYYMMDD-HHMMSS.log` Logs include: - Timestamp and system information - Detailed check results - Summary statistics - Overall health status ## Scheduling ### Manual Execution ```bash # Run on demand sudo calypso-healthcheck ``` ### Cron Job (Recommended) Add to crontab for automated checks: ```bash # Daily health check at 2 AM 0 2 * * * /usr/local/bin/calypso-healthcheck > /dev/null 2>&1 # Weekly health check on Monday at 6 AM with email notification 0 6 * * 1 /usr/local/bin/calypso-healthcheck 2>&1 | mail -s "Calypso Health Check" admin@example.com ``` ### Systemd Timer (Alternative) Create `/etc/systemd/system/calypso-healthcheck.timer`: ```ini [Unit] Description=Daily Calypso Health Check Requires=calypso-healthcheck.service [Timer] OnCalendar=daily Persistent=true [Install] WantedBy=timers.target ``` Create `/etc/systemd/system/calypso-healthcheck.service`: ```ini [Unit] Description=Calypso Appliance Health Check [Service] Type=oneshot ExecStart=/usr/local/bin/calypso-healthcheck ``` Enable: ```bash systemctl enable --now calypso-healthcheck.timer ``` ## Troubleshooting ### Common Failures #### API/Frontend Health Endpoints Failing ```bash # Check if services are running systemctl status calypso-api calypso-frontend # Check service logs journalctl -u calypso-api -n 50 journalctl -u calypso-frontend -n 50 # Test manually curl -k https://localhost:8443/health curl -k https://localhost:3000/health ``` #### Bacula Director Not Responding ```bash # Check service systemctl status bacula-director # Test bconsole echo "status director" | bconsole # Check logs tail -50 /var/log/bacula/bacula.log ``` #### VTL Slots Not Detected ```bash # Check VTL services systemctl status mhvtl.target # Check devices lsscsi | grep -E "mediumx|tape" # Test manually mtx -f /dev/sg7 status echo "update slots storage=Scalar-i500" | bconsole ``` #### Storage Protocols Port Not Listening ```bash # Check service status systemctl status nfs-server smbd nmbd scst iscsi-scstd # Check listening ports ss -tuln | grep -E "2049|445|139|3260" # Restart services if needed systemctl restart nfs-server systemctl restart smbd nmbd ``` ## Customization ### Modify Thresholds Edit `/usr/local/bin/calypso-healthcheck`: ```bash # Disk usage threshold (default: 80%) check_disk "/" 80 "Root filesystem" # Memory usage threshold (default: 90%) if [ "$mem_percent" -lt 90 ]; then # Change expected VTL devices if [ "$changer_count" -ge 2 ] && [ "$tape_count" -ge 8 ]; then ``` ### Add Custom Checks Add new check functions: ```bash check_custom() { TOTAL_CHECKS=$((TOTAL_CHECKS + 1)) if [[ condition ]]; then echo -e "${GREEN}${CHECK}${NC} Custom check passed" | tee -a "$LOG_FILE" PASSED_CHECKS=$((PASSED_CHECKS + 1)) else echo -e "${RED}${CROSS}${NC} Custom check failed" | tee -a "$LOG_FILE" FAILED_CHECKS=$((FAILED_CHECKS + 1)) fi } # Call in main script check_custom ``` ## Integration ### Monitoring Systems Export metrics for monitoring: ```bash # Nagios/Icinga format calypso-healthcheck if [ $? -eq 0 ]; then echo "OK - All checks passed" exit 0 elif [ $? -eq 1 ]; then echo "WARNING - Healthy with warnings" exit 1 else echo "CRITICAL - System degraded" exit 2 fi ``` ### API Integration Parse JSON output: ```bash # Add JSON output option calypso-healthcheck --json > /tmp/health.json ``` ## Maintenance ### Log Rotation Logs are stored in `/var/log/calypso-healthcheck-*.log` Create `/etc/logrotate.d/calypso-healthcheck`: ``` /var/log/calypso-healthcheck-*.log { weekly rotate 12 compress delaycompress missingok notifempty } ``` ### Cleanup Old Logs ```bash # Remove logs older than 30 days find /var/log -name "calypso-healthcheck-*.log" -mtime +30 -delete ``` ## Best Practices 1. **Run after reboot** - Verify all services started correctly 2. **Schedule regular checks** - Daily or weekly automated runs 3. **Monitor exit codes** - Alert on degraded/critical status 4. **Review logs periodically** - Identify patterns or recurring issues 5. **Update checks** - Add new components as system evolves 6. **Baseline health** - Establish normal operating parameters 7. **Document exceptions** - Note known warnings that are acceptable ## See Also - `pre-reboot-checklist.md` - Pre-reboot verification - `bacula-vtl-troubleshooting.md` - VTL troubleshooting guide - System logs: `/var/log/syslog`, `/var/log/bacula/` --- *Created: 2025-12-31* *Script: `/usr/local/bin/calypso-healthcheck`*