Calypso Appliance Health Check Script
Overview
Health check script covering all Calypso Appliance components. It runs automated checks across system resources, databases, application services, backup infrastructure (Bacula and mhVTL), storage protocols, monitoring, and network connectivity.
Installation
Script location: /usr/local/bin/calypso-healthcheck
Usage
Basic Usage
# Run health check (requires root)
calypso-healthcheck
# Run and save a dated copy of the output to a specific location
calypso-healthcheck 2>&1 | tee /root/healthcheck-$(date +%Y%m%d).log
Exit Codes
- 0: All checks passed (100% healthy)
- 1: Healthy with warnings (some non-critical issues)
- 2: Degraded (80% or more of checks passed, some failures)
- 3: Critical (less than 80% of checks passed)
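The exit codes above can be derived from the check counters. The following is a minimal sketch under stated assumptions, not the script's actual logic: health_status_code is a hypothetical helper, and it assumes the total equals passed + warnings + failed and that only fully passed checks count toward the percentage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map check counters to the documented exit codes.
# Assumptions: total = passed + warnings + failed, and only fully passed
# checks count toward the percentage. The real script may differ.
health_status_code() {
    local total=$1 passed=$2 warnings=$3 failed=$4
    local pct=0
    if [ "$total" -gt 0 ]; then
        pct=$(( passed * 100 / total ))   # integer percentage, truncated
    fi
    if [ "$failed" -eq 0 ] && [ "$warnings" -eq 0 ]; then
        echo 0    # all checks passed
    elif [ "$failed" -eq 0 ]; then
        echo 1    # healthy with warnings
    elif [ "$pct" -ge 80 ]; then
        echo 2    # degraded
    else
        echo 3    # critical
    fi
}
```

With the counts from the example output below (39 total, 35 passed, 4 failed), integer arithmetic gives 35 * 100 / 39 = 89, so the helper prints 2, consistent with the DEGRADED (89%) status shown there.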
Automated Checks
System Resources (4 checks)
- Root filesystem usage (threshold: 80%)
- /var filesystem usage (threshold: 80%)
- Memory usage (threshold: 90%)
- CPU load average
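The resource checks above might be implemented along these lines. This is a sketch, not the script's code: disk_usage_percent and memory_used_percent are hypothetical helpers, check_disk only mirrors the call signature shown under Customization, and a check is assumed to pass when usage is strictly below its threshold.

```shell
#!/usr/bin/env bash
# Sketch of the system-resource checks; not the script's actual code.

# Used percentage of the filesystem holding $1, from POSIX `df -P`
# (column 5 is "Capacity", e.g. "18%").
disk_usage_percent() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Used memory percentage: (MemTotal - MemAvailable) * 100 / MemTotal.
# An optional meminfo path allows testing with sample data.
memory_used_percent() {
    awk '/^MemTotal:/ { t = $2 }
         /^MemAvailable:/ { a = $2 }
         END { if (t > 0) printf "%d\n", (t - a) * 100 / t }' "${1:-/proc/meminfo}"
}

# check_disk MOUNT THRESHOLD LABEL
# Assumption: the check passes when usage is strictly below THRESHOLD.
check_disk() {
    local mount=$1 threshold=$2 label=$3 used
    used=$(disk_usage_percent "$mount")
    case $used in
        ''|*[!0-9]*)
            echo "✗ $label (could not read usage for $mount)"
            return 1 ;;
    esac
    if [ "$used" -lt "$threshold" ]; then
        echo "✓ $label (${used}% used)"
        return 0
    fi
    echo "✗ $label (${used}% used, threshold ${threshold}%)"
    return 1
}
```

For example, check_disk "/var" 80 "Var filesystem" covers the second check, and [ "$(memory_used_percent)" -lt 90 ] reproduces the memory condition shown under Customization. Reading MemAvailable rather than MemFree counts reclaimable page cache as available memory.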
Database Services (2 checks)
- PostgreSQL service status
- Database presence (calypso, bacula)
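Database presence can be checked by listing databases with psql. In this sketch, missing_databases is a hypothetical helper that reads psql -lqt output on stdin (the first |-separated column is the database name) and reports any required names that are absent; running psql as the postgres user is an assumption about local authentication.

```shell
#!/usr/bin/env bash
# Sketch: report required databases that are missing.
# Reads `psql -lqt` output on stdin; the first "|"-separated column is the
# database name. Usage (assumption about local auth):
#   sudo -u postgres psql -lqt | missing_databases calypso bacula
missing_databases() {
    local names db missing=0
    names=$(cut -d '|' -f 1 | tr -d ' ')
    for db in "$@"; do
        if ! printf '%s\n' "$names" | grep -qx -- "$db"; then
            echo "missing: $db"
            missing=1
        fi
    done
    return "$missing"
}
```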
Calypso Application (7 checks)
- calypso-api service
- calypso-frontend service
- calypso-logger service
- API port 8443
- Frontend port 3000
- API health endpoint
- Frontend health endpoint
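A health-endpoint check could wrap curl as follows. check_endpoint is a hypothetical function; the URLs in the usage comment reuse the manual tests under Troubleshooting, and the 5-second limit is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch of an HTTP health-endpoint check; not the script's actual code.
#   -f          treat HTTP status >= 400 as failure
#   -s          no progress output
#   -k          accept a self-signed certificate (as in the manual curl tests)
#   --max-time  give up after 5 seconds instead of hanging
check_endpoint() {
    local label=$1 url=$2
    if curl -fsk --max-time 5 -o /dev/null "$url"; then
        echo "✓ $label ($url)"
        return 0
    fi
    echo "✗ $label ($url)"
    return 1
}

# Usage:
#   check_endpoint "API health endpoint" https://localhost:8443/health
#   check_endpoint "Frontend health endpoint" https://localhost:3000/health
```

Because -f makes curl fail on HTTP 4xx and 5xx responses, an endpoint that answers with an error page counts as a failure, not just one that refuses connections.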
Backup Services - Bacula (8 checks)
- bacula-director service
- bacula-fd service
- bacula-sd service
- Director bconsole connectivity
- Storage (Scalar-i500) accessibility
- Director port 9101
- FD port 9102
- SD port 9103
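The Bacula port checks only need to know whether a daemon accepts connections. This sketch (port_open and check_port are hypothetical names) uses bash's /dev/tcp redirection, so no extra tools are required; it assumes bash and GNU timeout are installed.

```shell
#!/usr/bin/env bash
# Sketch: check that a local TCP port accepts connections, using bash's
# built-in /dev/tcp redirection. `timeout` (GNU coreutils) prevents a
# filtered port from stalling the run.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

check_port() {
    local label=$1 port=$2
    if port_open 127.0.0.1 "$port"; then
        echo "✓ $label (port $port)"
        return 0
    fi
    echo "✗ $label (port $port not reachable)"
    return 1
}

# Usage:
#   check_port "Director port" 9101
#   check_port "FD port" 9102
#   check_port "SD port" 9103
```

Opening a real connection is a stronger signal than seeing a listening socket, but the daemon may log the aborted connection; listing sockets with ss, as under Troubleshooting, avoids that.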
Virtual Tape Library - mhVTL (5 checks)
- mhvtl.target status
- vtllibrary@10 (Scalar i500)
- vtllibrary@30 (Scalar i40)
- VTL device count (2 changers, 8 tape drives)
- Scalar i500 slots detection
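The VTL device-count check can be derived from lsscsi, whose second column is the SCSI device type (mediumx for changers, tape for drives). check_vtl_devices is a hypothetical helper that reads lsscsi output on stdin; its thresholds match the condition shown under Customization.

```shell
#!/usr/bin/env bash
# Sketch: count medium changers and tape drives from `lsscsi` output on stdin.
# Column 2 of each lsscsi line is the device type ("mediumx" or "tape").
# Usage: lsscsi | check_vtl_devices
check_vtl_devices() {
    local counts changer_count tape_count
    counts=$(awk '$2 == "mediumx" { c++ } $2 == "tape" { t++ }
                  END { printf "%d %d\n", c, t }')
    changer_count=${counts% *}
    tape_count=${counts#* }
    if [ "$changer_count" -ge 2 ] && [ "$tape_count" -ge 8 ]; then
        echo "✓ VTL devices (changers: $changer_count, tape drives: $tape_count)"
        return 0
    fi
    echo "✗ VTL devices (changers: $changer_count, tape drives: $tape_count; expected at least 2 and 8)"
    return 1
}
```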
Storage Protocols (9 checks)
- NFS server service
- Samba (smbd) service
- NetBIOS (nmbd) service
- SCST service
- iSCSI target service
- NFS port 2049
- SMB port 445
- NetBIOS port 139
- iSCSI port 3260
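The service checks in these lists map naturally onto systemctl is-active --quiet, which exits 0 only when a unit is active. check_services is a hypothetical helper; the unit names in the usage comment are the ones used under Troubleshooting.

```shell
#!/usr/bin/env bash
# Sketch: check a list of systemd units; `systemctl is-active --quiet UNIT`
# exits 0 only when the unit is active.
# Usage: check_services nfs-server smbd nmbd scst iscsi-scstd
check_services() {
    local unit failed=0
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit"; then
            echo "✓ $unit service"
        else
            echo "✗ $unit service (not active)"
            failed=1
        fi
    done
    return "$failed"
}
```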
Monitoring & Management (2 checks)
- SNMP daemon
- SNMP port 161
Network Connectivity (2 checks)
- Internet connectivity (ping 8.8.8.8)
- Network manager status
Total: 39+ automated checks
Output Format
Console Output
- Color-coded status indicators:
  - ✓ Green = Passed
  - ⚠ Yellow = Warning
  - ✗ Red = Failed
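The colored markers can be produced by small helpers that also update the counters and append to the log. GREEN, RED, NC, CHECK, CROSS, LOG_FILE and the TOTAL/PASSED/FAILED counters appear in the script excerpt under Customization; the color codes, and the names pass, warn, fail, YELLOW, WARN and WARNING_CHECKS, are assumptions for this sketch.

```shell
#!/usr/bin/env bash
# Sketch of status helpers for the color-coded output.
# GREEN, RED, NC, CHECK, CROSS, LOG_FILE and the TOTAL/PASSED/FAILED counters
# appear in the script; the color codes, YELLOW, WARN, WARNING_CHECKS and the
# pass/warn/fail function names are hypothetical.
GREEN='\033[0;32m' YELLOW='\033[1;33m' RED='\033[0;31m' NC='\033[0m'
CHECK='✓' WARN='⚠' CROSS='✗'
TOTAL_CHECKS=0 PASSED_CHECKS=0 WARNING_CHECKS=0 FAILED_CHECKS=0
LOG_FILE=${LOG_FILE:-/var/log/calypso-healthcheck-$(date +%Y%m%d-%H%M%S).log}

pass() {
    TOTAL_CHECKS=$((TOTAL_CHECKS + 1)); PASSED_CHECKS=$((PASSED_CHECKS + 1))
    echo -e "${GREEN}${CHECK}${NC} $1" | tee -a "$LOG_FILE"
}

warn() {
    TOTAL_CHECKS=$((TOTAL_CHECKS + 1)); WARNING_CHECKS=$((WARNING_CHECKS + 1))
    echo -e "${YELLOW}${WARN}${NC} $1" | tee -a "$LOG_FILE"
}

fail() {
    TOTAL_CHECKS=$((TOTAL_CHECKS + 1)); FAILED_CHECKS=$((FAILED_CHECKS + 1))
    echo -e "${RED}${CROSS}${NC} $1" | tee -a "$LOG_FILE"
}
```

With helpers like these, a custom check reduces to a condition plus a single pass or fail call.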
Example Output
==========================================
CALYPSO APPLIANCE HEALTH CHECK
==========================================
Date: 2025-12-31 01:46:27
Hostname: calypso
Uptime: up 6 days, 2 hours, 50 minutes
Log file: /var/log/calypso-healthcheck-20251231-014627.log
========================================
SYSTEM RESOURCES
========================================
✓ Root filesystem (18% used)
✓ Var filesystem (18% used)
✓ Memory usage (49% used, 8206MB available)
✓ CPU load average (2.18, 8 cores)
...
========================================
HEALTH CHECK SUMMARY
========================================
Total Checks: 39
Passed: 35
Warnings: 0
Failed: 4
⚠ OVERALL STATUS: DEGRADED (89%)
Log Files
All checks are logged to: /var/log/calypso-healthcheck-YYYYMMDD-HHMMSS.log
Logs include:
- Timestamp and system information
- Detailed check results
- Summary statistics
- Overall health status
Scheduling
Manual Execution
# Run on demand
sudo calypso-healthcheck
Cron Job (Recommended)
Add to root's crontab (run crontab -e as root) for automated checks:
# Daily health check at 2 AM
0 2 * * * /usr/local/bin/calypso-healthcheck > /dev/null 2>&1
# Weekly health check on Monday at 6 AM with email notification
0 6 * * 1 /usr/local/bin/calypso-healthcheck 2>&1 | mail -s "Calypso Health Check" admin@example.com
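Mailing every report makes real problems easy to miss. As a sketch, the wrapper below mails only when the script exits with 2 (Degraded) or 3 (Critical) and strips the ANSI color codes first; run_and_alert, HEALTHCHECK and MAIL_TO are hypothetical names, and a mail command (as used in the weekly entry above) is assumed to be installed.

```shell
#!/usr/bin/env bash
# Sketch of a cron wrapper that mails the report only when the health check
# exits with 2 (degraded) or 3 (critical). HEALTHCHECK and MAIL_TO are
# hypothetical settings.
HEALTHCHECK=${HEALTHCHECK:-/usr/local/bin/calypso-healthcheck}
MAIL_TO=${MAIL_TO:-admin@example.com}

run_and_alert() {
    local output rc esc
    output=$("$HEALTHCHECK" 2>&1)
    rc=$?
    if [ "$rc" -ge 2 ]; then
        esc=$(printf '\033')
        # Strip ANSI color codes so the mail body is plain text
        printf '%s\n' "$output" | sed "s/${esc}\[[0-9;]*m//g" \
            | mail -s "Calypso Health Check: exit code $rc" "$MAIL_TO"
    fi
    return "$rc"
}
```

Saved as a script whose last line is run_and_alert, it can be scheduled from cron in place of the weekly mail entry; the exit code is passed through so other tooling can still act on it.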
Systemd Timer (Alternative)
Create /etc/systemd/system/calypso-healthcheck.timer:
[Unit]
Description=Daily Calypso Health Check
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
Create /etc/systemd/system/calypso-healthcheck.service:
[Unit]
Description=Calypso Appliance Health Check
[Service]
Type=oneshot
ExecStart=/usr/local/bin/calypso-healthcheck
Reload systemd so it picks up the new units, then enable the timer:
systemctl daemon-reload
systemctl enable --now calypso-healthcheck.timer
Troubleshooting
Common Failures
API/Frontend Health Endpoints Failing
# Check if services are running
systemctl status calypso-api calypso-frontend
# Check service logs
journalctl -u calypso-api -n 50
journalctl -u calypso-frontend -n 50
# Test manually
curl -k https://localhost:8443/health
curl -k https://localhost:3000/health
Bacula Director Not Responding
# Check service
systemctl status bacula-director
# Test bconsole
echo "status director" | bconsole
# Check logs
tail -50 /var/log/bacula/bacula.log
VTL Slots Not Detected
# Check VTL services
systemctl status mhvtl.target
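# Check devices (-g also lists each device's /dev/sgN node)
lsscsi -g | grep -E "mediumx|tape"
# Test manually (replace /dev/sg7 with the changer's sg node shown by lsscsi -g)
mtx -f /dev/sg7 status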
echo "update slots storage=Scalar-i500" | bconsole
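Instead of hard-coding /dev/sg7, the changer's SCSI generic node can be looked up from lsscsi -g, whose last column is the sg node. changer_sg_devices is a hypothetical helper that reads that output on stdin.

```shell
#!/usr/bin/env bash
# Sketch: print the SCSI generic nodes (/dev/sgN) of medium changers from
# `lsscsi -g` output on stdin. Column 2 is the device type and, with -g,
# the last column is the sg node.
changer_sg_devices() {
    awk '$2 == "mediumx" { print $NF }'
}

# Usage:
#   for dev in $(lsscsi -g | changer_sg_devices); do mtx -f "$dev" status; done
```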
Storage Protocols Port Not Listening
# Check service status
systemctl status nfs-server smbd nmbd scst iscsi-scstd
# Check listening ports (":PORT" followed by whitespace avoids partial matches such as 12049)
ss -tuln | grep -E ':(2049|445|139|3260)[[:space:]]'
# Restart services if needed
systemctl restart nfs-server
systemctl restart smbd nmbd
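For a per-port report instead of raw grep output, the listening sockets from ss -tuln can be matched exactly. check_listening is a hypothetical helper; it reads the ss output on stdin and treats column 5 as the local address:port, which is the default ss layout.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given ports have a listening socket, reading
# `ss -tuln` output on stdin. Column 5 is "Local Address:Port"; matching
# ":PORT" at the end avoids false hits such as 12049 for 2049.
# Usage: ss -tuln | check_listening 2049 445 139 3260 161
check_listening() {
    local listing port missing=0
    listing=$(awk 'NR > 1 { print $5 }')
    for port in "$@"; do
        if printf '%s\n' "$listing" | grep -q ":${port}\$"; then
            echo "✓ port $port listening"
        else
            echo "✗ port $port not listening"
            missing=1
        fi
    done
    return "$missing"
}
```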
Customization
Modify Thresholds
Edit the corresponding lines in /usr/local/bin/calypso-healthcheck (excerpts, not complete statements):
# Disk usage threshold (default: 80%)
check_disk "/" 80 "Root filesystem"
# Memory usage threshold (default: 90%)
if [ "$mem_percent" -lt 90 ]; then
# Change expected VTL devices
if [ "$changer_count" -ge 2 ] && [ "$tape_count" -ge 8 ]; then
Add Custom Checks
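Add a new check function to the script (it uses the script's existing color, symbol, counter, and LOG_FILE variables):
check_custom() {
    TOTAL_CHECKS=$((TOTAL_CHECKS + 1))
    # Replace with your own test; example.service is a placeholder unit name
    if systemctl is-active --quiet example.service; then
        echo -e "${GREEN}${CHECK}${NC} Custom check passed" | tee -a "$LOG_FILE"
        PASSED_CHECKS=$((PASSED_CHECKS + 1))
    else
        echo -e "${RED}${CROSS}${NC} Custom check failed" | tee -a "$LOG_FILE"
        FAILED_CHECKS=$((FAILED_CHECKS + 1))
    fi
}
# Call it from the main section of the script, after the function is defined
check_custom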
Integration
Monitoring Systems
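Map the script's exit codes to Nagios/Icinga plugin states with a small wrapper (it must run as root, for example via sudo, because the script requires root):
# Nagios/Icinga plugin wrapper. Capture the exit code once: $? is
# overwritten by every later command, including each [ ] test.
calypso-healthcheck > /dev/null 2>&1
rc=$?
case "$rc" in
  0) echo "OK - All checks passed"; exit 0 ;;
  1) echo "WARNING - Healthy with warnings"; exit 1 ;;
  2) echo "CRITICAL - System degraded"; exit 2 ;;
  3) echo "CRITICAL - System critical"; exit 2 ;;
  *) echo "UNKNOWN - Unexpected exit code $rc"; exit 3 ;;
esac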
API Integration
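For machine-readable output, the script needs a JSON output option; once one is added, it could be used like this:
# Assumes a --json option has been added to the script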
calypso-healthcheck --json > /tmp/health.json
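If a --json option is added, it could end with a one-line summary built from the existing counters. print_json_summary, WARNING_CHECKS and the JSON field names are assumptions for this sketch, not an existing interface.

```shell
#!/usr/bin/env bash
# Hypothetical JSON summary for a --json option; field names are assumptions.
# Uses the script's TOTAL_CHECKS, PASSED_CHECKS and FAILED_CHECKS counters
# plus an assumed WARNING_CHECKS counter; $1 is the exit code to report.
print_json_summary() {
    printf '{"hostname":"%s","date":"%s","total":%d,"passed":%d,"warnings":%d,"failed":%d,"exit_code":%d}\n' \
        "$(uname -n)" "$(date '+%Y-%m-%dT%H:%M:%S%z')" \
        "${TOTAL_CHECKS:-0}" "${PASSED_CHECKS:-0}" "${WARNING_CHECKS:-0}" \
        "${FAILED_CHECKS:-0}" "${1:-0}"
}
```

Called as print_json_summary "$exit_code" at the end of a run, it prints a single object that tools such as jq can parse; this assumes the hostname needs no JSON escaping.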
Maintenance
Log Rotation
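Logs are stored in /var/log/calypso-healthcheck-*.log. Because each run creates a new timestamped file, logrotate treats every file separately and rotates it at most once, so the rotate count does not limit how many runs are kept; combine it with the cleanup command under Cleanup Old Logs.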
Create /etc/logrotate.d/calypso-healthcheck:
/var/log/calypso-healthcheck-*.log {
weekly
rotate 12
compress
delaycompress
missingok
notifempty
}
Cleanup Old Logs
# Remove logs (including rotated copies) older than 30 days
find /var/log -maxdepth 1 -name "calypso-healthcheck-*.log*" -mtime +30 -delete
Best Practices
- Run after reboot - Verify all services started correctly
- Schedule regular checks - Daily or weekly automated runs
- Monitor exit codes - Alert on degraded/critical status
- Review logs periodically - Identify patterns or recurring issues
- Update checks - Add new components as system evolves
- Baseline health - Establish normal operating parameters
- Document exceptions - Note known warnings that are acceptable
See Also
- pre-reboot-checklist.md - Pre-reboot verification
- bacula-vtl-troubleshooting.md - VTL troubleshooting guide
- System logs: /var/log/syslog, /var/log/bacula/
Created: 2025-12-31
Script: /usr/local/bin/calypso-healthcheck