# Calypso Appliance Health Check Script

## Overview
Comprehensive health check script for all Calypso Appliance components. Performs automated checks across system resources, services, network, storage, and backup infrastructure.

## Installation
Script location: `/usr/local/bin/calypso-healthcheck`

## Usage

### Basic Usage
```bash
# Run health check (requires root)
calypso-healthcheck

# Run and save to specific location
calypso-healthcheck 2>&1 | tee /root/healthcheck-$(date +%Y%m%d).log
```

### Exit Codes
- `0` - All checks passed (100% healthy)
- `1` - Healthy with warnings (some non-critical issues)
- `2` - Degraded (80%+ checks passed, some failures)
- `3` - Critical (less than 80% checks passed)

### Automated Checks

#### System Resources (4 checks)
- Root filesystem usage (threshold: 80%)
- /var filesystem usage (threshold: 80%)
- Memory usage (threshold: 90%)
- CPU load average

#### Database Services (2 checks)
- PostgreSQL service status
- Database presence (calypso, bacula)

#### Calypso Application (7 checks)
- calypso-api service
- calypso-frontend service
- calypso-logger service
- API port 8443
- Frontend port 3000
- API health endpoint
- Frontend health endpoint

#### Backup Services - Bacula (8 checks)
- bacula-director service
- bacula-fd service
- bacula-sd service
- Director bconsole connectivity
- Storage (Scalar-i500) accessibility
- Director port 9101
- FD port 9102
- SD port 9103

#### Virtual Tape Library - mhVTL (4 checks)
- mhvtl.target status
- vtllibrary@10 (Scalar i500)
- vtllibrary@30 (Scalar i40)
- VTL device count (2 changers, 8 tape drives)
- Scalar i500 slots detection

#### Storage Protocols (9 checks)
- NFS server service
- Samba (smbd) service
- NetBIOS (nmbd) service
- SCST service
- iSCSI target service
- NFS port 2049
- SMB port 445
- NetBIOS port 139
- iSCSI port 3260

#### Monitoring & Management (2 checks)
- SNMP daemon
- SNMP port 161

#### Network Connectivity (2 checks)
- Internet connectivity (ping 8.8.8.8)
- Network manager status

**Total: 39+ automated checks**

## Output Format

### Console Output
- Color-coded status indicators:
  - ✓ Green = Passed
  - ⚠ Yellow = Warning
  - ✗ Red = Failed

### Example Output
```
==========================================
  CALYPSO APPLIANCE HEALTH CHECK
==========================================
Date: 2025-12-31 01:46:27
Hostname: calypso
Uptime: up 6 days, 2 hours, 50 minutes
Log file: /var/log/calypso-healthcheck-20251231-014627.log

========================================
SYSTEM RESOURCES
========================================
✓ Root filesystem (18% used)
✓ Var filesystem (18% used)
✓ Memory usage (49% used, 8206MB available)
✓ CPU load average (2.18, 8 cores)

...

========================================
HEALTH CHECK SUMMARY
========================================

Total Checks:    39
Passed:          35
Warnings:        0
Failed:          4

⚠ OVERALL STATUS: DEGRADED (89%)
```

### Log Files
All checks are logged to: `/var/log/calypso-healthcheck-YYYYMMDD-HHMMSS.log`

Logs include:
- Timestamp and system information
- Detailed check results
- Summary statistics
- Overall health status

## Scheduling

### Manual Execution
```bash
# Run on demand
sudo calypso-healthcheck
```

### Cron Job (Recommended)
Add to crontab for automated checks:

```bash
# Daily health check at 2 AM
0 2 * * * /usr/local/bin/calypso-healthcheck > /dev/null 2>&1

# Weekly health check on Monday at 6 AM with email notification
0 6 * * 1 /usr/local/bin/calypso-healthcheck 2>&1 | mail -s "Calypso Health Check" admin@example.com
```

### Systemd Timer (Alternative)
Create `/etc/systemd/system/calypso-healthcheck.timer`:
```ini
[Unit]
Description=Daily Calypso Health Check
Requires=calypso-healthcheck.service

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Create `/etc/systemd/system/calypso-healthcheck.service`:
```ini
[Unit]
Description=Calypso Appliance Health Check

[Service]
Type=oneshot
ExecStart=/usr/local/bin/calypso-healthcheck
```

Enable:
```bash
systemctl enable --now calypso-healthcheck.timer
```

## Troubleshooting

### Common Failures

#### API/Frontend Health Endpoints Failing
```bash
# Check if services are running
systemctl status calypso-api calypso-frontend

# Check service logs
journalctl -u calypso-api -n 50
journalctl -u calypso-frontend -n 50

# Test manually
curl -k https://localhost:8443/health
curl -k https://localhost:3000/health
```

#### Bacula Director Not Responding
```bash
# Check service
systemctl status bacula-director

# Test bconsole
echo "status director" | bconsole

# Check logs
tail -50 /var/log/bacula/bacula.log
```

#### VTL Slots Not Detected
```bash
# Check VTL services
systemctl status mhvtl.target

# Check devices
lsscsi | grep -E "mediumx|tape"

# Test manually
mtx -f /dev/sg7 status
echo "update slots storage=Scalar-i500" | bconsole
```

#### Storage Protocols Port Not Listening
```bash
# Check service status
systemctl status nfs-server smbd nmbd scst iscsi-scstd

# Check listening ports
ss -tuln | grep -E "2049|445|139|3260"

# Restart services if needed
systemctl restart nfs-server
systemctl restart smbd nmbd
```

## Customization

### Modify Thresholds
Edit `/usr/local/bin/calypso-healthcheck`:

```bash
# Disk usage threshold (default: 80%)
check_disk "/" 80 "Root filesystem"

# Memory usage threshold (default: 90%)
if [ "$mem_percent" -lt 90 ]; then

# Change expected VTL devices
if [ "$changer_count" -ge 2 ] && [ "$tape_count" -ge 8 ]; then
```

### Add Custom Checks
Add new check functions:

```bash
check_custom() {
    TOTAL_CHECKS=$((TOTAL_CHECKS + 1))
    
    if [[ condition ]]; then
        echo -e "${GREEN}${CHECK}${NC} Custom check passed" | tee -a "$LOG_FILE"
        PASSED_CHECKS=$((PASSED_CHECKS + 1))
    else
        echo -e "${RED}${CROSS}${NC} Custom check failed" | tee -a "$LOG_FILE"
        FAILED_CHECKS=$((FAILED_CHECKS + 1))
    fi
}

# Call in main script
check_custom
```

## Integration

### Monitoring Systems
Export metrics for monitoring:

```bash
# Nagios/Icinga format
calypso-healthcheck
if [ $? -eq 0 ]; then
    echo "OK - All checks passed"
    exit 0
elif [ $? -eq 1 ]; then
    echo "WARNING - Healthy with warnings"
    exit 1
else
    echo "CRITICAL - System degraded"
    exit 2
fi
```

### API Integration
Parse JSON output:

```bash
# Add JSON output option
calypso-healthcheck --json > /tmp/health.json
```

## Maintenance

### Log Rotation
Logs are stored in `/var/log/calypso-healthcheck-*.log`

Create `/etc/logrotate.d/calypso-healthcheck`:
```
/var/log/calypso-healthcheck-*.log {
    weekly
    rotate 12
    compress
    delaycompress
    missingok
    notifempty
}
```

### Cleanup Old Logs
```bash
# Remove logs older than 30 days
find /var/log -name "calypso-healthcheck-*.log" -mtime +30 -delete
```

## Best Practices

1. **Run after reboot** - Verify all services started correctly
2. **Schedule regular checks** - Daily or weekly automated runs
3. **Monitor exit codes** - Alert on degraded/critical status
4. **Review logs periodically** - Identify patterns or recurring issues
5. **Update checks** - Add new components as system evolves
6. **Baseline health** - Establish normal operating parameters
7. **Document exceptions** - Note known warnings that are acceptable

## See Also
- `pre-reboot-checklist.md` - Pre-reboot verification
- `bacula-vtl-troubleshooting.md` - VTL troubleshooting guide
- System logs: `/var/log/syslog`, `/var/log/bacula/`

---

*Created: 2025-12-31*  
*Script: `/usr/local/bin/calypso-healthcheck`*