add shares av system

This commit is contained in:
Warp Agent
2026-01-04 14:11:38 +07:00
parent 70d25e13b8
commit 0c8a9efecc
49 changed files with 405 additions and 1 deletions

View File

@@ -0,0 +1,344 @@
# Calypso Appliance Health Check Script
## Overview
Comprehensive health check script for all Calypso Appliance components. Performs automated checks across system resources, services, network, storage, and backup infrastructure.
## Installation
Script location: `/usr/local/bin/calypso-healthcheck`
## Usage
### Basic Usage
```bash
# Run health check (requires root)
calypso-healthcheck
# Run and save to specific location
calypso-healthcheck 2>&1 | tee /root/healthcheck-$(date +%Y%m%d).log
```
### Exit Codes
- `0` - All checks passed (100% healthy)
- `1` - Healthy with warnings (some non-critical issues)
- `2` - Degraded (80%+ checks passed, some failures)
- `3` - Critical (less than 80% checks passed)
### Automated Checks
#### System Resources (4 checks)
- Root filesystem usage (threshold: 80%)
- /var filesystem usage (threshold: 80%)
- Memory usage (threshold: 90%)
- CPU load average
#### Database Services (2 checks)
- PostgreSQL service status
- Database presence (calypso, bacula)
#### Calypso Application (7 checks)
- calypso-api service
- calypso-frontend service
- calypso-logger service
- API port 8443
- Frontend port 3000
- API health endpoint
- Frontend health endpoint
#### Backup Services - Bacula (8 checks)
- bacula-director service
- bacula-fd service
- bacula-sd service
- Director bconsole connectivity
- Storage (Scalar-i500) accessibility
- Director port 9101
- FD port 9102
- SD port 9103
#### Virtual Tape Library - mhVTL (4 checks)
- mhvtl.target status
- vtllibrary@10 (Scalar i500)
- vtllibrary@30 (Scalar i40)
- VTL device count (2 changers, 8 tape drives)
- Scalar i500 slots detection
#### Storage Protocols (9 checks)
- NFS server service
- Samba (smbd) service
- NetBIOS (nmbd) service
- SCST service
- iSCSI target service
- NFS port 2049
- SMB port 445
- NetBIOS port 139
- iSCSI port 3260
#### Monitoring & Management (2 checks)
- SNMP daemon
- SNMP port 161
#### Network Connectivity (2 checks)
- Internet connectivity (ping 8.8.8.8)
- Network manager status
**Total: 39+ automated checks**
## Output Format
### Console Output
- Color-coded status indicators:
- ✓ Green = Passed
- ⚠ Yellow = Warning
- ✗ Red = Failed
### Example Output
```
==========================================
CALYPSO APPLIANCE HEALTH CHECK
==========================================
Date: 2025-12-31 01:46:27
Hostname: calypso
Uptime: up 6 days, 2 hours, 50 minutes
Log file: /var/log/calypso-healthcheck-20251231-014627.log
========================================
SYSTEM RESOURCES
========================================
✓ Root filesystem (18% used)
✓ Var filesystem (18% used)
✓ Memory usage (49% used, 8206MB available)
✓ CPU load average (2.18, 8 cores)
...
========================================
HEALTH CHECK SUMMARY
========================================
Total Checks: 39
Passed: 35
Warnings: 0
Failed: 4
⚠ OVERALL STATUS: DEGRADED (89%)
```
### Log Files
All checks are logged to: `/var/log/calypso-healthcheck-YYYYMMDD-HHMMSS.log`
Logs include:
- Timestamp and system information
- Detailed check results
- Summary statistics
- Overall health status
## Scheduling
### Manual Execution
```bash
# Run on demand
sudo calypso-healthcheck
```
### Cron Job (Recommended)
Add to crontab for automated checks:
```bash
# Daily health check at 2 AM
0 2 * * * /usr/local/bin/calypso-healthcheck > /dev/null 2>&1
# Weekly health check on Monday at 6 AM with email notification
0 6 * * 1 /usr/local/bin/calypso-healthcheck 2>&1 | mail -s "Calypso Health Check" admin@example.com
```
### Systemd Timer (Alternative)
Create `/etc/systemd/system/calypso-healthcheck.timer`:
```ini
[Unit]
Description=Daily Calypso Health Check
Requires=calypso-healthcheck.service
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
```
Create `/etc/systemd/system/calypso-healthcheck.service`:
```ini
[Unit]
Description=Calypso Appliance Health Check
[Service]
Type=oneshot
ExecStart=/usr/local/bin/calypso-healthcheck
```
Enable:
```bash
systemctl enable --now calypso-healthcheck.timer
```
## Troubleshooting
### Common Failures
#### API/Frontend Health Endpoints Failing
```bash
# Check if services are running
systemctl status calypso-api calypso-frontend
# Check service logs
journalctl -u calypso-api -n 50
journalctl -u calypso-frontend -n 50
# Test manually
curl -k https://localhost:8443/health
curl -k https://localhost:3000/health
```
#### Bacula Director Not Responding
```bash
# Check service
systemctl status bacula-director
# Test bconsole
echo "status director" | bconsole
# Check logs
tail -50 /var/log/bacula/bacula.log
```
#### VTL Slots Not Detected
```bash
# Check VTL services
systemctl status mhvtl.target
# Check devices
lsscsi | grep -E "mediumx|tape"
# Test manually
mtx -f /dev/sg7 status
echo "update slots storage=Scalar-i500" | bconsole
```
#### Storage Protocols Port Not Listening
```bash
# Check service status
systemctl status nfs-server smbd nmbd scst iscsi-scstd
# Check listening ports
ss -tuln | grep -E "2049|445|139|3260"
# Restart services if needed
systemctl restart nfs-server
systemctl restart smbd nmbd
```
## Customization
### Modify Thresholds
Edit `/usr/local/bin/calypso-healthcheck`:
```bash
# Disk usage threshold (default: 80%)
check_disk "/" 80 "Root filesystem"
# Memory usage threshold (default: 90%)
if [ "$mem_percent" -lt 90 ]; then
# Change expected VTL devices
if [ "$changer_count" -ge 2 ] && [ "$tape_count" -ge 8 ]; then
```
### Add Custom Checks
Add new check functions:
```bash
check_custom() {
TOTAL_CHECKS=$((TOTAL_CHECKS + 1))
if [[ condition ]]; then
echo -e "${GREEN}${CHECK}${NC} Custom check passed" | tee -a "$LOG_FILE"
PASSED_CHECKS=$((PASSED_CHECKS + 1))
else
echo -e "${RED}${CROSS}${NC} Custom check failed" | tee -a "$LOG_FILE"
FAILED_CHECKS=$((FAILED_CHECKS + 1))
fi
}
# Call in main script
check_custom
```
## Integration
### Monitoring Systems
Export metrics for monitoring:
```bash
# Nagios/Icinga format
calypso-healthcheck
if [ $? -eq 0 ]; then
echo "OK - All checks passed"
exit 0
elif [ $? -eq 1 ]; then
echo "WARNING - Healthy with warnings"
exit 1
else
echo "CRITICAL - System degraded"
exit 2
fi
```
### API Integration
Parse JSON output:
```bash
# Add JSON output option
calypso-healthcheck --json > /tmp/health.json
```
## Maintenance
### Log Rotation
Logs are stored in `/var/log/calypso-healthcheck-*.log`
Create `/etc/logrotate.d/calypso-healthcheck`:
```
/var/log/calypso-healthcheck-*.log {
weekly
rotate 12
compress
delaycompress
missingok
notifempty
}
```
### Cleanup Old Logs
```bash
# Remove logs older than 30 days
find /var/log -name "calypso-healthcheck-*.log" -mtime +30 -delete
```
## Best Practices
1. **Run after reboot** - Verify all services started correctly
2. **Schedule regular checks** - Daily or weekly automated runs
3. **Monitor exit codes** - Alert on degraded/critical status
4. **Review logs periodically** - Identify patterns or recurring issues
5. **Update checks** - Add new components as system evolves
6. **Baseline health** - Establish normal operating parameters
7. **Document exceptions** - Note known warnings that are acceptable
## See Also
- `pre-reboot-checklist.md` - Pre-reboot verification
- `bacula-vtl-troubleshooting.md` - VTL troubleshooting guide
- System logs: `/var/log/syslog`, `/var/log/bacula/`
---
*Created: 2025-12-31*
*Script: `/usr/local/bin/calypso-healthcheck`*