calypso/docs/alpha/srs/SRS-09-Monitoring-Alerting.md
2026-01-04 13:19:40 +07:00

SRS-09: Monitoring & Alerting

1. Overview

The Monitoring & Alerting module provides real-time system monitoring, metrics collection, alert management, and system health tracking.

2. Functional Requirements

2.1 System Metrics

FR-MON-001: System shall collect and display CPU metrics

  • Output: CPU usage percentage, load average
  • Refresh: Every 5 seconds
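FR-MON-001 does not specify how CPU usage is computed. A common approach on Linux is to read the cumulative counters on the aggregate `cpu` line of /proc/stat twice, one refresh interval (5 seconds) apart, and report the non-idle share of the elapsed time. The sketch below assumes that source and counts iowait as idle time. The function names are hypothetical; load average comes from Python's standard `os.getloadavg()`.

```python
import os


def parse_cpu_line(line: str) -> tuple:
    """Return (idle, total) jiffies from the aggregate "cpu" line of /proc/stat."""
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice]
    fields = [int(v) for v in line.split()[1:]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
    total = sum(fields[:8])  # guest time is already counted in user/nice
    return idle, total


def cpu_usage_percent(prev_line: str, curr_line: str) -> float:
    """CPU busy percentage between two samples taken one refresh interval apart."""
    idle0, total0 = parse_cpu_line(prev_line)
    idle1, total1 = parse_cpu_line(curr_line)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / delta_total)


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # the first line aggregates all CPUs


def load_average() -> tuple:
    return os.getloadavg()  # (1, 5, 15 minute) averages; Unix only
```

In a collector loop, keep the previous line, wait 5 seconds, read again, and pass both lines to `cpu_usage_percent`.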

FR-MON-002: System shall collect and display memory metrics

  • Output: Total memory, used memory, available memory, usage percentage
  • Refresh: Every 5 seconds
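A minimal sketch of FR-MON-002, assuming a Linux host where the figures come from /proc/meminfo. It treats "available" as the kernel's `MemAvailable` estimate and "used" as total minus available; that definition is an assumption, since tools differ on what they count as used. The function names are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo into a {field: bytes-or-count} dictionary."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        # Most fields are in kB (KiB); a few (e.g. HugePages_Total) are plain counts.
        values[key] = amount * 1024 if len(parts) > 1 and parts[1] == "kB" else amount
    return values


def memory_metrics(text: str) -> dict:
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]
    used = total - available  # assumed definition of "used"
    percent = 100.0 * used / total if total else 0.0
    return {"total": total, "used": used, "available": available, "usage_percent": percent}


def read_memory_metrics(path: str = "/proc/meminfo") -> dict:
    with open(path) as f:
        return memory_metrics(f.read())
```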

FR-MON-003: System shall collect and display storage metrics

  • Output: Total capacity, used capacity, available capacity, usage percentage
  • Refresh: Every 5 seconds
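For FR-MON-003, Python's standard `shutil.disk_usage()` returns total, used, and free bytes for the filesystem that contains a given path. The sketch below assumes a single monitored path and reports free space as "available"; which paths or pools the module actually aggregates is not specified here, and on ZFS the pool-level capacity reported by `zpool list` may be the more meaningful figure. The function name is hypothetical.

```python
import shutil


def storage_metrics(path: str = "/") -> dict:
    """Capacity figures, in bytes, for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "total": usage.total,
        "used": usage.used,
        "available": usage.free,
        "usage_percent": round(percent, 1),
    }
```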

FR-MON-004: System shall collect and display network throughput

  • Output: Inbound/outbound throughput, historical data
  • Refresh: Every 5 seconds
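Network interfaces expose cumulative byte counters (on Linux, for example, in /proc/net/dev), so throughput for FR-MON-004 is the counter delta divided by the sampling interval. The hypothetical `ThroughputTracker` below also keeps a bounded history for the charts. The one-hour retention is an assumed value, and a counter that goes backwards (for example after an interface reset) is reported as zero for that interval.

```python
from collections import deque
from typing import Optional


class ThroughputTracker:
    """Derives inbound/outbound bytes per second from cumulative interface counters."""

    def __init__(self, history_len: int = 720):  # 720 samples x 5 s = 1 hour (assumed)
        self.history = deque(maxlen=history_len)
        self._prev = None  # (timestamp, rx_bytes, tx_bytes)

    def sample(self, ts: float, rx_bytes: int, tx_bytes: int) -> Optional[tuple]:
        """Record a counter sample; return (ts, rx_rate, tx_rate) once two samples exist."""
        result = None
        if self._prev is not None:
            t0, rx0, tx0 = self._prev
            dt = ts - t0
            if dt > 0:
                # A negative delta means the counter was reset; report 0 for that interval.
                rx_rate = max(rx_bytes - rx0, 0) / dt
                tx_rate = max(tx_bytes - tx0, 0) / dt
                result = (ts, rx_rate, tx_rate)
                self.history.append(result)
        self._prev = (ts, rx_bytes, tx_bytes)
        return result
```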

FR-MON-005: System shall display ZFS ARC statistics

  • Output: ARC hit ratio, cache size, eviction statistics
  • Refresh: Real-time
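On Linux, OpenZFS exposes ARC counters in /proc/spl/kstat/zfs/arcstats as `name type data` rows after a two-line header, and the hit ratio for FR-MON-005 can be derived as hits / (hits + misses). Because these counters are cumulative since the module was loaded, the sketch accepts an optional previous sample so the ratio can be computed for the last interval only. The function names are hypothetical, and because eviction-related field names vary across OpenZFS versions, the parser simply passes every numeric field through.

```python
from typing import Optional


def parse_arcstats(text: str) -> dict:
    """Parse arcstats text into {name: value}, skipping the two header lines."""
    stats = {}
    for line in text.splitlines()[2:]:
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def arc_hit_ratio(curr: dict, prev: Optional[dict] = None) -> float:
    """Hit ratio in percent; pass a previous sample to get the ratio for that interval."""
    prev = prev or {}
    hits = curr.get("hits", 0) - prev.get("hits", 0)
    misses = curr.get("misses", 0) - prev.get("misses", 0)
    lookups = hits + misses
    return 100.0 * hits / lookups if lookups > 0 else 0.0
```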

2.2 ZFS Health Monitoring

FR-MON-006: System shall display ZFS pool health

  • Output: Pool status, health indicators, errors
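`zpool list -H -o name,health` prints one tab-separated `name<TAB>health` line per pool with no header. The sketch below maps those states onto the healthy/degraded/unhealthy levels used in FR-MON-018. The mapping itself (ONLINE as healthy, DEGRADED as degraded, anything else as unhealthy) is an assumption, the function names are hypothetical, and per-device error counts would need `zpool status`, which is not parsed here.

```python
import subprocess

# Assumed mapping from zpool health states to this module's health levels.
POOL_STATE_LEVEL = {"ONLINE": "healthy", "DEGRADED": "degraded"}


def parse_pool_health(output: str) -> dict:
    """Map each pool name to a health level; unknown states count as unhealthy."""
    pools = {}
    for line in output.strip().splitlines():
        name, health = line.split("\t")
        pools[name] = POOL_STATE_LEVEL.get(health, "unhealthy")  # FAULTED, UNAVAIL, ...
    return pools


def read_pool_health() -> dict:
    # -H: scripted mode (no header, tab-separated fields); -o: columns to print.
    out = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_pool_health(out)
```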

FR-MON-007: System shall display ZFS dataset health

  • Output: Dataset status, quota usage, compression ratio

2.3 System Logs

FR-MON-008: System shall display system logs

  • Output: Log entries with timestamp, level, source, message
  • Filtering: By level, time range, search
  • Refresh: Every 10 minutes

FR-MON-009: System shall allow users to search logs

  • Input: Search query
  • Output: Filtered log entries
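A minimal in-memory sketch of the filtering in FR-MON-008 and the search in FR-MON-009. It assumes "by level" means a minimum severity on an assumed five-level scale, treats the time range as half-open [start, end), and matches the search query case-insensitively against the message and source. A real implementation would more likely push these filters down to the log store; `LogEntry` and `filter_logs` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed severity scale, lowest first.
LEVELS = ["debug", "info", "warning", "error", "critical"]


@dataclass
class LogEntry:
    timestamp: datetime
    level: str
    source: str
    message: str


def filter_logs(entries: list, min_level: Optional[str] = None,
                start: Optional[datetime] = None, end: Optional[datetime] = None,
                query: Optional[str] = None) -> list:
    """Keep entries at or above min_level, inside [start, end), matching query."""
    min_rank = LEVELS.index(min_level) if min_level else 0
    needle = query.lower() if query else None
    result = []
    for entry in entries:
        if LEVELS.index(entry.level) < min_rank:
            continue
        if start is not None and entry.timestamp < start:
            continue
        if end is not None and entry.timestamp >= end:
            continue
        if needle and needle not in entry.message.lower() and needle not in entry.source.lower():
            continue
        result.append(entry)
    return result
```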

2.4 Active Jobs

FR-MON-010: System shall display active jobs

  • Output: Job list with type, status, progress, start time

FR-MON-011: System shall allow users to view job details

  • Output: Job configuration, progress, logs

2.5 Alert Management

FR-MON-012: System shall display active alerts

  • Output: Alert list with severity, source, message, timestamp

FR-MON-013: System shall allow users to acknowledge alerts

  • Input: Alert ID
  • Action: Mark alert as acknowledged

FR-MON-014: System shall allow users to resolve alerts

  • Input: Alert ID
  • Action: Mark alert as resolved
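FR-MON-013 and FR-MON-014 imply a small alert lifecycle. The sketch below assumes three states (active, acknowledged, resolved), allows resolving an alert that was never acknowledged, and rejects any transition out of the resolved state. These rules and all names are assumptions, not part of this specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class AlertStatus(Enum):
    ACTIVE = "active"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Assumed lifecycle: active -> acknowledged -> resolved, or active -> resolved directly.
ALLOWED_TRANSITIONS = {
    AlertStatus.ACTIVE: {AlertStatus.ACKNOWLEDGED, AlertStatus.RESOLVED},
    AlertStatus.ACKNOWLEDGED: {AlertStatus.RESOLVED},
    AlertStatus.RESOLVED: set(),
}


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Alert:
    id: str
    severity: str
    source: str
    message: str
    status: AlertStatus = AlertStatus.ACTIVE
    updated_at: datetime = field(default_factory=_now)

    def transition(self, target: AlertStatus) -> None:
        if target not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(
                f"alert {self.id}: cannot go from {self.status.value} to {target.value}"
            )
        self.status = target
        self.updated_at = _now()


def acknowledge(alert: Alert) -> None:
    alert.transition(AlertStatus.ACKNOWLEDGED)


def resolve(alert: Alert) -> None:
    alert.transition(AlertStatus.RESOLVED)
```

Rejected transitions raise `ValueError`, which an API handler could translate into a client error.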

FR-MON-015: System shall display alert history

  • Output: Historical alerts with their status and resolution

FR-MON-016: System shall allow users to configure alert rules

  • Input: Rule name, condition, severity, enabled flag
  • Output: Created alert rule

FR-MON-017: System shall evaluate alert rules

  • Action: Enabled rules are evaluated automatically against the collected metrics
  • Output: Alerts generated when rule conditions are met
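FR-MON-016 lists a rule's fields but not the form of its condition. The sketch assumes a condition is a metric name, a comparison operator, and a threshold, and evaluates only enabled rules against the latest metric snapshot, as in FR-MON-017. All names are hypothetical.

```python
import operator
from dataclasses import dataclass

# Supported comparison operators for the assumed condition format.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class AlertRule:
    name: str
    metric: str        # assumed condition: metric name ...
    op: str            # ... comparison operator ...
    threshold: float   # ... and threshold value
    severity: str
    enabled: bool = True


def evaluate_rules(rules: list, metrics: dict) -> list:
    """Return one alert dict for every enabled rule whose condition holds."""
    alerts = []
    for rule in rules:
        if not rule.enabled:
            continue
        if rule.metric not in metrics:
            continue  # a missing metric could instead be reported as an evaluation error
        value = metrics[rule.metric]
        if OPERATORS[rule.op](value, rule.threshold):
            alerts.append({
                "rule": rule.name,
                "severity": rule.severity,
                "message": f"{rule.metric} is {value} ({rule.op} {rule.threshold})",
            })
    return alerts
```

As written, a rule whose condition stays true produces a new alert on every evaluation; a production evaluator would likely deduplicate against alerts that are still open.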

2.6 Health Checks

FR-MON-018: System shall perform health checks

  • Output: Overall system health status (healthy/degraded/unhealthy)

FR-MON-019: System shall display health check details

  • Output: Component health status, issues, recommendations
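One plausible aggregation rule for FR-MON-018 and FR-MON-019, assumed here rather than specified: the overall status is the worst status of any component, and every component that is not healthy is listed as an issue. The function name is hypothetical, and an empty set of checks is reported as healthy.

```python
# Assumed ordering of component statuses, from best to worst.
SEVERITY = {"healthy": 0, "degraded": 1, "unhealthy": 2}


def health_report(components: dict) -> dict:
    """Aggregate per-component statuses into an overall status plus a list of issues."""
    for name, status in components.items():
        if status not in SEVERITY:
            raise ValueError(f"unknown status {status!r} for component {name!r}")
    # Worst component wins; an empty check set is reported as healthy (assumption).
    overall = max(components.values(), key=SEVERITY.__getitem__, default="healthy")
    issues = sorted(name for name, status in components.items() if status != "healthy")
    return {"status": overall, "components": dict(components), "issues": issues}
```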

3. User Interface Requirements

3.1 Monitoring Dashboard

  • Metrics cards (CPU, Memory, Storage, Network)
  • Real-time charts (Network Throughput, ZFS ARC Hit Ratio)
  • System health indicators

3.2 Tabs

  • Active Jobs: Running jobs list
  • System Logs: Log viewer with filtering
  • Alerts History: Alert list and management

3.3 Alert Management

  • Alert list with severity indicators
  • Alert detail view
  • Alert acknowledgment and resolution

4. API Endpoints

GET    /api/v1/monitoring/metrics
GET    /api/v1/monitoring/health
GET    /api/v1/monitoring/alerts
GET    /api/v1/monitoring/alerts/:id
POST   /api/v1/monitoring/alerts/:id/acknowledge
POST   /api/v1/monitoring/alerts/:id/resolve
GET    /api/v1/monitoring/rules
POST   /api/v1/monitoring/rules
PUT    /api/v1/monitoring/rules/:id
DELETE /api/v1/monitoring/rules/:id

GET    /api/v1/system/logs
GET    /api/v1/system/network/throughput
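A client-side sketch of calling these endpoints with Python's standard `urllib.request`. The base URL, the bearer-token `Authorization` header, and the JSON response body are assumptions; this specification does not define the host, authentication scheme, or payload formats, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"  # hypothetical host; not defined by this SRS


def build_request(method: str, path: str, token: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request to a monitoring endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req: urllib.request.Request):
    """Send a request and decode an assumed JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        payload = resp.read()
    return json.loads(payload) if payload else None


def acknowledge_alert(alert_id: str, token: str):
    return send(build_request("POST", f"/api/v1/monitoring/alerts/{alert_id}/acknowledge", token))


def resolve_alert(alert_id: str, token: str):
    return send(build_request("POST", f"/api/v1/monitoring/alerts/{alert_id}/resolve", token))
```

The same `build_request` helper can prepare `POST /api/v1/monitoring/rules` with a JSON body carrying the fields listed in FR-MON-016, though their JSON key names are not defined here.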

5. Permissions

  • monitoring:read: Required for viewing metrics, alerts, and logs
  • monitoring:write: Required for acknowledging and resolving alerts and for configuring alert rules
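A sketch of how the two permissions might be enforced, assuming every GET endpoint listed in section 4 needs `monitoring:read` and every POST, PUT, or DELETE needs `monitoring:write`. The specification does not say whether `monitoring:write` implies `monitoring:read`, so the sketch keeps them independent; the function names are hypothetical.

```python
# Paths covered by the monitoring permissions (section 4 of this SRS).
MONITORING_PATHS = (
    "/api/v1/monitoring/",
    "/api/v1/system/logs",
    "/api/v1/system/network/throughput",
)


def required_permission(method: str, path: str):
    """Return the permission a request needs, or None if the path is outside this module."""
    if not path.startswith(MONITORING_PATHS):
        return None
    # Assumed rule: reads need monitoring:read; POST/PUT/DELETE need monitoring:write.
    return "monitoring:read" if method.upper() == "GET" else "monitoring:write"


def is_authorized(method: str, path: str, granted: set) -> bool:
    needed = required_permission(method, path)
    return needed is None or needed in granted
```

A request that fails this check falls under the "Insufficient permissions" error in section 6; the HTTP status to return for it is not specified here.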

6. Error Handling

  • Metrics collection failures
  • Alert rule evaluation errors
  • Log access errors
  • Insufficient permissions