Logging & Diagnostics
Overview
AtlasOS provides comprehensive logging and diagnostic capabilities to help monitor system health, troubleshoot issues, and understand system behavior.
Structured Logging
Logger Package
The internal/logger package provides structured logging with:
- Log Levels: DEBUG, INFO, WARN, ERROR
- JSON Mode: Optional JSON-formatted output
- Structured Fields: Key-value pairs for context
- Thread-Safe: Safe for concurrent use
Configuration
Configure logging via environment variables:
# Log level (DEBUG, INFO, WARN, ERROR)
export ATLAS_LOG_LEVEL=INFO
# Log format (json or text)
export ATLAS_LOG_FORMAT=json
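As a rough illustration of how these two variables might be interpreted, the sketch below parses them into a level and a format flag. The `Level` type, the function names, and the fallback to INFO for unknown values are assumptions made for this example, not the actual internal/logger API.

```go
package main

import (
	"os"
	"strings"
)

// Level is a hypothetical severity type; the real internal/logger type may differ.
type Level int

const (
	LevelDebug Level = iota
	LevelInfo
	LevelWarn
	LevelError
)

// parseLevel maps a level name to a Level, ignoring case and surrounding spaces.
// Unknown or empty values fall back to INFO (an assumption for this sketch).
func parseLevel(s string) Level {
	switch strings.ToUpper(strings.TrimSpace(s)) {
	case "DEBUG":
		return LevelDebug
	case "WARN":
		return LevelWarn
	case "ERROR":
		return LevelError
	default:
		return LevelInfo
	}
}

// useJSON reports whether the format value selects JSON output.
func useJSON(s string) bool {
	return strings.EqualFold(strings.TrimSpace(s), "json")
}

// loadLogConfig reads both settings from the environment.
func loadLogConfig() (Level, bool) {
	return parseLevel(os.Getenv("ATLAS_LOG_LEVEL")), useJSON(os.Getenv("ATLAS_LOG_FORMAT"))
}
```

Falling back to a default instead of failing on a bad value keeps a typo in the environment from preventing startup; a stricter implementation could return an error instead.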
Usage
import "gitea.avt.data-center.id/othman.suseno/atlas/internal/logger"
// Simple logging
logger.Info("User logged in")
logger.Error("Failed to create pool", err)
// With fields
logger.Info("Pool created", map[string]interface{}{
"pool": "tank",
"size": "10TB",
})
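To give a sense of what JSON mode could produce for a call like the one above, here is a minimal sketch that merges the message, level, and fields into one JSON object. `formatJSON` and the `level` and `msg` key names are hypothetical; the real package may use different keys and add a timestamp.

```go
package main

import "encoding/json"

// formatJSON renders one structured entry as a single JSON object.
// Hypothetical helper: a field named "level" or "msg" would overwrite
// the built-in keys in this simple version.
func formatJSON(level, msg string, fields map[string]interface{}) (string, error) {
	entry := map[string]interface{}{
		"level": level,
		"msg":   msg,
	}
	for k, v := range fields {
		entry[k] = v
	}
	// json.Marshal emits map keys in sorted order, so output is stable.
	b, err := json.Marshal(entry)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Because `encoding/json` sorts map keys, repeated calls with the same fields produce identical lines, which helps when diffing or grepping logs.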
Log Levels
- DEBUG: Detailed information for debugging
- INFO: General informational messages
- WARN: Warning messages for potential issues
- ERROR: Error messages for failures
Request Logging
Access Logs
All HTTP requests are logged with:
- Timestamp: Request time
- Method: HTTP method (GET, POST, etc.)
- Path: Request path
- Status: HTTP status code
- Duration: Request processing time
- Request ID: Unique request identifier
- Remote Address: Client IP address
Example Log Entry:
2024-12-20T10:30:56Z [INFO] 192.168.1.100 GET /api/v1/pools status=200 rid=abc123 dur=45ms
Request ID
Every request gets a unique request ID:
- Header: X-Request-Id
- Usage: Track requests across services
- Format: 32-character hex string
Diagnostic Endpoints
System Information
GET /api/v1/system/info
Returns comprehensive system information:
{
"version": "v0.1.0-dev",
"uptime": "3600 seconds",
"go_version": "go1.21.0",
"num_goroutines": 15,
"memory": {
"alloc": 1048576,
"total_alloc": 52428800,
"sys": 2097152,
"num_gc": 5
},
"services": {
"smb": {
"status": "running",
"last_check": "2024-12-20T10:30:56Z"
},
"nfs": {
"status": "running",
"last_check": "2024-12-20T10:30:56Z"
},
"iscsi": {
"status": "stopped",
"last_check": "2024-12-20T10:30:56Z"
}
},
"database": {
"connected": true,
"path": "/var/lib/atlas/atlas.db"
}
}
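The `go_version`, `num_goroutines`, and `memory` fields correspond to values that the Go `runtime` package exposes. As a sketch of how they might be gathered (the struct and function names are hypothetical, and `version`, `uptime`, `services`, and `database` are omitted):

```go
package main

import "runtime"

// memoryInfo mirrors the "memory" object in the response above.
type memoryInfo struct {
	Alloc      uint64 `json:"alloc"`
	TotalAlloc uint64 `json:"total_alloc"`
	Sys        uint64 `json:"sys"`
	NumGC      uint32 `json:"num_gc"`
}

// runtimeInfo holds only the runtime-derived part of the response.
type runtimeInfo struct {
	GoVersion     string     `json:"go_version"`
	NumGoroutines int        `json:"num_goroutines"`
	Memory        memoryInfo `json:"memory"`
}

// collectRuntimeInfo takes a snapshot of the current process statistics.
func collectRuntimeInfo() runtimeInfo {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return runtimeInfo{
		GoVersion:     runtime.Version(),
		NumGoroutines: runtime.NumGoroutine(),
		Memory: memoryInfo{
			Alloc:      m.Alloc,
			TotalAlloc: m.TotalAlloc,
			Sys:        m.Sys,
			NumGC:      m.NumGC,
		},
	}
}
```

All memory values are in bytes. `runtime.ReadMemStats` briefly stops the world, so it is best called on demand and not in a tight loop.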
Health Check
GET /health
Detailed health check with component status:
{
"status": "healthy",
"timestamp": "2024-12-20T10:30:56Z",
"checks": {
"zfs": "healthy",
"database": "healthy",
"smb": "healthy",
"nfs": "healthy",
"iscsi": "stopped"
}
}
Status Values:
- healthy: Component is working correctly
- degraded: Some components have issues but system is operational
- unhealthy: Critical components are failing
HTTP Status Codes:
- 200 OK: System is healthy or degraded
- 503 Service Unavailable: System is unhealthy
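The mapping from overall status to HTTP status code can be expressed in a few lines. In this sketch, `httpStatusFor` is a hypothetical helper, and treating an unrecognized status as unhealthy is an assumption made to fail safe.

```go
package main

import "net/http"

// httpStatusFor maps the overall health status to an HTTP status code.
// Unknown values are treated as unhealthy (an assumption for this sketch).
func httpStatusFor(status string) int {
	switch status {
	case "healthy", "degraded":
		return http.StatusOK
	default:
		return http.StatusServiceUnavailable
	}
}
```

Returning 200 for a degraded system keeps load balancers from removing a node that can still serve requests, while the response body still reports the problem.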
System Logs
GET /api/v1/system/logs?limit=100
Returns recent system logs (from audit logs):
{
"logs": [
{
"timestamp": "2024-12-20T10:30:56Z",
"level": "INFO",
"actor": "user-1",
"action": "pool.create",
"resource": "pool:tank",
"result": "success",
"ip": "192.168.1.100"
}
],
"count": 1
}
Query Parameters:
limit: Maximum number of logs to return (default: 100, max: 1000)
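The default and maximum can be enforced with a small clamp when reading the query string. This is a sketch: `parseLimit` is a hypothetical name, and falling back to the default for missing, non-numeric, or non-positive values is an assumption, since the actual endpoint might reject such input.

```go
package main

import "strconv"

const (
	defaultLimit = 100  // used when the parameter is absent or invalid
	maxLimit     = 1000 // upper bound from the documentation above
)

// parseLimit converts the raw "limit" query value into a usable count.
func parseLimit(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defaultLimit
	}
	if n > maxLimit {
		return maxLimit
	}
	return n
}
```

In a handler this would typically be called as `parseLimit(r.URL.Query().Get("limit"))`.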
Garbage Collection
POST /api/v1/system/gc
Triggers garbage collection and returns memory statistics:
{
"before": {
"alloc": 1048576,
"total_alloc": 52428800,
"sys": 2097152,
"num_gc": 5
},
"after": {
"alloc": 512000,
"total_alloc": 52428800,
"sys": 2097152,
"num_gc": 6
},
"freed": 536576
}
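In the example response, `freed` equals `before.alloc` minus `after.alloc` (1048576 - 512000 = 536576). A handler could compute this along the following lines; the type and function names are hypothetical, and reporting 0 when allocation grew during the collection is an assumption.

```go
package main

import "runtime"

// memSnapshot holds the subset of runtime.MemStats used in this sketch.
type memSnapshot struct {
	Alloc uint64
	NumGC uint32
}

func snapshot() memSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return memSnapshot{Alloc: m.Alloc, NumGC: m.NumGC}
}

// freedBytes returns before-after, or 0 if allocation did not shrink.
// The guard avoids unsigned underflow when other goroutines allocate
// while the collection runs.
func freedBytes(before, after uint64) uint64 {
	if before > after {
		return before - after
	}
	return 0
}

// runGC forces a collection and reports statistics from both sides of it.
func runGC() (before, after memSnapshot, freed uint64) {
	before = snapshot()
	runtime.GC() // blocks until the collection completes
	after = snapshot()
	return before, after, freedBytes(before.Alloc, after.Alloc)
}
```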
Audit Logging
Audit logs track all mutating operations:
- Actor: User ID or "system"
- Action: Operation type (e.g., "pool.create")
- Resource: Resource identifier
- Result: "success" or "failure"
- IP: Client IP address
- User Agent: Client user agent
- Timestamp: Operation time
See Audit Logging Documentation for details.
Log Rotation
Current Implementation
- In-Memory: Audit logs stored in memory
- Rotation: Automatic rotation when max logs reached
- Limit: Configurable (default: 10,000 logs)
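One simple way to bound an in-memory log is to discard the oldest entries once the limit is exceeded. The sketch below illustrates that idea; `auditStore` is a hypothetical type, entries are plain strings for brevity, and dropping the oldest entry is an assumption about what "rotation" means here.

```go
package main

import "sync"

// auditStore keeps at most max entries, oldest first.
type auditStore struct {
	mu      sync.Mutex
	max     int
	entries []string
}

func newAuditStore(max int) *auditStore {
	return &auditStore{max: max}
}

// Add appends an entry and drops the oldest ones beyond the limit.
func (s *auditStore) Add(e string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = append(s.entries, e)
	if over := len(s.entries) - s.max; over > 0 {
		// Copy so the dropped entries can be garbage collected.
		s.entries = append([]string(nil), s.entries[over:]...)
	}
}

// Recent returns a copy of up to limit of the newest entries, oldest first.
func (s *auditStore) Recent(limit int) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if limit < 0 {
		limit = 0
	}
	start := 0
	if len(s.entries) > limit {
		start = len(s.entries) - limit
	}
	return append([]string(nil), s.entries[start:]...)
}
```

With the default limit of 10,000 a ring buffer would avoid the copy on every overflow; the slice version is shown because it is easier to follow.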
Future Enhancements
- File Logging: Write logs to files
- Automatic Rotation: Rotate log files by size/age
- Compression: Compress old log files
- Retention: Configurable retention policies
Best Practices
1. Use Appropriate Log Levels
// Debug - detailed information
logger.Debug("Processing request", map[string]interface{}{
"request_id": reqID,
"user": userID,
})
// Info - important events
logger.Info("User logged in", map[string]interface{}{
"user": userID,
})
// Warn - potential issues
logger.Warn("High memory usage", map[string]interface{}{
"usage": "85%",
})
// Error - failures
logger.Error("Failed to create pool", err, map[string]interface{}{
"pool": poolName,
})
2. Include Context
Always include relevant context in logs:
// Good
logger.Info("Pool created", map[string]interface{}{
"pool": poolName,
"size": poolSize,
"user": userID,
})
// Avoid
logger.Info("Pool created")
3. Use Request IDs
Include request IDs in logs for tracing:
// The two-value assertion avoids a panic when no request ID is set.
reqID, _ := r.Context().Value(requestIDKey).(string)
logger.Info("Processing request", map[string]interface{}{
"request_id": reqID,
})
4. Monitor Health Endpoints
Regularly check health endpoints:
# Simple health check
curl http://localhost:8080/healthz
# Detailed health check
curl http://localhost:8080/health
# System information
curl http://localhost:8080/api/v1/system/info
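The same check can be automated from Go. The sketch below fetches `/health`, decodes the documented `status` and `checks` fields, and returns an error when the system is unhealthy. The type and function names and the 5-second timeout are assumptions for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// healthReport holds the fields of the /health response used here.
type healthReport struct {
	Status string            `json:"status"`
	Checks map[string]string `json:"checks"`
}

// decodeHealth parses a /health response body.
func decodeHealth(body []byte) (healthReport, error) {
	var h healthReport
	err := json.Unmarshal(body, &h)
	return h, err
}

// checkHealth queries baseURL + "/health" and fails on an unhealthy system.
func checkHealth(baseURL string) (healthReport, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/health")
	if err != nil {
		return healthReport{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return healthReport{}, err
	}
	h, err := decodeHealth(body)
	if err != nil {
		return h, err
	}
	if resp.StatusCode == http.StatusServiceUnavailable || h.Status == "unhealthy" {
		return h, fmt.Errorf("system unhealthy (HTTP %d)", resp.StatusCode)
	}
	return h, nil
}
```

For example, `checkHealth("http://localhost:8080")` could be run from a cron job or a monitoring agent.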
Monitoring
Key Metrics
Monitor these metrics for system health:
- Request Duration: Track in access logs
- Error Rate: Count of error responses
- Memory Usage: Check via /api/v1/system/info
- Goroutine Count: Monitor for leaks
- Service Status: Check service health
Alerting
Set up alerts for:
- Unhealthy Status: System health check fails
- High Error Rate: Too many error responses
- Memory Leaks: Continuously increasing memory
- Service Failures: Services not running
Troubleshooting
Check System Health
curl http://localhost:8080/health
View System Information
curl http://localhost:8080/api/v1/system/info
Check Recent Logs
curl "http://localhost:8080/api/v1/system/logs?limit=50"
Trigger GC
curl -X POST http://localhost:8080/api/v1/system/gc
View Request Logs
Check application logs for request details:
# If logging to stdout
./atlas-api | grep "GET /api/v1/pools"
# If logging to file
tail -f /var/log/atlas-api.log | grep "status=500"
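When grep is not enough, the `status=` field can be extracted programmatically, for instance to count server errors. This sketch assumes the text access-log layout shown in the Access Logs section; `statusOf` is a hypothetical helper.

```go
package main

import (
	"strconv"
	"strings"
)

// statusOf extracts the numeric value of the status=NNN field from an
// access-log line. It returns false if the field is missing or malformed.
func statusOf(line string) (int, bool) {
	for _, tok := range strings.Fields(line) {
		if strings.HasPrefix(tok, "status=") {
			n, err := strconv.Atoi(strings.TrimPrefix(tok, "status="))
			return n, err == nil
		}
	}
	return 0, false
}
```

A caller could treat any line where the returned status is 500 or higher as a server error when computing an error rate.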
Future Enhancements
- File Logging: Write logs to files with rotation
- Log Aggregation: Support for centralized logging (ELK, Loki)
- Structured Logging: Full JSON logging support
- Log Levels per Component: Different levels for different components
- Performance Logging: Detailed performance metrics
- Distributed Tracing: Request tracing across services
- Log Filtering: Filter logs by level, component, etc.
- Real-time Log Streaming: Stream logs via WebSocket