logging and diagnostic features added
Some checks failed
CI / test-build (push) Failing after 2m11s

2025-12-15 00:45:14 +07:00
parent 3e64de18ed
commit df475bc85e
26 changed files with 5878 additions and 91 deletions

278
docs/API_SECURITY.md Normal file

@@ -0,0 +1,278 @@
# API Security & Rate Limiting
## Overview
AtlasOS implements comprehensive API security measures including rate limiting, security headers, CORS protection, and request validation to protect the API from abuse and attacks.
## Rate Limiting
### Token Bucket Algorithm
The rate limiter uses a token bucket algorithm:
- **Default Rate**: 100 requests per minute per client
- **Window**: 60 seconds
- **Token Refill**: Tokens are refilled based on elapsed time
- **Per-Client**: Rate limiting is applied per IP address or user ID
### Rate Limit Headers
All responses include rate limit headers:
```
X-RateLimit-Limit: 100
X-RateLimit-Window: 60
```
### Rate Limit Exceeded
When the rate limit is exceeded, the API returns:
```json
{
"code": "SERVICE_UNAVAILABLE",
"message": "rate limit exceeded",
"details": "too many requests, please try again later"
}
```
**HTTP Status**: `429 Too Many Requests`
### Client Identification
Rate limiting uses different keys based on authentication:
- **Authenticated Users**: `user:{user_id}` - More granular per-user limiting
- **Unauthenticated**: `ip:{ip_address}` - IP-based limiting
### Public Endpoints
Public endpoints (login, health checks) are excluded from rate limiting to ensure availability.
## Security Headers
All responses include security headers:
### X-Content-Type-Options
- **Value**: `nosniff`
- **Purpose**: Prevents MIME type sniffing
### X-Frame-Options
- **Value**: `DENY`
- **Purpose**: Prevents clickjacking attacks
### X-XSS-Protection
- **Value**: `1; mode=block`
- **Purpose**: Enables the legacy XSS filter in older browsers (most current browsers ignore this header)
### Referrer-Policy
- **Value**: `strict-origin-when-cross-origin`
- **Purpose**: Controls referrer information
### Permissions-Policy
- **Value**: `geolocation=(), microphone=(), camera=()`
- **Purpose**: Disables unnecessary browser features
### Strict-Transport-Security (HSTS)
- **Value**: `max-age=31536000; includeSubDomains`
- **Purpose**: Forces HTTPS connections
- **Note**: Only added when the request arrives over TLS
### Content-Security-Policy (CSP)
- **Value**: `default-src 'self'; script-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net; style-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net; img-src 'self' data:; font-src 'self' https://cdn.jsdelivr.net; connect-src 'self';`
- **Purpose**: Restricts resource loading to prevent XSS
## CORS (Cross-Origin Resource Sharing)
### Allowed Origins
By default, the following origins are allowed:
- `http://localhost:8080`
- `http://localhost:3000`
- `http://127.0.0.1:8080`
- Same-origin requests (no Origin header)
### CORS Headers
When a request comes from an allowed origin:
```
Access-Control-Allow-Origin: http://localhost:8080
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, PATCH, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization, X-Requested-With
Access-Control-Allow-Credentials: true
Access-Control-Max-Age: 3600
```
### Preflight Requests
OPTIONS requests are handled automatically:
- **Status**: `204 No Content`
- **Headers**: All CORS headers included
- **Purpose**: Browser preflight checks
## Request Size Limits
### Maximum Request Body Size
- **Limit**: 10 MB (10,485,760 bytes)
- **Enforcement**: Automatic via `http.MaxBytesReader`
- **Error**: Returns `413 Request Entity Too Large` if exceeded
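As a rough sketch of how `http.MaxBytesReader` enforces the limit: the wrapper below caps the body, and a stand-in handler maps the resulting `*http.MaxBytesError` (available since Go 1.19) to a 413 response. The names `limitBody`, `readAll`, and `statusFor` are assumptions, not functions from the AtlasOS code base.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// limitBody wraps each request body so that reads fail once maxBytes is exceeded.
func limitBody(maxBytes int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, maxBytes)
		next.ServeHTTP(w, r)
	})
}

// readAll is a stand-in handler that consumes the body and reports oversized requests.
func readAll(w http.ResponseWriter, r *http.Request) {
	if _, err := io.ReadAll(r.Body); err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// statusFor posts n bytes through a limit of max bytes and returns the status code.
func statusFor(n int, max int64) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(strings.Repeat("x", n)))
	limitBody(max, http.HandlerFunc(readAll)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(8, 8), statusFor(9, 8))
}
```

Note that the limit only takes effect when the handler actually reads the body; a handler that never reads it will not see the error.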
### Content-Type Validation
POST, PUT, and PATCH requests must include a valid `Content-Type` header:
**Allowed Types:**
- `application/json`
- `application/x-www-form-urlencoded`
- `multipart/form-data`
**Error Response:**
```json
{
"code": "BAD_REQUEST",
"message": "Content-Type must be application/json"
}
```
## Middleware Chain Order
Security middleware is applied in the following order (outer to inner):
1. **CORS** - Handles preflight requests
2. **Security Headers** - Adds security headers
3. **Request Size Limit** - Enforces 10MB limit
4. **Content-Type Validation** - Validates request content type
5. **Rate Limiting** - Enforces rate limits
6. **Error Recovery** - Catches panics
7. **Request ID** - Generates request IDs
8. **Logging** - Logs requests
9. **Audit** - Records audit logs
10. **Authentication** - Validates JWT tokens
11. **Routes** - Handles requests
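To illustrate what "outer to inner" means, here is a small sketch of a chaining helper in which the first middleware listed sees the request first. The `chain`, `tag`, and `order` helpers are assumptions for demonstration; only three of the layers above are shown.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

type middleware func(http.Handler) http.Handler

// chain wraps h so that the first middleware listed is the outermost one.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag returns a middleware that records its name when a request enters it.
func tag(name string, trace *[]string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*trace = append(*trace, name)
			next.ServeHTTP(w, r)
		})
	}
}

// order sends one request through three tagged layers and returns the visit order.
func order() []string {
	var trace []string
	routes := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		trace = append(trace, "routes")
	})
	h := chain(routes, tag("cors", &trace), tag("security-headers", &trace), tag("rate-limit", &trace))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return trace
}

func main() {
	fmt.Println(order())
}
```

One consequence of the documented order is that a panic in any layer outside Error Recovery (positions 1 to 5) would not be caught by it.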
## Public Endpoints
The following endpoints are excluded from certain security checks:
- `/api/v1/auth/login` - Rate limiting, Content-Type validation
- `/api/v1/auth/logout` - Rate limiting, Content-Type validation
- `/healthz` - Rate limiting, Content-Type validation
- `/metrics` - Rate limiting, Content-Type validation
- `/api/docs` - Rate limiting, Content-Type validation
- `/api/openapi.yaml` - Rate limiting, Content-Type validation
## Best Practices
### For API Consumers
1. **Respect Rate Limits**: Implement exponential backoff when rate limited
2. **Use Authentication**: Authenticated requests are rate limited per user instead of per IP address
3. **Include Content-Type**: Always include `Content-Type: application/json`
4. **Handle Errors**: Check for `429` status and retry after delay
5. **Request Size**: Keep request bodies under 10MB
### For Administrators
1. **Monitor Rate Limits**: Check logs for rate limit violations
2. **Adjust Limits**: Modify rate limit values in code if needed
3. **CORS Configuration**: Update allowed origins for production
4. **HTTPS**: Always use HTTPS in production for HSTS
5. **Security Headers**: Review CSP policy for your use case
## Configuration
### Rate Limiting
Rate limits are currently hardcoded. To change them, edit the constructor call:
```go
// In rate_limit.go
rateLimiter := NewRateLimiter(100, time.Minute) // 100 req/min
```
### CORS Origins
Update allowed origins in `security_middleware.go`:
```go
allowedOrigins := []string{
"https://yourdomain.com",
"https://app.yourdomain.com",
}
```
### Request Size Limit
Modify in `app.go`:
```go
a.requestSizeMiddleware(10*1024*1024) // 10MB
```
## Error Responses
### Rate Limit Exceeded
```json
{
"code": "SERVICE_UNAVAILABLE",
"message": "rate limit exceeded",
"details": "too many requests, please try again later"
}
```
**Status**: `429 Too Many Requests`
### Request Too Large
```json
{
"code": "BAD_REQUEST",
"message": "request body too large"
}
```
**Status**: `413 Request Entity Too Large`
### Invalid Content-Type
```json
{
"code": "BAD_REQUEST",
"message": "Content-Type must be application/json"
}
```
**Status**: `400 Bad Request`
## Monitoring
### Rate Limit Metrics
Monitor rate limit violations:
- Check audit logs for rate limit events
- Monitor `429` status codes in access logs
- Track rate limit headers in responses
### Security Events
Monitor for security-related events:
- Invalid Content-Type headers
- Request size violations
- CORS violations (check server logs)
- Authentication failures
## Future Enhancements
1. **Configurable Rate Limits**: Environment variable configuration
2. **Per-Endpoint Limits**: Different limits for different endpoints
3. **IP Whitelisting**: Bypass rate limits for trusted IPs
4. **Rate Limit Metrics**: Prometheus metrics for rate limiting
5. **Distributed Rate Limiting**: Redis-based for multi-instance deployments
6. **Advanced CORS**: Configurable CORS via environment variables
7. **Request Timeout**: Configurable request timeout limits

307
docs/BACKUP_RESTORE.md Normal file

@@ -0,0 +1,307 @@
# Configuration Backup & Restore
## Overview
AtlasOS provides comprehensive configuration backup and restore functionality, allowing you to save and restore all system configurations including users, storage services (SMB/NFS/iSCSI), and snapshot policies.
## Features
- **Full Configuration Backup**: Backs up all system configurations
- **Compressed Archives**: Backups are stored as gzipped tar archives
- **Metadata Tracking**: Each backup includes metadata (ID, timestamp, description, size)
- **Verification**: Verify backup integrity before restore
- **Dry Run**: Test restore operations without making changes
- **Selective Restore**: Restore specific components or full system
## Configuration
Set the backup directory using the `ATLAS_BACKUP_DIR` environment variable:
```bash
export ATLAS_BACKUP_DIR=/var/lib/atlas/backups
./atlas-api
```
If not set, defaults to `data/backups` in the current directory.
## Backup Contents
A backup includes:
- **Users**: All user accounts (passwords cannot be restored - users must reset)
- **SMB Shares**: All SMB/CIFS share configurations
- **NFS Exports**: All NFS export configurations
- **iSCSI Targets**: All iSCSI targets and LUN mappings
- **Snapshot Policies**: All automated snapshot policies
- **System Config**: Database path and other system settings
## API Endpoints
### Create Backup
**POST** `/api/v1/backups`
Creates a new backup of all system configurations.
**Request Body:**
```json
{
"description": "Backup before major changes"
}
```
**Response:**
```json
{
"id": "backup-1703123456",
"created_at": "2024-12-20T10:30:56Z",
"version": "1.0",
"description": "Backup before major changes",
"size": 24576
}
```
**Example:**
```bash
curl -X POST http://localhost:8080/api/v1/backups \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"description": "Weekly backup"}'
```
### List Backups
**GET** `/api/v1/backups`
Lists all available backups.
**Response:**
```json
[
{
"id": "backup-1703123456",
"created_at": "2024-12-20T10:30:56Z",
"version": "1.0",
"description": "Weekly backup",
"size": 24576
},
{
"id": "backup-1703037056",
"created_at": "2024-12-19T10:30:56Z",
"version": "1.0",
"description": "",
"size": 18432
}
]
```
**Example:**
```bash
curl -X GET http://localhost:8080/api/v1/backups \
-H "Authorization: Bearer <token>"
```
### Get Backup Details
**GET** `/api/v1/backups/{id}`
Retrieves metadata for a specific backup.
**Response:**
```json
{
"id": "backup-1703123456",
"created_at": "2024-12-20T10:30:56Z",
"version": "1.0",
"description": "Weekly backup",
"size": 24576
}
```
**Example:**
```bash
curl -X GET http://localhost:8080/api/v1/backups/backup-1703123456 \
-H "Authorization: Bearer <token>"
```
### Verify Backup
**GET** `/api/v1/backups/{id}?verify=true`
Verifies that a backup file is valid and can be restored.
**Response:**
```json
{
"message": "backup is valid",
"backup_id": "backup-1703123456",
"metadata": {
"id": "backup-1703123456",
"created_at": "2024-12-20T10:30:56Z",
"version": "1.0",
"description": "Weekly backup",
"size": 24576
}
}
```
**Example:**
```bash
curl -X GET "http://localhost:8080/api/v1/backups/backup-1703123456?verify=true" \
-H "Authorization: Bearer <token>"
```
### Restore Backup
**POST** `/api/v1/backups/{id}/restore`
Restores configuration from a backup.
**Request Body:**
```json
{
"dry_run": false
}
```
**Parameters:**
- `dry_run` (optional): If `true`, shows what would be restored without making changes
**Response:**
```json
{
"message": "backup restored successfully",
"backup_id": "backup-1703123456"
}
```
**Example:**
```bash
# Dry run (test restore)
curl -X POST http://localhost:8080/api/v1/backups/backup-1703123456/restore \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"dry_run": true}'
# Actual restore
curl -X POST http://localhost:8080/api/v1/backups/backup-1703123456/restore \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"dry_run": false}'
```
### Delete Backup
**DELETE** `/api/v1/backups/{id}`
Deletes a backup file and its metadata.
**Response:**
```json
{
"message": "backup deleted",
"backup_id": "backup-1703123456"
}
```
**Example:**
```bash
curl -X DELETE http://localhost:8080/api/v1/backups/backup-1703123456 \
-H "Authorization: Bearer <token>"
```
## Restore Process
When restoring a backup:
1. **Verification**: Backup is verified before restore
2. **User Restoration**:
- Users are restored with temporary passwords
- Default admin user (user-1) is skipped
- Users must reset their passwords after restore
3. **Storage Services**:
- SMB shares, NFS exports, and iSCSI targets are restored
- Existing configurations are skipped (not overwritten)
- Service configurations are automatically applied
4. **Snapshot Policies**:
- Policies are restored by dataset
- Existing policies are skipped
5. **Service Application**:
- Samba, NFS, and iSCSI services are reconfigured
- Errors are logged but don't fail the restore
## Backup File Format
Backups are stored as gzipped tar archives containing:
- `metadata.json`: Backup metadata (ID, timestamp, description, etc.)
- `config.json`: All configuration data (users, shares, exports, targets, policies)
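The archive layout can be sketched with the standard `archive/tar` and `compress/gzip` packages. This is an illustration only: the `entry` type, the helper names, and the JSON field names inside `config.json` are assumptions, not the actual backup code.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// entry is one file inside the archive (hypothetical helper type).
type entry struct {
	Name string
	Data []byte
}

// writeArchive packs the entries into a gzipped tar archive held in memory.
func writeArchive(entries []entry) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, e := range entries {
		hdr := &tar.Header{Name: e.Name, Mode: 0600, Size: int64(len(e.Data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(e.Data); err != nil {
			return nil, err
		}
	}
	// Close the tar writer before the gzip writer so both trailers are flushed.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readArchive returns the contents of every file in a gzipped tar archive.
func readArchive(data []byte) (map[string][]byte, error) {
	gz, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	files := make(map[string][]byte)
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return files, nil
		}
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		files[hdr.Name] = body
	}
}

func main() {
	data, err := writeArchive([]entry{
		{Name: "metadata.json", Data: []byte(`{"id":"backup-1703123456","version":"1.0"}`)},
		{Name: "config.json", Data: []byte(`{"users":[],"smb_shares":[]}`)},
	})
	if err != nil {
		panic(err)
	}
	files, err := readArchive(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(files), string(files["metadata.json"]))
}
```

A verification step could reuse `readArchive` and check that both expected files are present and parse as JSON.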
## Best Practices
1. **Regular Backups**: Create backups before major configuration changes
2. **Verify Before Restore**: Always verify backups before restoring
3. **Test Restores**: Use dry run to test restore operations
4. **Backup Retention**: Keep multiple backups for different time periods
5. **Offsite Storage**: Copy backups to external storage for disaster recovery
6. **Password Management**: Users must reset passwords after restore
## Limitations
- **Passwords**: User passwords cannot be restored (security feature)
- **ZFS Data**: Backups only include configuration, not ZFS pool/dataset data
- **Audit Logs**: Audit logs are not included in backups
- **Jobs**: Background jobs are not included in backups
## Error Handling
- **Invalid Backup**: Verification fails if backup is corrupted
- **Existing Resources**: Restore skips resources that already exist
- **Service Errors**: Service configuration errors are logged but don't fail restore
- **Partial Restore**: Restore continues even if some components fail
## Security Considerations
1. **Backup Storage**: Store backups in secure locations
2. **Access Control**: Backup endpoints require authentication
3. **Password Security**: Passwords are never included in backups
4. **Encryption**: Consider encrypting backups for sensitive environments
## Example Workflow
```bash
# 1. Create backup before changes
BACKUP_ID=$(curl -s -X POST http://localhost:8080/api/v1/backups \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"description": "Before major changes"}' \
| jq -r '.id')
# 2. Verify backup
curl -X GET "http://localhost:8080/api/v1/backups/$BACKUP_ID?verify=true" \
-H "Authorization: Bearer <token>"
# 3. Make configuration changes
# ... make changes ...
# 4. Test restore (dry run)
curl -X POST "http://localhost:8080/api/v1/backups/$BACKUP_ID/restore" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"dry_run": true}'
# 5. Restore if needed
curl -X POST "http://localhost:8080/api/v1/backups/$BACKUP_ID/restore" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"dry_run": false}'
```
## Future Enhancements
- **Scheduled Backups**: Automatic backup scheduling
- **Incremental Backups**: Only backup changes since last backup
- **Backup Encryption**: Encrypt backup files
- **Remote Storage**: Support for S3, FTP, etc.
- **Backup Compression**: Additional compression options
- **Selective Restore**: Restore specific components only

242
docs/ERROR_HANDLING.md Normal file

@@ -0,0 +1,242 @@
# Error Handling & Recovery
## Overview
AtlasOS implements comprehensive error handling with structured error responses, graceful degradation, and automatic recovery mechanisms to ensure system reliability and good user experience.
## Error Types
### Structured API Errors
All API errors follow a consistent structure:
```json
{
"code": "NOT_FOUND",
"message": "dataset not found",
"details": "tank/missing"
}
```
### Error Codes
- `INTERNAL_ERROR` - Unexpected server errors (500)
- `NOT_FOUND` - Resource not found (404)
- `BAD_REQUEST` - Invalid request parameters (400)
- `CONFLICT` - Resource conflict (409)
- `UNAUTHORIZED` - Authentication required (401)
- `FORBIDDEN` - Insufficient permissions (403)
- `SERVICE_UNAVAILABLE` - Service temporarily unavailable (503)
- `VALIDATION_ERROR` - Input validation failed (400)
## Error Handling Patterns
### 1. Structured Error Responses
All errors use the `errors.APIError` type for consistent formatting:
```go
if resource == nil {
writeError(w, errors.ErrNotFound("dataset").WithDetails(datasetName))
return
}
```
### 2. Graceful Degradation
Service operations (SMB/NFS/iSCSI) use graceful degradation:
- **Desired State Stored**: Configuration is always stored in the store
- **Service Application**: Service configuration is applied asynchronously
- **Non-Blocking**: Service failures don't fail API requests
- **Retry Ready**: Failed operations can be retried later
Example:
```go
// Store the configuration (always succeeds)
share, err := a.smbStore.Create(...)
// Apply to service (may fail, but doesn't block)
if err := a.smbService.ApplyConfiguration(shares); err != nil {
// Log but don't fail - desired state is stored
log.Printf("SMB service configuration failed (non-fatal): %v", err)
}
```
### 3. Panic Recovery
All HTTP handlers are wrapped with panic recovery middleware:
```go
func (a *App) errorMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer recoverPanic(w, r)
		next.ServeHTTP(w, r)
	})
}
```
Panics are caught and converted to proper error responses instead of crashing the server.
### 4. Atomic Operations with Rollback
Service configuration operations are atomic with automatic rollback:
1. **Write to temporary file** (`*.atlas.tmp`)
2. **Backup existing config** (`.backup`)
3. **Atomically replace** config file
4. **Reload service**
5. **On failure**: Automatically restore backup
Example (SMB):
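```go
// Write the new configuration to a temporary file
if err := os.WriteFile(tmpPath, config, 0644); err != nil {
	return err
}
// Back up the existing configuration
existing, err := os.ReadFile(configPath)
if err != nil {
	return err
}
if err := os.WriteFile(backupPath, existing, 0644); err != nil {
	return err
}
// Atomically replace the configuration file
if err := os.Rename(tmpPath, configPath); err != nil {
	return err
}
// Reload the service; on failure, restore the backup
if err := reloadService(); err != nil {
	if restoreErr := os.Rename(backupPath, configPath); restoreErr != nil {
		log.Printf("restore backup failed: %v", restoreErr)
	}
	return err
}
```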
## Retry Mechanisms
### Retry Configuration
The `errors.Retry` function provides configurable retry logic:
```go
config := errors.DefaultRetryConfig() // 3 attempts with exponential backoff
err := errors.Retry(func() error {
return serviceOperation()
}, config)
```
### Default Retry Behavior
- **Max Attempts**: 3
- **Backoff**: Exponential (100ms, 200ms, 400ms)
- **Use Case**: Transient failures (network, temporary service unavailability)
## Error Recovery
### Service Configuration Recovery
When service configuration fails:
1. **Configuration is stored** (desired state preserved)
2. **Error is logged** (for debugging)
3. **Operation continues** (API request succeeds)
4. **Manual retry available** (via API or automatic retry later)
### Database Recovery
- **Connection failures**: Logged and retried
- **Transaction failures**: Rolled back automatically
- **Schema errors**: Detected during migration
### ZFS Operation Recovery
- **Command failures**: Returned as errors to caller
- **Partial failures**: State is preserved, operation can be retried
- **Validation**: Performed before destructive operations
## Error Logging
All errors are logged with context:
```go
log.Printf("create SMB share error: %v", err)
log.Printf("%s service error: %v", serviceName, err)
```
Error logs include:
- Error message
- Operation context
- Resource identifiers
- Timestamp (via standard log)
## Best Practices
### 1. Always Use Structured Errors
```go
// Good
writeError(w, errors.ErrNotFound("pool").WithDetails(poolName))
// Avoid
writeJSON(w, http.StatusNotFound, map[string]string{"error": "not found"})
```
### 2. Handle Service Errors Gracefully
```go
// Good - graceful degradation
if err := service.Apply(); err != nil {
log.Printf("service error (non-fatal): %v", err)
// Continue - desired state is stored
}
// Avoid - failing the request
if err := service.Apply(); err != nil {
return err // Don't fail the whole request
}
```
### 3. Validate Before Operations
```go
// Good - validate first
if !datasetExists {
writeError(w, errors.ErrNotFound("dataset"))
return
}
// Then perform operation
```
### 4. Use Context for Error Details
```go
// Good - include context
writeError(w, errors.ErrInternal("failed to create pool").WithDetails(err.Error()))
// Avoid - generic errors
writeError(w, errors.ErrInternal("error"))
```
## Error Response Format
All error responses follow this structure:
```json
{
"code": "ERROR_CODE",
"message": "Human-readable error message",
"details": "Additional context (optional)"
}
```
HTTP status codes match error types:
- `400` - Bad Request / Validation Error
- `401` - Unauthorized
- `403` - Forbidden
- `404` - Not Found
- `409` - Conflict
- `500` - Internal Error
- `503` - Service Unavailable
## Future Enhancements
1. **Error Tracking**: Centralized error tracking and alerting
2. **Automatic Retry Queue**: Background retry for failed operations
3. **Error Metrics**: Track error rates by type and endpoint
4. **User-Friendly Messages**: More descriptive error messages
5. **Error Correlation**: Link related errors for debugging

366
docs/LOGGING_DIAGNOSTICS.md Normal file

@@ -0,0 +1,366 @@
# Logging & Diagnostics
## Overview
AtlasOS provides comprehensive logging and diagnostic capabilities to help monitor system health, troubleshoot issues, and understand system behavior.
## Structured Logging
### Logger Package
The `internal/logger` package provides structured logging with:
- **Log Levels**: DEBUG, INFO, WARN, ERROR
- **JSON Mode**: Optional JSON-formatted output
- **Structured Fields**: Key-value pairs for context
- **Thread-Safe**: Safe for concurrent use
### Configuration
Configure logging via environment variables:
```bash
# Log level (DEBUG, INFO, WARN, ERROR)
export ATLAS_LOG_LEVEL=INFO
# Log format (json or text)
export ATLAS_LOG_FORMAT=json
```
### Usage
```go
import "gitea.avt.data-center.id/othman.suseno/atlas/internal/logger"
// Simple logging
logger.Info("User logged in")
logger.Error("Failed to create pool", err)
// With fields
logger.Info("Pool created", map[string]interface{}{
"pool": "tank",
"size": "10TB",
})
```
### Log Levels
- **DEBUG**: Detailed information for debugging
- **INFO**: General informational messages
- **WARN**: Warning messages for potential issues
- **ERROR**: Error messages for failures
## Request Logging
### Access Logs
All HTTP requests are logged with:
- **Timestamp**: Request time
- **Method**: HTTP method (GET, POST, etc.)
- **Path**: Request path
- **Status**: HTTP status code
- **Duration**: Request processing time
- **Request ID**: Unique request identifier
- **Remote Address**: Client IP address
**Example Log Entry:**
```
2024-12-20T10:30:56Z [INFO] 192.168.1.100 GET /api/v1/pools status=200 rid=abc123 dur=45ms
```
### Request ID
Every request gets a unique request ID:
- **Header**: `X-Request-Id`
- **Usage**: Track requests across services
- **Format**: 32-character hex string
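One way to produce a 32-character hex identifier is to hex-encode 16 random bytes. Whether AtlasOS generates its IDs this way is not stated here, so treat `newRequestID` as an assumed helper.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newRequestID returns 16 random bytes encoded as 32 hexadecimal characters.
func newRequestID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	id, err := newRequestID()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(id), id)
}
```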
## Diagnostic Endpoints
### System Information
**GET** `/api/v1/system/info`
Returns comprehensive system information:
```json
{
  "version": "v0.1.0-dev",
  "uptime": "3600 seconds",
  "go_version": "go1.21.0",
  "num_goroutines": 15,
  "memory": {
    "alloc": 1048576,
    "total_alloc": 52428800,
    "sys": 2097152,
    "num_gc": 5
  },
  "services": {
    "smb": {
      "status": "running",
      "last_check": "2024-12-20T10:30:56Z"
    },
    "nfs": {
      "status": "running",
      "last_check": "2024-12-20T10:30:56Z"
    },
    "iscsi": {
      "status": "stopped",
      "last_check": "2024-12-20T10:30:56Z"
    }
  },
  "database": {
    "connected": true,
    "path": "/var/lib/atlas/atlas.db"
  }
}
```
### Health Check
**GET** `/health`
Detailed health check with component status:
```json
{
"status": "healthy",
"timestamp": "2024-12-20T10:30:56Z",
"checks": {
"zfs": "healthy",
"database": "healthy",
"smb": "healthy",
"nfs": "healthy",
"iscsi": "stopped"
}
}
```
**Status Values:**
- `healthy`: Component is working correctly
- `degraded`: Some components have issues but system is operational
- `unhealthy`: Critical components are failing
**HTTP Status Codes:**
- `200 OK`: System is healthy or degraded
- `503 Service Unavailable`: System is unhealthy
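The mapping from component checks to an overall status and HTTP code could be sketched as below. Which components count as critical, and the rule that a `stopped` optional service does not degrade the system (as in the example response above), are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// overallStatus derives the overall health and HTTP status from component checks.
// critical marks components whose failure makes the whole system unhealthy (assumed).
func overallStatus(checks map[string]string, critical map[string]bool) (string, int) {
	status := "healthy"
	for name, state := range checks {
		// A stopped optional service is treated as intentional, not as a failure.
		if state == "healthy" || (state == "stopped" && !critical[name]) {
			continue
		}
		if critical[name] {
			return "unhealthy", http.StatusServiceUnavailable
		}
		status = "degraded"
	}
	return status, http.StatusOK
}

func main() {
	critical := map[string]bool{"zfs": true, "database": true}
	checks := map[string]string{
		"zfs": "healthy", "database": "healthy",
		"smb": "healthy", "nfs": "unhealthy", "iscsi": "stopped",
	}
	status, code := overallStatus(checks, critical)
	fmt.Println(status, code)
}
```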
### System Logs
**GET** `/api/v1/system/logs?limit=100`
Returns recent system logs (from audit logs):
```json
{
"logs": [
{
"timestamp": "2024-12-20T10:30:56Z",
"level": "INFO",
"actor": "user-1",
"action": "pool.create",
"resource": "pool:tank",
"result": "success",
"ip": "192.168.1.100"
}
],
"count": 1
}
```
**Query Parameters:**
- `limit`: Maximum number of logs to return (default: 100, max: 1000)
### Garbage Collection
**POST** `/api/v1/system/gc`
Triggers garbage collection and returns memory statistics:
```json
{
"before": {
"alloc": 1048576,
"total_alloc": 52428800,
"sys": 2097152,
"num_gc": 5
},
"after": {
"alloc": 512000,
"total_alloc": 52428800,
"sys": 2097152,
"num_gc": 6
},
"freed": 536576
}
```
## Audit Logging
Audit logs track all mutating operations:
- **Actor**: User ID or "system"
- **Action**: Operation type (e.g., "pool.create")
- **Resource**: Resource identifier
- **Result**: "success" or "failure"
- **IP**: Client IP address
- **User Agent**: Client user agent
- **Timestamp**: Operation time
See [Audit Logging Documentation](./AUDIT_LOGGING.md) for details.
## Log Rotation
### Current Implementation
- **In-Memory**: Audit logs stored in memory
- **Rotation**: Automatic rotation when max logs reached
- **Limit**: Configurable (default: 10,000 logs)
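In-memory rotation of this kind can be sketched as a bounded slice that discards the oldest entries once the limit is exceeded. The `auditStore` type and its methods are assumptions, and entries are plain strings here for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// auditStore is a hypothetical in-memory store that keeps only the newest max entries.
type auditStore struct {
	mu      sync.Mutex
	max     int
	entries []string
}

func newAuditStore(max int) *auditStore { return &auditStore{max: max} }

// Add appends an entry and discards the oldest ones once the limit is exceeded.
func (s *auditStore) Add(entry string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = append(s.entries, entry)
	if over := len(s.entries) - s.max; over > 0 {
		// Copy the tail so the discarded entries can be garbage collected.
		s.entries = append([]string(nil), s.entries[over:]...)
	}
}

// Recent returns a copy of the stored entries, oldest first.
func (s *auditStore) Recent() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.entries...)
}

func main() {
	s := newAuditStore(3)
	for i := 1; i <= 5; i++ {
		s.Add(fmt.Sprintf("pool.create #%d", i))
	}
	fmt.Println(s.Recent())
}
```

Because the store lives in memory, entries are lost on restart, which is why file logging appears under the enhancements that follow.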
### Future Enhancements
- **File Logging**: Write logs to files
- **Automatic Rotation**: Rotate log files by size/age
- **Compression**: Compress old log files
- **Retention**: Configurable retention policies
## Best Practices
### 1. Use Appropriate Log Levels
```go
// Debug - detailed information
logger.Debug("Processing request", map[string]interface{}{
"request_id": reqID,
"user": userID,
})
// Info - important events
logger.Info("User logged in", map[string]interface{}{
"user": userID,
})
// Warn - potential issues
logger.Warn("High memory usage", map[string]interface{}{
"usage": "85%",
})
// Error - failures
logger.Error("Failed to create pool", err, map[string]interface{}{
"pool": poolName,
})
```
### 2. Include Context
Always include relevant context in logs:
```go
// Good
logger.Info("Pool created", map[string]interface{}{
"pool": poolName,
"size": poolSize,
"user": userID,
})
// Avoid
logger.Info("Pool created")
```
### 3. Use Request IDs
Include request IDs in logs for tracing:
```go
// The two-value assertion avoids a panic if the request ID is missing
reqID, _ := r.Context().Value(requestIDKey).(string)
logger.Info("Processing request", map[string]interface{}{
"request_id": reqID,
})
```
### 4. Monitor Health Endpoints
Regularly check health endpoints:
```bash
# Simple health check
curl http://localhost:8080/healthz
# Detailed health check
curl http://localhost:8080/health
# System information
curl http://localhost:8080/api/v1/system/info
```
## Monitoring
### Key Metrics
Monitor these metrics for system health:
- **Request Duration**: Track in access logs
- **Error Rate**: Count of error responses
- **Memory Usage**: Check via `/api/v1/system/info`
- **Goroutine Count**: Monitor for leaks
- **Service Status**: Check service health
### Alerting
Set up alerts for:
- **Unhealthy Status**: System health check fails
- **High Error Rate**: Too many error responses
- **Memory Leaks**: Continuously increasing memory
- **Service Failures**: Services not running
## Troubleshooting
### Check System Health
```bash
curl http://localhost:8080/health
```
### View System Information
```bash
curl http://localhost:8080/api/v1/system/info
```
### Check Recent Logs
```bash
curl "http://localhost:8080/api/v1/system/logs?limit=50"
```
### Trigger GC
```bash
curl -X POST http://localhost:8080/api/v1/system/gc
```
### View Request Logs
Check application logs for request details:
```bash
# If logging to stdout
./atlas-api | grep "GET /api/v1/pools"
# If logging to file
tail -f /var/log/atlas-api.log | grep "status=500"
```
## Future Enhancements
1. **File Logging**: Write logs to files with rotation
2. **Log Aggregation**: Support for centralized logging (ELK, Loki)
3. **Structured Logging**: Full JSON logging support
4. **Log Levels per Component**: Different levels for different components
5. **Performance Logging**: Detailed performance metrics
6. **Distributed Tracing**: Request tracing across services
7. **Log Filtering**: Filter logs by level, component, etc.
8. **Real-time Log Streaming**: Stream logs via WebSocket

232
docs/VALIDATION.md Normal file

@@ -0,0 +1,232 @@
# Input Validation & Sanitization
## Overview
AtlasOS implements comprehensive input validation and sanitization to ensure data integrity, security, and prevent injection attacks. All user inputs are validated before processing.
## Validation Rules
### ZFS Names (Pools, Datasets, ZVOLs, Snapshots)
**Rules:**
- Must start with alphanumeric character
- Can contain: `a-z`, `A-Z`, `0-9`, `_`, `-`, `.`, `:`
- Cannot start with `-` or `.`
- Maximum length: 256 characters
- Cannot be empty
**Example:**
```go
if err := validation.ValidateZFSName("tank/data"); err != nil {
// Handle error
}
```
### Usernames
**Rules:**
- Minimum length: 3 characters
- Maximum length: 32 characters
- Can contain: `a-z`, `A-Z`, `0-9`, `_`, `-`, `.`
- Must start with alphanumeric character
**Example:**
```go
if err := validation.ValidateUsername("admin"); err != nil {
// Handle error
}
```
### Passwords
**Rules:**
- Minimum length: 8 characters
- Maximum length: 128 characters
- Must contain at least one letter
- Must contain at least one number
**Example:**
```go
if err := validation.ValidatePassword("SecurePass123"); err != nil {
// Handle error
}
```
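The password rules above translate into a short check like the following sketch. It is not the body of `validation.ValidatePassword`; the function name, the error messages, and the choice to count characters as Unicode code points are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
	"unicode/utf8"
)

// validatePassword applies the length and character-class rules listed above.
func validatePassword(p string) error {
	n := utf8.RuneCountInString(p)
	if n < 8 {
		return errors.New("password must be at least 8 characters")
	}
	if n > 128 {
		return errors.New("password must be at most 128 characters")
	}
	var hasLetter, hasDigit bool
	for _, r := range p {
		if unicode.IsLetter(r) {
			hasLetter = true
		}
		if unicode.IsDigit(r) {
			hasDigit = true
		}
	}
	if !hasLetter {
		return errors.New("password must contain at least one letter")
	}
	if !hasDigit {
		return errors.New("password must contain at least one number")
	}
	return nil
}

func main() {
	fmt.Println(validatePassword("SecurePass123"))
	fmt.Println(validatePassword("12345678"))
}
```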
### Email Addresses
**Rules:**
- Optional field (can be empty)
- Maximum length: 254 characters
- Must match email format pattern
- Basic format validation (RFC 5322 simplified)
**Example:**
```go
if err := validation.ValidateEmail("user@example.com"); err != nil {
// Handle error
}
```
### SMB Share Names
**Rules:**
- Maximum length: 80 characters
- Can contain: `a-z`, `A-Z`, `0-9`, `_`, `-`, `.`
- Cannot be reserved Windows names (CON, PRN, AUX, NUL, COM1-9, LPT1-9)
- Must start with alphanumeric character
**Example:**
```go
if err := validation.ValidateShareName("data-share"); err != nil {
// Handle error
}
```
### iSCSI IQN (Qualified Name)
**Rules:**
- Must start with `iqn.`
- Format: `iqn.yyyy-mm.reversed.domain:identifier`
- Maximum length: 223 characters
- Year-month format validation
**Example:**
```go
if err := validation.ValidateIQN("iqn.2024-12.com.atlas:storage.target1"); err != nil {
// Handle error
}
```
### Size Strings
**Rules:**
- Format: number followed by optional unit (K, M, G, T, P)
- Units: K (kilobytes), M (megabytes), G (gigabytes), T (terabytes), P (petabytes)
- Case insensitive
**Examples:**
- `"10"` - 10 bytes
- `"10K"` - 10 kilobytes
- `"1G"` - 1 gigabyte
- `"2T"` - 2 terabytes
**Example:**
```go
if err := validation.ValidateSize("10G"); err != nil {
// Handle error
}
```
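Parsing such a size string into bytes might look like the sketch below. Two assumptions are made that the rules above do not settle: units are powers of 1024 (as ZFS tools use), and only whole numbers are accepted. `parseSize` is a hypothetical helper, not part of the `validation` package shown here.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseSize converts strings such as "10", "10K", or "1g" into a byte count.
func parseSize(s string) (uint64, error) {
	s = strings.ToUpper(strings.TrimSpace(s))
	if s == "" {
		return 0, errors.New("size cannot be empty")
	}
	mult := uint64(1)
	// The position of the unit in "KMGTP" gives the power of 1024 (assumed binary units).
	if i := strings.IndexByte("KMGTP", s[len(s)-1]); i >= 0 {
		mult = uint64(1) << (10 * uint(i+1))
		s = s[:len(s)-1]
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	if n > math.MaxUint64/mult {
		return 0, errors.New("size is too large")
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"10", "10K", "1g", "2T", "ten"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```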
### Filesystem Paths
**Rules:**
- Must be absolute (start with `/`)
- Maximum length: 4096 characters
- Cannot contain `..` (path traversal)
- Cannot contain `//` (double slashes)
- Cannot contain null bytes
**Example:**
```go
if err := validation.ValidatePath("/tank/data"); err != nil {
// Handle error
}
```
### CIDR/Hostname (NFS Clients)
**Rules:**
- Can be wildcard: `*`
- Can be CIDR notation: `192.168.1.0/24`
- Can be hostname: `server.example.com`
- Hostname must follow DNS rules
**Example:**
```go
if err := validation.ValidateCIDR("192.168.1.0/24"); err != nil {
// Handle error
}
```
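A client check covering the three accepted forms might be sketched as follows. The helper name `validateClient`, the hostname label pattern, and the decision to also accept a bare IP address are assumptions, not confirmed behaviour of `validation.ValidateCIDR`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"regexp"
	"strings"
)

// hostLabel matches one DNS label: 1 to 63 letters, digits, or hyphens,
// starting and ending with a letter or digit.
var hostLabel = regexp.MustCompile(`^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$`)

// validateClient accepts "*", CIDR notation, an IP address, or a DNS hostname.
func validateClient(s string) error {
	if s == "*" {
		return nil
	}
	if strings.Contains(s, "/") {
		_, _, err := net.ParseCIDR(s)
		return err
	}
	if net.ParseIP(s) != nil {
		return nil
	}
	if s == "" || len(s) > 253 {
		return errors.New("hostname must be between 1 and 253 characters")
	}
	for _, label := range strings.Split(s, ".") {
		if !hostLabel.MatchString(label) {
			return fmt.Errorf("invalid hostname label %q", label)
		}
	}
	return nil
}

func main() {
	for _, c := range []string{"*", "192.168.1.0/24", "server.example.com", "192.168.1.0/33", "-bad.example.com"} {
		fmt.Printf("%q: %v\n", c, validateClient(c))
	}
}
```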
## Sanitization
### String Sanitization
Removes potentially dangerous characters:
- Null bytes (`\x00`)
- Control characters (ASCII < 32)
- Removes leading/trailing whitespace
**Example:**
```go
clean := validation.SanitizeString(userInput)
```
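A sanitizer following these three rules can be written with `strings.Map`, as in this sketch. It is an illustration under the assumption that every character below ASCII 32 is dropped, including tabs and newlines; the real `validation.SanitizeString` may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeString drops null bytes and control characters, then trims whitespace.
func sanitizeString(s string) string {
	cleaned := strings.Map(func(r rune) rune {
		if r < 32 {
			return -1 // a negative value tells strings.Map to drop the character
		}
		return r
	}, s)
	return strings.TrimSpace(cleaned)
}

func main() {
	fmt.Printf("%q\n", sanitizeString("  data\x00-share\n "))
}
```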
### Path Sanitization
Normalizes filesystem paths:
- Removes leading/trailing whitespace
- Normalizes slashes (backslash to forward slash)
- Removes multiple consecutive slashes
**Example:**
```go
cleanPath := validation.SanitizePath("/tank//data/")
// Result: "/tank/data"
```
## Integration
### In API Handlers
Validation is integrated into all create/update handlers:
```go
func (a *App) handleCreatePool(w http.ResponseWriter, r *http.Request) {
	// ... decode request ...
	// Validate pool name
	if err := validation.ValidateZFSName(req.Name); err != nil {
		writeError(w, errors.ErrValidation(err.Error()))
		return
	}
	// ... continue with creation ...
}
```
### Error Responses
Validation errors return structured error responses:
```json
{
"code": "VALIDATION_ERROR",
"message": "validation error on field 'name': name cannot be empty",
"details": ""
}
```
## Security Benefits
1. **Injection Prevention**: Validated inputs prevent command injection
2. **Path Traversal Protection**: Path validation prevents directory traversal
3. **Data Integrity**: Ensures data conforms to expected formats
4. **System Stability**: Prevents invalid operations that could crash services
5. **User Experience**: Clear error messages guide users to correct input
## Best Practices
1. **Validate Early**: Validate inputs as soon as they're received
2. **Sanitize Before Storage**: Sanitize strings before storing in database
3. **Validate Format**: Check format before parsing (e.g., size strings)
4. **Check Length**: Enforce maximum lengths to prevent DoS
5. **Whitelist Characters**: Only allow known-safe characters
## Future Enhancements
1. **Custom Validators**: Domain-specific validation rules
2. **Validation Middleware**: Automatic validation for all endpoints
3. **Schema Validation**: JSON schema validation
4. **Rate Limiting**: Prevent abuse through validation
5. **Input Normalization**: Automatic normalization of valid inputs

1866
docs/openapi.yaml Normal file

File diff suppressed because it is too large