add maintenance mode
Some checks failed
CI / test-build (push) Failing after 2m12s

2025-12-15 01:11:51 +07:00
parent 507961716e
commit 9779b30a65
7 changed files with 689 additions and 22 deletions

docs/MAINTENANCE_MODE.md

@@ -0,0 +1,303 @@
# Maintenance Mode & Update Management
## Overview
AtlasOS provides a maintenance mode feature that allows administrators to safely disable user operations during system updates or maintenance. When maintenance mode is enabled, all mutating operations (create, update, delete) are blocked except for users explicitly allowed.
## Features
- **Maintenance Mode**: Disable user operations during maintenance
- **Automatic Backup**: Optionally create backup before entering maintenance
- **Allowed Users**: Specify users who can operate during maintenance
- **Health Check Integration**: Maintenance status included in health checks
- **Audit Logging**: All maintenance mode changes are logged
## API Endpoints
### Get Maintenance Status
**GET** `/api/v1/maintenance`
Returns the current maintenance mode status.
**Response:**
```json
{
"enabled": true,
"enabled_at": "2024-12-20T10:30:00Z",
"enabled_by": "admin",
"reason": "System update",
"allowed_users": ["admin"],
"last_backup_id": "backup-1703123456"
}
```
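When maintenance mode is disabled, most of these fields drop out of the response. The sketch below copies the JSON tags of the `Status` struct from this commit's `maintenance` package into a standalone program to show what a never-enabled service would serialize to. The `encodeStatus` helper and the `main` wrapper are illustrative only. One point worth checking against a running server: `omitempty` does not omit struct values such as `time.Time`, so `enabled_at` should still appear, carrying Go's zero time.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Status copies the field tags of maintenance.Status from this commit.
type Status struct {
	Enabled      bool      `json:"enabled"`
	EnabledAt    time.Time `json:"enabled_at,omitempty"`
	EnabledBy    string    `json:"enabled_by,omitempty"`
	Reason       string    `json:"reason,omitempty"`
	AllowedUsers []string  `json:"allowed_users,omitempty"`
	LastBackupID string    `json:"last_backup_id,omitempty"`
}

// encodeStatus is a hypothetical helper that returns the JSON form of s.
func encodeStatus(s Status) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Zero-value status with an empty allow list, as NewService() sets it up.
	out, err := encodeStatus(Status{AllowedUsers: []string{}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"enabled":false,"enabled_at":"0001-01-01T00:00:00Z"}
}
```

Clients should therefore rely on `enabled` rather than on the presence of `enabled_at` to decide whether maintenance mode is active.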
### Enable Maintenance Mode
**POST** `/api/v1/maintenance`
Enables maintenance mode. Requires administrator role.
**Request Body:**
```json
{
"reason": "System update to v1.1.0",
"allowed_users": ["admin"],
"create_backup": true
}
```
**Fields:**
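- `reason` (string, required): Reason for entering maintenance mode. The handler in this commit does not validate the field, so an empty reason is currently accepted
- `allowed_users` (array of strings, optional): User IDs allowed to perform mutating operations during maintenance. The administrator who enables maintenance mode is not added automatically
- `create_backup` (boolean, optional, default `false`): Create an automatic backup before entering maintenance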
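As a minimal sketch of how these fields could be checked before a request is sent (or inside a handler), the program below decodes the documented body and rejects a missing `reason`. `EnableRequest` and `parseEnableRequest` are hypothetical names introduced for illustration; they are not part of the AtlasOS code base.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// EnableRequest mirrors the request body documented above.
type EnableRequest struct {
	Reason       string   `json:"reason"`
	AllowedUsers []string `json:"allowed_users,omitempty"`
	CreateBackup bool     `json:"create_backup,omitempty"`
}

// parseEnableRequest (hypothetical) decodes body and enforces that reason is set.
func parseEnableRequest(body string) (EnableRequest, error) {
	var req EnableRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return EnableRequest{}, fmt.Errorf("invalid request body: %w", err)
	}
	if strings.TrimSpace(req.Reason) == "" {
		return EnableRequest{}, errors.New("reason is required")
	}
	return req, nil
}

func main() {
	req, err := parseEnableRequest(`{"reason": "System update to v1.1.0", "allowed_users": ["admin"], "create_backup": true}`)
	fmt.Println(req.Reason, req.AllowedUsers, req.CreateBackup, err)

	// A body without a reason is rejected.
	_, err = parseEnableRequest(`{"allowed_users": ["admin"]}`)
	fmt.Println(err)
}
```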
**Response:**
```json
{
"message": "maintenance mode enabled",
"status": {
"enabled": true,
"enabled_at": "2024-12-20T10:30:00Z",
"enabled_by": "admin",
"reason": "System update to v1.1.0",
"allowed_users": ["admin"],
"last_backup_id": "backup-1703123456"
},
"backup_id": "backup-1703123456"
}
```
### Disable Maintenance Mode
**POST** `/api/v1/maintenance/disable`
Disables maintenance mode. Requires administrator role.
**Response:**
```json
{
"message": "maintenance mode disabled"
}
```
## Usage Examples
### Enable Maintenance Mode with Backup
```bash
curl -X POST http://localhost:8080/api/v1/maintenance \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"reason": "System update to v1.1.0",
"allowed_users": ["admin"],
"create_backup": true
}'
```
### Check Maintenance Status
```bash
curl http://localhost:8080/api/v1/maintenance \
-H "Authorization: Bearer $TOKEN"
```
### Disable Maintenance Mode
```bash
curl -X POST http://localhost:8080/api/v1/maintenance/disable \
-H "Authorization: Bearer $TOKEN"
```
## Behavior
### When Maintenance Mode is Enabled
1. **Read Operations**: All GET, HEAD, and OPTIONS requests continue to work normally
2. **Mutating Operations**: All POST, PUT, PATCH, DELETE requests are blocked
3. **Allowed Users**: Users in the `allowed_users` list can still perform operations
4. **Public Endpoints**: Public endpoints (login, health checks) continue to work
5. **Error Response**: Blocked operations return `503 Service Unavailable` with message:
```json
{
"code": "SERVICE_UNAVAILABLE",
"message": "system is in maintenance mode",
"details": "the system is currently in maintenance mode and user operations are disabled"
}
```
### Middleware Order
Maintenance mode middleware is applied after authentication but before routes:
1. CORS
2. Compression
3. Security headers
4. Request size limit
5. Content-Type validation
6. Rate limiting
7. Caching
8. Error recovery
9. Request ID
10. Logging
11. Audit
12. Authentication
13. **Maintenance mode** ← Blocks operations
14. Routes
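With Go's handler-wrapping pattern, the outermost wrapper runs first, so the maintenance check can only see the authenticated user if the authentication middleware wraps it. The sketch below demonstrates that ordering with stand-in middlewares; `trace` and `runChain` are hypothetical helpers, not AtlasOS functions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// trace returns a middleware that records its name before calling next.
func trace(name string, order *[]string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*order = append(*order, name)
			next.ServeHTTP(w, r)
		})
	}
}

// runChain builds audit(authentication(maintenance(routes))), sends one
// request through it, and returns the order in which the layers ran.
func runChain() []string {
	var order []string
	routes := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		order = append(order, "routes")
	})
	h := trace("audit", &order)(
		trace("authentication", &order)(
			trace("maintenance", &order)(routes),
		),
	)
	req := httptest.NewRequest(http.MethodPost, "/api/v1/maintenance/disable", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return order
}

func main() {
	fmt.Println(runChain()) // [audit authentication maintenance routes]
}
```

Nesting the two inner layers the other way round would run the maintenance check before any user has been placed in the request context.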
## Health Check Integration
The health check endpoint (`/health`) includes maintenance mode status:
```json
{
"status": "maintenance",
"timestamp": "2024-12-20T10:30:00Z",
"checks": {
"zfs": "healthy",
"database": "healthy",
"smb": "healthy",
"nfs": "healthy",
"iscsi": "healthy",
"maintenance": "enabled"
}
}
```
When maintenance mode is enabled:
- `status` changes from "healthy" to "maintenance"; any other status (for example "unhealthy") is left unchanged
- `checks.maintenance` is "enabled" (it is "disabled" otherwise)
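Because `status` is only rewritten when the system is otherwise healthy, a monitoring client may prefer to inspect `checks.maintenance`. A minimal sketch, assuming the response shape shown above (`Health` and `inMaintenance` are hypothetical names):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Health models only the parts of the /health response used here.
type Health struct {
	Status string            `json:"status"`
	Checks map[string]string `json:"checks"`
}

// inMaintenance (hypothetical) reports whether a /health body signals
// maintenance mode, based on checks.maintenance rather than status.
func inMaintenance(body []byte) (bool, error) {
	var h Health
	if err := json.Unmarshal(body, &h); err != nil {
		return false, err
	}
	return h.Checks["maintenance"] == "enabled", nil
}

func main() {
	body := []byte(`{"status": "unhealthy", "checks": {"zfs": "unhealthy", "maintenance": "enabled"}}`)
	on, err := inMaintenance(body)
	fmt.Println(on, err) // true <nil>
}
```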
## Automatic Backup
When `create_backup: true` is specified:
1. A backup is created automatically before entering maintenance
2. The backup ID is stored in maintenance status
3. The backup includes:
   - All user accounts
   - All SMB shares
   - All NFS exports
   - All iSCSI targets
   - All snapshot policies
   - System configuration
## Best Practices
### Before System Updates
1. **Create Backup**: Always enable `create_backup: true`
2. **Notify Users**: Inform users about maintenance window
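3. **Allow Administrators**: Include admin users in `allowed_users`. Disabling maintenance mode is itself a POST request, so an administrator who is not listed is blocked from calling `/api/v1/maintenance/disable`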
4. **Document Reason**: Provide clear reason for maintenance
### During Maintenance
1. **Monitor Status**: Check `/api/v1/maintenance` periodically
2. **Verify Backup**: Confirm backup was created successfully
3. **Perform Updates**: Execute system updates or maintenance tasks
4. **Test Operations**: Verify system functionality
### After Maintenance
1. **Disable Maintenance**: Use `/api/v1/maintenance/disable`
2. **Verify Services**: Check all services are running
3. **Test Operations**: Verify normal operations work
4. **Review Logs**: Check audit logs for any issues
## Security Considerations
1. **Administrator Only**: Only administrators can enable/disable maintenance mode
2. **Audit Logging**: All maintenance mode changes are logged
3. **Allowed Users**: Only specified users can operate during maintenance
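4. **Token Validation**: The maintenance check identifies the caller from the authenticated request; mutating requests without an authenticated user are never treated as allowed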
## Error Handling
Each of the errors below is returned with HTTP status `500 Internal Server Error`.
### Maintenance Mode Already Enabled
```json
{
"code": "INTERNAL_ERROR",
"message": "failed to enable maintenance mode",
"details": "maintenance mode is already enabled"
}
```
### Maintenance Mode Not Enabled
```json
{
"code": "INTERNAL_ERROR",
"message": "failed to disable maintenance mode",
"details": "maintenance mode is not enabled"
}
```
### Backup Creation Failure
If backup creation fails, maintenance mode is not enabled:
```json
{
"code": "INTERNAL_ERROR",
"message": "failed to create backup",
"details": "error details..."
}
```
## Integration with Update Process
### Recommended Update Workflow
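1. **Enable Maintenance Mode**:
   ```bash
   curl -X POST http://localhost:8080/api/v1/maintenance \
     -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"reason": "Updating to v1.1.0", "allowed_users": ["admin"], "create_backup": true}'
   ```
2. **Verify Backup** (with `$BACKUP_ID` set to the `backup_id` returned in step 1):
   ```bash
   curl http://localhost:8080/api/v1/backups/$BACKUP_ID \
     -H "Authorization: Bearer $TOKEN"
   ```
3. **Perform System Update**:
   - Stop services if needed
   - Update binaries/configurations
   - Restart services
4. **Verify System Health**:
   ```bash
   curl http://localhost:8080/health
   ```
5. **Disable Maintenance Mode**:
   ```bash
   curl -X POST http://localhost:8080/api/v1/maintenance/disable \
     -H "Authorization: Bearer $TOKEN"
   ```
6. **Test Operations**:
   - Verify normal operations work
   - Check service status
   - Review logs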
## Limitations
1. **No Automatic Disable**: Maintenance mode must be manually disabled
2. **No Scheduled Maintenance**: Maintenance mode must be enabled manually
3. **No Maintenance History**: Only current status is available
4. **No Notifications**: No automatic notifications to users
## Future Enhancements
1. **Scheduled Maintenance**: Schedule maintenance windows
2. **Maintenance History**: Track maintenance mode history
3. **User Notifications**: Notify users when maintenance starts/ends
4. **Automatic Disable**: Auto-disable after specified duration
5. **Maintenance Templates**: Predefined maintenance scenarios
6. **Rollback Support**: Automatic rollback on update failure


@@ -13,6 +13,7 @@ import (
"gitea.avt.data-center.id/othman.suseno/atlas/internal/backup"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/db"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/job"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/maintenance"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/metrics"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/services"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/snapshot"
@@ -28,26 +29,27 @@ type Config struct {
}
type App struct {
cfg Config
tmpl *template.Template
mux *http.ServeMux
zfs *zfs.Service
snapshotPolicy *snapshot.PolicyStore
jobManager *job.Manager
scheduler *snapshot.Scheduler
authService *auth.Service
userStore *auth.UserStore
auditStore *audit.Store
smbStore *storage.SMBStore
nfsStore *storage.NFSStore
iscsiStore *storage.ISCSIStore
database *db.DB // Optional database connection
smbService *services.SMBService
nfsService *services.NFSService
iscsiService *services.ISCSIService
metricsCollector *metrics.Collector
startTime time.Time
backupService *backup.Service
cfg Config
tmpl *template.Template
mux *http.ServeMux
zfs *zfs.Service
snapshotPolicy *snapshot.PolicyStore
jobManager *job.Manager
scheduler *snapshot.Scheduler
authService *auth.Service
userStore *auth.UserStore
auditStore *audit.Store
smbStore *storage.SMBStore
nfsStore *storage.NFSStore
iscsiStore *storage.ISCSIStore
database *db.DB // Optional database connection
smbService *services.SMBService
nfsService *services.NFSService
iscsiService *services.ISCSIService
metricsCollector *metrics.Collector
startTime time.Time
backupService *backup.Service
maintenanceService *maintenance.Service
}
func New(cfg Config) (*App, error) {
@@ -154,7 +156,8 @@ func (a *App) Router() http.Handler {
// 10. Logging
// 11. Audit
// 12. Authentication
// 13. Routes
// 13. Maintenance mode (blocks operations during maintenance)
// 14. Routes
return a.corsMiddleware(
a.compressionMiddleware(
a.securityHeadersMiddleware(
@@ -166,7 +169,9 @@ func (a *App) Router() http.Handler {
requestID(
logging(
a.auditMiddleware(
a.authMiddleware(a.mux),
a.authMiddleware(
a.maintenanceMiddleware(a.mux),
),
),
),
),


@@ -198,6 +198,16 @@ func (a *App) handleHealthCheck(w http.ResponseWriter, r *http.Request) {
health.Checks["iscsi"] = "healthy"
}
// Check maintenance mode
if a.maintenanceService != nil && a.maintenanceService.IsEnabled() {
health.Checks["maintenance"] = "enabled"
if health.Status == "healthy" {
health.Status = "maintenance"
}
} else {
health.Checks["maintenance"] = "disabled"
}
// Set HTTP status based on health
statusCode := http.StatusOK
if health.Status == "unhealthy" {


@@ -0,0 +1,162 @@
package httpapp
import (
"encoding/json"
"net/http"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/backup"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/errors"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/models"
)
// handleGetMaintenanceStatus returns the current maintenance mode status
func (a *App) handleGetMaintenanceStatus(w http.ResponseWriter, r *http.Request) {
if a.maintenanceService == nil {
writeError(w, errors.NewAPIError(
errors.ErrCodeInternal,
"maintenance service not available",
http.StatusInternalServerError,
))
return
}
status := a.maintenanceService.GetStatus()
writeJSON(w, http.StatusOK, status)
}
// handleEnableMaintenance enables maintenance mode
func (a *App) handleEnableMaintenance(w http.ResponseWriter, r *http.Request) {
if a.maintenanceService == nil {
writeError(w, errors.NewAPIError(
errors.ErrCodeInternal,
"maintenance service not available",
http.StatusInternalServerError,
))
return
}
// Require administrator role
user, ok := getUserFromContext(r)
if !ok {
writeError(w, errors.NewAPIError(
errors.ErrCodeUnauthorized,
"authentication required",
http.StatusUnauthorized,
))
return
}
if user.Role != models.RoleAdministrator {
writeError(w, errors.NewAPIError(
errors.ErrCodeForbidden,
"administrator role required",
http.StatusForbidden,
))
return
}
var req struct {
Reason string `json:"reason"`
AllowedUsers []string `json:"allowed_users,omitempty"`
CreateBackup bool `json:"create_backup,omitempty"`
}
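if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
// NOTE: decode errors are deliberately ignored, so an empty or malformed body
// falls back to zero values (empty reason, no allowed users, no backup).
// The reason field is documented as required but is not validated here.
_ = err
}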
// Create backup before entering maintenance if requested
var backupID string
if req.CreateBackup && a.backupService != nil {
// Collect all configuration data
backupData := backup.BackupData{
Users: a.userStore.List(),
SMBShares: a.smbStore.List(),
NFSExports: a.nfsStore.List(),
ISCSITargets: a.iscsiStore.List(),
Policies: a.snapshotPolicy.List(),
Config: map[string]interface{}{
"database_path": a.cfg.DatabasePath,
},
}
id, err := a.backupService.CreateBackup(backupData, "Automatic backup before maintenance mode")
if err != nil {
writeError(w, errors.NewAPIError(
errors.ErrCodeInternal,
"failed to create backup",
http.StatusInternalServerError,
).WithDetails(err.Error()))
return
}
backupID = id
}
// Enable maintenance mode
if err := a.maintenanceService.Enable(user.ID, req.Reason, req.AllowedUsers); err != nil {
writeError(w, errors.NewAPIError(
errors.ErrCodeInternal,
"failed to enable maintenance mode",
http.StatusInternalServerError,
).WithDetails(err.Error()))
return
}
// Set backup ID if created
if backupID != "" {
a.maintenanceService.SetLastBackupID(backupID)
}
status := a.maintenanceService.GetStatus()
writeJSON(w, http.StatusOK, map[string]interface{}{
"message": "maintenance mode enabled",
"status": status,
"backup_id": backupID,
})
}
// handleDisableMaintenance disables maintenance mode
func (a *App) handleDisableMaintenance(w http.ResponseWriter, r *http.Request) {
if a.maintenanceService == nil {
writeError(w, errors.NewAPIError(
errors.ErrCodeInternal,
"maintenance service not available",
http.StatusInternalServerError,
))
return
}
// Require administrator role
user, ok := getUserFromContext(r)
if !ok {
writeError(w, errors.NewAPIError(
errors.ErrCodeUnauthorized,
"authentication required",
http.StatusUnauthorized,
))
return
}
if user.Role != models.RoleAdministrator {
writeError(w, errors.NewAPIError(
errors.ErrCodeForbidden,
"administrator role required",
http.StatusForbidden,
))
return
}
if err := a.maintenanceService.Disable(user.ID); err != nil {
writeError(w, errors.NewAPIError(
errors.ErrCodeInternal,
"failed to disable maintenance mode",
http.StatusInternalServerError,
).WithDetails(err.Error()))
return
}
writeJSON(w, http.StatusOK, map[string]string{
"message": "maintenance mode disabled",
})
}


@@ -0,0 +1,39 @@
package httpapp
import (
"net/http"
"gitea.avt.data-center.id/othman.suseno/atlas/internal/errors"
)
// maintenanceMiddleware blocks operations during maintenance mode
func (a *App) maintenanceMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Skip maintenance check for read-only operations and public endpoints
if r.Method == http.MethodGet || r.Method == http.MethodHead || r.Method == http.MethodOptions {
next.ServeHTTP(w, r)
return
}
if a.isPublicEndpoint(r.URL.Path) {
next.ServeHTTP(w, r)
return
}
// Check if maintenance mode is enabled
if a.maintenanceService != nil && a.maintenanceService.IsEnabled() {
// Check if user is allowed during maintenance
user, ok := getUserFromContext(r)
if !ok || !a.maintenanceService.IsUserAllowed(user.ID) {
writeError(w, errors.NewAPIError(
errors.ErrCodeServiceUnavailable,
"system is in maintenance mode",
http.StatusServiceUnavailable,
).WithDetails("the system is currently in maintenance mode and user operations are disabled"))
return
}
}
next.ServeHTTP(w, r)
})
}


@@ -34,6 +34,24 @@ func (a *App) routes() {
nil, nil, nil,
))
// Maintenance Mode (requires authentication, admin-only for enable/disable)
a.mux.HandleFunc("/api/v1/maintenance", methodHandler(
func(w http.ResponseWriter, r *http.Request) { a.handleGetMaintenanceStatus(w, r) },
func(w http.ResponseWriter, r *http.Request) {
adminRole := models.RoleAdministrator
a.requireRole(adminRole)(http.HandlerFunc(a.handleEnableMaintenance)).ServeHTTP(w, r)
},
nil, nil, nil,
))
a.mux.HandleFunc("/api/v1/maintenance/disable", methodHandler(
nil,
func(w http.ResponseWriter, r *http.Request) {
adminRole := models.RoleAdministrator
a.requireRole(adminRole)(http.HandlerFunc(a.handleDisableMaintenance)).ServeHTTP(w, r)
},
nil, nil, nil,
))
// API Documentation
a.mux.HandleFunc("/api/docs", a.handleAPIDocs)
a.mux.HandleFunc("/api/openapi.yaml", a.handleOpenAPISpec)


@@ -0,0 +1,130 @@
package maintenance
import (
"fmt"
"sync"
"time"
)
// Mode represents the maintenance mode state
type Mode struct {
mu sync.RWMutex
enabled bool
enabledAt time.Time
enabledBy string
reason string
allowedUsers []string // Users allowed to operate during maintenance
lastBackupID string // ID of backup created before entering maintenance
}
// Service manages maintenance mode
type Service struct {
mode *Mode
}
// NewService creates a new maintenance service
func NewService() *Service {
return &Service{
mode: &Mode{
allowedUsers: []string{},
},
}
}
// IsEnabled returns whether maintenance mode is currently enabled
func (s *Service) IsEnabled() bool {
s.mode.mu.RLock()
defer s.mode.mu.RUnlock()
return s.mode.enabled
}
// Enable enables maintenance mode
func (s *Service) Enable(enabledBy, reason string, allowedUsers []string) error {
s.mode.mu.Lock()
defer s.mode.mu.Unlock()
if s.mode.enabled {
return fmt.Errorf("maintenance mode is already enabled")
}
s.mode.enabled = true
s.mode.enabledAt = time.Now()
s.mode.enabledBy = enabledBy
s.mode.reason = reason
if allowedUsers != nil {
s.mode.allowedUsers = allowedUsers
} else {
s.mode.allowedUsers = []string{}
}
return nil
}
// Disable disables maintenance mode
func (s *Service) Disable(disabledBy string) error {
s.mode.mu.Lock()
defer s.mode.mu.Unlock()
if !s.mode.enabled {
return fmt.Errorf("maintenance mode is not enabled")
}
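s.mode.enabled = false
s.mode.enabledAt = time.Time{} // clear the timestamp so a disabled status does not report a stale enabled_at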
s.mode.enabledBy = ""
s.mode.reason = ""
s.mode.allowedUsers = []string{}
s.mode.lastBackupID = ""
return nil
}
// GetStatus returns the current maintenance mode status
func (s *Service) GetStatus() Status {
s.mode.mu.RLock()
defer s.mode.mu.RUnlock()
return Status{
Enabled: s.mode.enabled,
EnabledAt: s.mode.enabledAt,
EnabledBy: s.mode.enabledBy,
Reason: s.mode.reason,
AllowedUsers: s.mode.allowedUsers,
LastBackupID: s.mode.lastBackupID,
}
}
// SetLastBackupID sets the backup ID created before entering maintenance
func (s *Service) SetLastBackupID(backupID string) {
s.mode.mu.Lock()
defer s.mode.mu.Unlock()
s.mode.lastBackupID = backupID
}
// IsUserAllowed checks if a user is allowed to operate during maintenance
func (s *Service) IsUserAllowed(userID string) bool {
s.mode.mu.RLock()
defer s.mode.mu.RUnlock()
if !s.mode.enabled {
return true // No restrictions when not in maintenance
}
// Check if user is in allowed list
for _, allowed := range s.mode.allowedUsers {
if allowed == userID {
return true
}
}
return false
}
// Status represents the maintenance mode status
type Status struct {
Enabled bool `json:"enabled"`
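// Note: omitempty never omits a struct value such as time.Time, so enabled_at is always serialized.
EnabledAt time.Time `json:"enabled_at,omitempty"`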
EnabledBy string `json:"enabled_by,omitempty"`
Reason string `json:"reason,omitempty"`
AllowedUsers []string `json:"allowed_users,omitempty"`
LastBackupID string `json:"last_backup_id,omitempty"`
}