# Background Job System

The AtlasOS API includes a background job system that automatically executes snapshot policies and manages long-running operations.

## Architecture

### Components

1. **Job Manager** (`internal/job/manager.go`)
   - Tracks job lifecycle (pending, running, completed, failed, cancelled)
   - Stores job metadata and progress
   - Thread-safe job operations

2. **Snapshot Scheduler** (`internal/snapshot/scheduler.go`)
   - Automatically creates snapshots based on policies
   - Prunes old snapshots based on retention rules
   - Runs every 15 minutes by default

3. **Integration**
   - Scheduler starts automatically when API server starts
   - Gracefully stops on server shutdown
   - Jobs are accessible via API endpoints

## How It Works

### Snapshot Creation

Every 15 minutes, the scheduler checks all enabled snapshot policies and creates whichever snapshots are due:

1. **Frequent snapshots**: created every 15 minutes if `frequent > 0`
2. **Hourly snapshots**: created every hour if `hourly > 0`
3. **Daily snapshots**: created daily if `daily > 0`
4. **Weekly snapshots**: created weekly if `weekly > 0`
5. **Monthly snapshots**: created monthly if `monthly > 0`
6. **Yearly snapshots**: created yearly if `yearly > 0`

Snapshot names follow the pattern `{type}-{timestamp}` (e.g., `hourly-20241214-143000`).

### Snapshot Pruning

When `autoprune` is enabled, the scheduler:

1. Groups snapshots by type (frequent, hourly, daily, etc.)
2. Sorts by creation time (newest first)
3. Keeps only the number specified in the policy
4. Deletes older snapshots that exceed the retention count

### Job Tracking

Every snapshot operation creates a job that tracks:
- Status (pending → running → completed, failed, or cancelled)
- Progress (0-100%)
- Error messages (if failed)
- Timestamps (created, started, completed)

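To make the tracked fields concrete, here is a minimal sketch of a mutex-guarded job store with the lifecycle transitions enforced. It is not the code in `internal/job/manager.go`; the `Job` field names, the `Manager` methods, and the `job-N` ID scheme are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Status values mirror the job statuses listed in this document.
type Status string

const (
	Pending   Status = "pending"
	Running   Status = "running"
	Completed Status = "completed"
	Failed    Status = "failed"
	Cancelled Status = "cancelled"
)

// Job holds the tracked fields; the field names are assumptions.
type Job struct {
	ID          string
	Status      Status
	Progress    int    // 0-100
	Error       string // set only when Status is Failed
	CreatedAt   time.Time
	StartedAt   time.Time
	CompletedAt time.Time
}

var errState = errors.New("job not found or in the wrong state")

// Manager guards its job map with a mutex so it is safe for concurrent use.
type Manager struct {
	mu   sync.Mutex
	jobs map[string]*Job
	next int
}

func NewManager() *Manager { return &Manager{jobs: make(map[string]*Job)} }

// Create registers a new pending job and returns its ID.
func (m *Manager) Create() string {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.next++
	id := fmt.Sprintf("job-%d", m.next)
	m.jobs[id] = &Job{ID: id, Status: Pending, CreatedAt: time.Now()}
	return id
}

// Start moves a pending job to running.
func (m *Manager) Start(id string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	j, ok := m.jobs[id]
	if !ok || j.Status != Pending {
		return errState
	}
	j.Status, j.StartedAt = Running, time.Now()
	return nil
}

// Finish moves a running job to completed, or to failed when jobErr is non-nil.
func (m *Manager) Finish(id string, jobErr error) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	j, ok := m.jobs[id]
	if !ok || j.Status != Running {
		return errState
	}
	j.CompletedAt = time.Now()
	if jobErr != nil {
		j.Status, j.Error = Failed, jobErr.Error()
		return nil
	}
	j.Status, j.Progress = Completed, 100
	return nil
}

// Get returns a copy so callers cannot mutate shared state.
func (m *Manager) Get(id string) (Job, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	j, ok := m.jobs[id]
	if !ok {
		return Job{}, false
	}
	return *j, true
}

func main() {
	m := NewManager()
	id := m.Create()
	_ = m.Start(id)
	_ = m.Finish(id, errors.New("dataset busy"))
	j, _ := m.Get(id)
	fmt.Println(j.ID, j.Status, j.Error) // job-1 failed dataset busy
}
```
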
## API Endpoints

### List Jobs
```http
GET /api/v1/jobs
GET /api/v1/jobs?status=running
```

### Get Job
```http
GET /api/v1/jobs/{id}
```

### Cancel Job
```http
POST /api/v1/jobs/{id}/cancel
```

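For callers written in Go, the three endpoints can be wrapped in small request builders. This sketch only constructs the requests; it does not send them or decode a response, because the response format is not documented here. The function names, the base URL, and the job ID `42` are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// listJobsRequest builds GET /api/v1/jobs, optionally filtered by status.
func listJobsRequest(base, status string) (*http.Request, error) {
	u, err := url.Parse(base + "/api/v1/jobs")
	if err != nil {
		return nil, err
	}
	if status != "" {
		q := u.Query()
		q.Set("status", status)
		u.RawQuery = q.Encode()
	}
	return http.NewRequest(http.MethodGet, u.String(), nil)
}

// getJobRequest builds GET /api/v1/jobs/{id}.
func getJobRequest(base, id string) (*http.Request, error) {
	return http.NewRequest(http.MethodGet, base+"/api/v1/jobs/"+url.PathEscape(id), nil)
}

// cancelJobRequest builds POST /api/v1/jobs/{id}/cancel.
func cancelJobRequest(base, id string) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, base+"/api/v1/jobs/"+url.PathEscape(id)+"/cancel", nil)
}

func main() {
	base := "http://localhost:8080" // same host as the curl examples in this document
	list, _ := listJobsRequest(base, "running")
	get, _ := getJobRequest(base, "42")
	cancel, _ := cancelJobRequest(base, "42")
	fmt.Println(list.Method, list.URL)
	fmt.Println(get.Method, get.URL)
	fmt.Println(cancel.Method, cancel.URL)
}
```

Each request can then be passed to `http.DefaultClient.Do` or a client of your choice.
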
## Configuration

The scheduler interval is hardcoded to 15 minutes. To change it, modify:

```go
// In internal/httpapp/app.go
scheduler.Start(15 * time.Minute) // Change interval here
```

## Example Workflow

1. **Create a snapshot policy:**
   ```bash
   curl -X POST http://localhost:8080/api/v1/snapshot-policies \
     -H "Content-Type: application/json" \
     -d '{
       "dataset": "pool/dataset",
       "hourly": 24,
       "daily": 7,
       "autosnap": true,
       "autoprune": true
     }'
   ```

2. **Scheduler automatically:**
   - Creates hourly snapshots (keeps 24)
   - Creates daily snapshots (keeps 7)
   - Prunes old snapshots beyond retention

3. **Monitor jobs:**
   ```bash
   curl http://localhost:8080/api/v1/jobs
   ```

## Job Statuses

- `pending`: Job created but not started
- `running`: Job is currently executing
- `completed`: Job finished successfully
- `failed`: Job encountered an error
- `cancelled`: Job was cancelled by user

## Notes

- Jobs are stored in memory and are lost on restart
- The scheduler runs in a background goroutine
- Snapshot operations are synchronous (blocking)
- For production, consider:
  - Database persistence for jobs
  - Async job execution with a worker pool
  - Job history retention policies
  - Metrics/alerting for failed jobs
