# Background Job System

The atlasOS API includes a background job system that automatically executes snapshot policies and manages long-running operations.

## Architecture

### Components

1. **Job Manager** (`internal/job/manager.go`)
   - Tracks the job lifecycle (pending, running, completed, failed, cancelled)
   - Stores job metadata and progress
   - Provides thread-safe job operations

2. **Snapshot Scheduler** (`internal/snapshot/scheduler.go`)
   - Automatically creates snapshots based on policies
   - Prunes old snapshots based on retention rules
   - Runs every 15 minutes by default

3. **Integration**
   - The scheduler starts automatically when the API server starts
   - It stops gracefully on server shutdown
   - Jobs are accessible via API endpoints

## How It Works

### Snapshot Creation

Every 15 minutes, the scheduler checks all enabled snapshot policies and creates a snapshot of each type whose retention count is greater than zero:

1. **Frequent snapshots**: created every 15 minutes if `frequent > 0`
2. **Hourly snapshots**: created every hour if `hourly > 0`
3. **Daily snapshots**: created daily if `daily > 0`
4. **Weekly snapshots**: created weekly if `weekly > 0`
5. **Monthly snapshots**: created monthly if `monthly > 0`
6. **Yearly snapshots**: created yearly if `yearly > 0`

Snapshot names follow the pattern `{type}-{timestamp}` (e.g., `hourly-20241214-143000`).

### Snapshot Pruning

When `autoprune` is enabled, the scheduler:

1. Groups snapshots by type (frequent, hourly, daily, etc.)
2. Sorts by creation time (newest first)
3. Keeps only the number specified in the policy
4. Deletes older snapshots that exceed the retention count

### Job Tracking

Every snapshot operation creates a job that tracks:

- Status (pending → running → completed, failed, or cancelled)
- Progress (0-100%)
- Error messages (if failed)
- Timestamps (created, started, completed)

## API Endpoints

### List Jobs

The optional `status` query parameter filters the list by job status.

```http
GET /api/v1/jobs
GET /api/v1/jobs?status=running
```

### Get Job

```http
GET /api/v1/jobs/{id}
```

### Cancel Job

```http
POST /api/v1/jobs/{id}/cancel
```

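
For callers written in Go, the endpoints above could be addressed roughly as follows. The helper names are hypothetical, the base URL is the `localhost:8080` address used elsewhere in this document, and the sketch only builds requests; it makes no assumption about the response format.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// listJobsURL builds the List Jobs URL, adding ?status=... when a
// status filter is given.
func listJobsURL(base, status string) string {
	u := strings.TrimRight(base, "/") + "/api/v1/jobs"
	if status != "" {
		u += "?" + url.Values{"status": {status}}.Encode()
	}
	return u
}

// cancelJobRequest builds the POST request for the Cancel Job endpoint.
// The job ID is path-escaped in case it contains reserved characters.
func cancelJobRequest(base, id string) (*http.Request, error) {
	u := strings.TrimRight(base, "/") + "/api/v1/jobs/" + url.PathEscape(id) + "/cancel"
	return http.NewRequest(http.MethodPost, u, nil)
}

func main() {
	base := "http://localhost:8080"
	fmt.Println(listJobsURL(base, ""))
	fmt.Println(listJobsURL(base, "running"))

	req, err := cancelJobRequest(base, "abc123") // "abc123" is a made-up job ID
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```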
## Configuration

The scheduler interval is hardcoded to 15 minutes. To change it, edit the argument passed to `scheduler.Start` and rebuild:

```go
// In internal/httpapp/app.go
scheduler.Start(15 * time.Minute) // Change interval here
```

## Example Workflow

1. **Create a snapshot policy:**
   ```bash
   curl -X POST http://localhost:8080/api/v1/snapshot-policies \
     -H "Content-Type: application/json" \
     -d '{
       "dataset": "pool/dataset",
       "hourly": 24,
       "daily": 7,
       "autosnap": true,
       "autoprune": true
     }'
   ```

2. **Scheduler automatically:**
   - Creates hourly snapshots (keeps 24)
   - Creates daily snapshots (keeps 7)
   - Prunes old snapshots beyond retention

3. **Monitor jobs:**
   ```bash
   curl http://localhost:8080/api/v1/jobs
   ```

## Job Statuses

- `pending`: Job created but not started
- `running`: Job is currently executing
- `completed`: Job finished successfully
- `failed`: Job encountered an error
- `cancelled`: Job was cancelled by user

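
One way to reason about these statuses is as a small state machine. The transition table below is an assumption inferred from the descriptions above (in particular, that both pending and running jobs can be cancelled); the actual rules enforced by the job manager may differ.

```go
package main

import "fmt"

type Status string

const (
	Pending   Status = "pending"
	Running   Status = "running"
	Completed Status = "completed"
	Failed    Status = "failed"
	Cancelled Status = "cancelled"
)

// next lists the assumed legal transitions out of each non-terminal status.
var next = map[Status][]Status{
	Pending: {Running, Cancelled},
	Running: {Completed, Failed, Cancelled},
}

// canTransition reports whether moving from one status to another is
// allowed by the table above.
func canTransition(from, to Status) bool {
	for _, s := range next[from] {
		if s == to {
			return true
		}
	}
	return false
}

// isTerminal reports whether a job in this status will not change again.
func isTerminal(s Status) bool {
	switch s {
	case Completed, Failed, Cancelled:
		return true
	}
	return false
}

func main() {
	fmt.Println(canTransition(Pending, Running))   // true
	fmt.Println(canTransition(Completed, Running)) // false
	fmt.Println(isTerminal(Cancelled))             // true
}
```

A client polling `GET /api/v1/jobs/{id}` could stop as soon as a check like `isTerminal` returns true.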
## Notes

- Jobs are stored in memory and are lost on restart
- The scheduler runs in a background goroutine
- Snapshot operations are synchronous (blocking)
- For production, consider:
  - Database persistence for jobs
  - Async job execution with a worker pool
  - Job history retention policies
  - Metrics/alerting for failed jobs