atlas/docs/BACKGROUND_JOBS.md
othman.suseno ed96137bad
adding snapshot function
2025-12-14 23:17:26 +07:00

Background Job System

The atlasOS API includes a background job system that automatically executes snapshot policies and manages long-running operations.

Architecture

Components

  1. Job Manager (internal/job/manager.go)

    • Tracks job lifecycle (pending, running, completed, failed, cancelled)
    • Stores job metadata and progress
    • Thread-safe job operations
  2. Snapshot Scheduler (internal/snapshot/scheduler.go)

    • Automatically creates snapshots based on policies
    • Prunes old snapshots based on retention rules
    • Runs every 15 minutes by default
  3. Integration

    • Scheduler starts automatically when API server starts
    • Gracefully stops on server shutdown
    • Jobs are accessible via API endpoints

How It Works

Snapshot Creation

Every 15 minutes, the scheduler checks all enabled snapshot policies and creates a snapshot of each type whose count in the policy is greater than zero:

  1. Frequent snapshots: every 15 minutes if frequent > 0
  2. Hourly snapshots: every hour if hourly > 0
  3. Daily snapshots: daily if daily > 0
  4. Weekly snapshots: weekly if weekly > 0
  5. Monthly snapshots: monthly if monthly > 0
  6. Yearly snapshots: yearly if yearly > 0

Snapshot names follow the pattern: {type}-{timestamp} (e.g., hourly-20241214-143000)

Snapshot Pruning

When autoprune is enabled, the scheduler:

  1. Groups snapshots by type (frequent, hourly, daily, etc.)
  2. Sorts by creation time (newest first)
  3. Keeps only the number specified in the policy
  4. Deletes older snapshots that exceed the retention count

Job Tracking

Every snapshot operation creates a job that tracks:

  • Status (pending → running → completed, failed, or cancelled)
  • Progress (0-100%)
  • Error messages (if failed)
  • Timestamps (created, started, completed)

API Endpoints

List Jobs

GET /api/v1/jobs
GET /api/v1/jobs?status=running

Get Job

GET /api/v1/jobs/{id}

Cancel Job

POST /api/v1/jobs/{id}/cancel
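For callers written in Go, the three endpoints can be addressed as sketched below. The helper names are hypothetical, and because the response body format is not documented here, the sketch only builds URLs and requests and does not decode replies.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// listJobsURL builds the List Jobs URL, adding ?status=... when a filter is given.
func listJobsURL(base, status string) string {
	u := base + "/api/v1/jobs"
	if status != "" {
		u += "?" + url.Values{"status": {status}}.Encode()
	}
	return u
}

// getJobURL builds the Get Job URL for one job ID.
func getJobURL(base, id string) string {
	return base + "/api/v1/jobs/" + url.PathEscape(id)
}

// cancelJobRequest builds the POST request for the Cancel Job endpoint.
func cancelJobRequest(base, id string) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, getJobURL(base, id)+"/cancel", nil)
}

func main() {
	base := "http://localhost:8080"
	fmt.Println(listJobsURL(base, "running"))
	fmt.Println(getJobURL(base, "42"))
	req, err := cancelJobRequest(base, "42")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```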

Configuration

The scheduler interval is hardcoded to 15 minutes. To change it, edit the argument passed to scheduler.Start and rebuild the server:

// In internal/httpapp/app.go
scheduler.Start(15 * time.Minute)  // Change interval here

Example Workflow

  1. Create a snapshot policy:
curl -X POST http://localhost:8080/api/v1/snapshot-policies \
  -H "Content-Type: application/json" \
  -d '{
    "dataset": "pool/dataset",
    "hourly": 24,
    "daily": 7,
    "autosnap": true,
    "autoprune": true
  }'
  2. Scheduler automatically:

    • Creates hourly snapshots (keeps 24)
    • Creates daily snapshots (keeps 7)
    • Prunes old snapshots beyond retention
  3. Monitor jobs:

curl http://localhost:8080/api/v1/jobs

Job Statuses

  • pending: Job created but not started
  • running: Job is currently executing
  • completed: Job finished successfully
  • failed: Job encountered an error
  • cancelled: Job was cancelled by user
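The statuses suggest a small state machine. The sketch below encodes one plausible set of rules: pending jobs may start or be cancelled, running jobs may complete, fail, or be cancelled, and the other three statuses are final. Those transition rules are an assumption; this document lists the statuses but does not spell out which transitions the job manager allows.

```go
package main

import "fmt"

type Status string

const (
	Pending   Status = "pending"
	Running   Status = "running"
	Completed Status = "completed"
	Failed    Status = "failed"
	Cancelled Status = "cancelled"
)

// IsTerminal reports whether a job in this status is finished for good.
func (s Status) IsTerminal() bool {
	switch s {
	case Completed, Failed, Cancelled:
		return true
	}
	return false
}

// CanTransition reports whether moving from one status to another is
// allowed under the assumed rules described above.
func CanTransition(from, to Status) bool {
	switch from {
	case Pending:
		return to == Running || to == Cancelled
	case Running:
		return to == Completed || to == Failed || to == Cancelled
	}
	return false // terminal statuses never change
}

func main() {
	fmt.Println(CanTransition(Pending, Running))   // true
	fmt.Println(CanTransition(Completed, Running)) // false
	fmt.Println(Cancelled.IsTerminal())            // true
}
```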

Notes

  • Jobs are stored in memory and are lost when the server restarts
  • Scheduler runs in a background goroutine
  • Snapshot operations are synchronous (blocking)
  • For production, consider:
    • Database persistence for jobs
    • Async job execution with worker pool
    • Job history retention policies
    • Metrics/alerting for failed jobs