Hypervisor Control Plane (HCP)
Software Design Specification (SDS)
Version: 1.0 (V1 – Enterprise Foundation)
1. Architecture Overview
HCP implements a compute control-plane pattern with the following design:
- Northbound API: Stable and provider-agnostic; used by the Central Gateway and the console.
- Core: Job orchestration, policy hooks, state persistence, audit/events.
- Southbound Provider Layer: One adapter/driver per hypervisor/provider.
- Workers/Agents: Execute jobs that affect infrastructure.
HCP supports two execution modes:
- Direct mode: the worker calls the provider API directly (fast to bootstrap)
- Agent mode: the job is sent to an agent close to the cluster (better suited to enterprise deployments: multi-site, firewall-friendly)
2. Main Components
2.1 HCP API Service
Responsibilities:
- Expose the REST API (tenant/ops) for compute
- Enforce AuthN/AuthZ (scope-based)
- Validate requests and enforce idempotency
- Persist desired state and job records
- Publish jobs to the queue/stream
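The validate, persist, and publish steps above can be sketched as follows. This is a minimal in-memory illustration, not the HCP implementation: the `JobIntake` class, the `idempotency_key` parameter, and the rule that a VM request needs a `name` are all assumptions introduced here, since the specification does not say how idempotency is keyed.

```python
import uuid
from typing import Dict, List, Tuple


class JobIntake:
    """In-memory stand-in for the datastore and the job queue (hypothetical)."""

    def __init__(self) -> None:
        self._jobs_by_key: Dict[Tuple[str, str], str] = {}
        self.queue: List[dict] = []

    def submit(self, project_id: str, idempotency_key: str,
               job_type: str, payload: dict) -> Tuple[int, str]:
        # Validate the request before anything is persisted (assumed rule).
        if not payload.get("name"):
            raise ValueError("payload.name is required")

        # A replayed request returns the original job and publishes nothing.
        key = (project_id, idempotency_key)
        existing = self._jobs_by_key.get(key)
        if existing is not None:
            return 202, existing

        # Persist the job record, then publish it for the workers.
        job_id = str(uuid.uuid4())
        self._jobs_by_key[key] = job_id
        self.queue.append({"id": job_id, "type": job_type, "payload": payload})
        return 202, job_id
```

In a real service the lookup and insert would have to be atomic (for example, a unique constraint in PostgreSQL); the dictionary here only shows the intended behavior.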
2.2 HCP Worker Service
Responsibilities:
- Subscribe to the job queue
- Run the job state machine (RUNNING/RETRYING/FAILED/SUCCEEDED)
- Call the provider adapter
- Update VM/job state in the datastore
- Emit audit and metering events
2.3 Provider Adapter Layer
Responsibilities:
- Implement the generic provider contract
- Map the generic spec → provider-specific API
- Normalize provider errors → HCP error taxonomy
- Normalize actual VM state → internal model
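As an illustration of the error-normalization responsibility, the sketch below maps low-level failures onto a small taxonomy. The specification names only `FEATURE_NOT_SUPPORTED`; every other code, the `HcpError` class, and the `normalize_provider_error` function are hypothetical names chosen for this example.

```python
from enum import Enum
from typing import Optional


class HcpErrorCode(str, Enum):
    # Only FEATURE_NOT_SUPPORTED appears in the spec; the rest are assumed.
    PROVIDER_TIMEOUT = "PROVIDER_TIMEOUT"
    PROVIDER_UNAVAILABLE = "PROVIDER_UNAVAILABLE"
    NOT_FOUND = "NOT_FOUND"
    FEATURE_NOT_SUPPORTED = "FEATURE_NOT_SUPPORTED"
    INTERNAL = "INTERNAL"


class HcpError(Exception):
    def __init__(self, code: HcpErrorCode, message: str, transient: bool = False):
        super().__init__(message)
        self.code = code
        self.message = message
        self.transient = transient  # transient errors are candidates for retry


def normalize_provider_error(exc: Exception,
                             http_status: Optional[int] = None) -> HcpError:
    """Translate a provider-side failure into the HCP taxonomy."""
    if isinstance(exc, TimeoutError):
        return HcpError(HcpErrorCode.PROVIDER_TIMEOUT, str(exc), transient=True)
    if isinstance(exc, ConnectionError):
        return HcpError(HcpErrorCode.PROVIDER_UNAVAILABLE, str(exc), transient=True)
    if http_status is not None and 500 <= http_status < 600:
        return HcpError(HcpErrorCode.PROVIDER_UNAVAILABLE, str(exc), transient=True)
    if http_status == 404:
        return HcpError(HcpErrorCode.NOT_FOUND, str(exc))
    return HcpError(HcpErrorCode.INTERNAL, str(exc))
```

Carrying a `transient` flag on the normalized error lets the worker decide on retries without knowing anything about the provider that raised it.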
2.4 Data Store
- PostgreSQL for persistent state (tenancy binding, vms, jobs, catalog, locations)
- Event store (append-only) for audit (either a dedicated table or a log pipeline)
- Queue/Stream for job distribution (NATS JetStream / RabbitMQ)
3. Domain Model
3.1 Core Entities
- Location/Zone
- Provider
- ComputeCluster (pool/cluster per provider, bound to a location)
- Image (catalog)
- Flavor (catalog)
- VM
- Job
- AuditEvent
3.2 Resource Fields (VM)
At minimum, a VM has:
- id, org_id, project_id
- name, status
- image_id, flavor_id
- placement (location_id, cluster_id optional)
- addresses (read-model)
- labels/tags
- provider_id
- provider_ref (opaque/internal)
- timestamps
3.3 Job Fields
- id, type
- resource_type, resource_id
- state (PENDING/RUNNING/SUCCEEDED/FAILED/RETRYING)
- attempt, max_attempt
- error_code, error_message
- timestamps
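The VM and Job fields listed above could be modeled roughly as below. Field names follow the specification; the Python types, the default values (such as `max_attempt = 3`), and the `created_at`/`updated_at` names for the timestamps are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List, Optional


def _now() -> datetime:
    return datetime.now(timezone.utc)


class JobState(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    RETRYING = "RETRYING"


@dataclass
class Placement:
    location_id: str
    cluster_id: Optional[str] = None  # optional per the spec


@dataclass
class VM:
    id: str
    org_id: str
    project_id: str
    name: str
    status: str
    image_id: str
    flavor_id: str
    placement: Placement
    provider_id: str
    provider_ref: Optional[dict] = None  # opaque; never exposed to tenants
    addresses: List[str] = field(default_factory=list)  # read-model
    labels: Dict[str, str] = field(default_factory=dict)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)


@dataclass
class Job:
    id: str
    type: str
    resource_type: str
    resource_id: str
    state: JobState = JobState.PENDING
    attempt: int = 0
    max_attempt: int = 3  # assumed default
    error_code: Optional[str] = None
    error_message: Optional[str] = None
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
```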
4. Northbound API Design
4.1 Namespace
Separating the tenant and ops namespaces is recommended, even when HCP is accessed through the Central Gateway:
- Tenant: /api/hcp/tenant/v1/*
- Ops: /api/hcp/ops/v1/*
- Common catalog (read): /api/hcp/common/v1/*
4.2 Async Pattern
- Create/modify/delete operations return 202 Accepted with a job_id.
- Job status can be polled: GET /jobs/{job_id}.
- The resource can be polled: GET /vms/{id}.
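From a client's point of view, the async pattern above amounts to a polling loop. The sketch below assumes the job response is a JSON object with a `state` field; the `wait_for_job` helper and its parameters are hypothetical, and the HTTP call is injected as `get_job` so the loop can be exercised without a network.

```python
import time
from typing import Callable

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(get_job: Callable[[str], dict], job_id: str,
                 interval_s: float = 1.0, timeout_s: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll GET /jobs/{job_id} (via get_job) until the job is terminal."""
    deadline = clock() + timeout_s
    while True:
        job = get_job(job_id)
        if job["state"] in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['state']} after {timeout_s}s")
        sleep(interval_s)
```

A caller would pass a function that performs the actual GET request and decodes the body; a production client would likely add jitter or backoff to the interval.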
4.3 Core Endpoints (V1)
Tenant:
- POST /projects/{projectId}/vms
- GET /projects/{projectId}/vms
- GET /projects/{projectId}/vms/{vmId}
- POST /projects/{projectId}/vms/{vmId}:start
- POST /projects/{projectId}/vms/{vmId}:stop
- POST /projects/{projectId}/vms/{vmId}:reboot
- DELETE /projects/{projectId}/vms/{vmId}
- POST /projects/{projectId}/vms/{vmId}:console
- GET /jobs/{jobId}
Ops:
- POST /providers
- POST /locations
- POST /compute-clusters
- GET /providers
- GET /compute-clusters
- POST /catalog/images (optional in V1, if the platform requires it)
- POST /catalog/flavors (optional in V1)
Common:
- GET /catalog/images
- GET /catalog/flavors
- GET /capabilities
5. Capability Model
HCP stores and exposes capability flags on the provider/cluster, for example:
- supports_cloud_init
- supports_snapshot
- supports_live_migration
- supports_console_vnc
- supports_console_spice
- supports_uefi
- supports_gpu_passthrough
- supports_secure_boot
- supports_tags
Usage:
- The UI and upstream services can adapt the features they display.
- The API returns FEATURE_NOT_SUPPORTED if an action is not available.
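A capability gate along these lines could sit in front of each action handler. The flag names come from the list above; the `ACTION_REQUIREMENTS` mapping, the action names, and the any-of rule are assumptions for illustration only.

```python
from typing import Dict, Tuple


class FeatureNotSupported(Exception):
    code = "FEATURE_NOT_SUPPORTED"

    def __init__(self, action: str, flags: Tuple[str, ...]):
        super().__init__(f"{action} requires one of: {', '.join(flags)}")
        self.action = action
        self.flags = flags


# Hypothetical mapping: an action is allowed if ANY listed flag is true.
ACTION_REQUIREMENTS: Dict[str, Tuple[str, ...]] = {
    "snapshot": ("supports_snapshot",),
    "live_migrate": ("supports_live_migration",),
    "console": ("supports_console_vnc", "supports_console_spice"),
}


def require_capability(capabilities: Dict[str, bool], action: str) -> None:
    """Raise FeatureNotSupported if the cluster cannot perform the action."""
    flags = ACTION_REQUIREMENTS.get(action, ())
    if flags and not any(capabilities.get(flag, False) for flag in flags):
        raise FeatureNotSupported(action, flags)
```

Actions without an entry in the mapping are treated as always available, which matches the idea that the baseline contract (create, start, stop, reboot, delete) is mandatory for every provider.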
6. Provider Interface (Conceptual)
6.1 ComputeProvider (minimum)
- CreateVM(spec) -> ProviderRef
- DeleteVM(ref)
- StartVM(ref)
- StopVM(ref)
- RebootVM(ref)
- GetVM(ref) -> ActualState
- ListVMs(scope) -> []ActualState (optional, for reconciliation)
6.2 ConsoleProvider
- GetConsole(ref) -> ConsoleSession(type, url/token, expires_at)
6.3 Catalog Providers (optional)
- ListImages(scope)
- ImportImage(source)
- DeleteImage(id)
ProviderRef is opaque: provider + external_id + location_id + extra (json)
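The minimum ComputeProvider contract can be expressed as an abstract base class with a trivial in-memory driver for testing. This is a sketch under assumptions: method names are the snake_case equivalents of the conceptual names above, `ActualState` is reduced to a plain string, and `InMemoryProvider` is a hypothetical test double, not one of the target drivers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ProviderRef:
    provider: str
    external_id: str
    location_id: str
    extra: Dict[str, str] = field(default_factory=dict)  # provider-specific JSON


class ComputeProvider(ABC):
    @abstractmethod
    def create_vm(self, spec: dict) -> ProviderRef: ...
    @abstractmethod
    def delete_vm(self, ref: ProviderRef) -> None: ...
    @abstractmethod
    def start_vm(self, ref: ProviderRef) -> None: ...
    @abstractmethod
    def stop_vm(self, ref: ProviderRef) -> None: ...
    @abstractmethod
    def reboot_vm(self, ref: ProviderRef) -> None: ...
    @abstractmethod
    def get_vm(self, ref: ProviderRef) -> Optional[str]: ...


class InMemoryProvider(ComputeProvider):
    """Test double that keeps VM state in a dictionary."""

    def __init__(self, name: str = "fake", location_id: str = "loc-1") -> None:
        self._name = name
        self._location_id = location_id
        self._vms: Dict[str, str] = {}
        self._next = 1

    def create_vm(self, spec: dict) -> ProviderRef:
        external_id = f"vm-{self._next}"
        self._next += 1
        self._vms[external_id] = "RUNNING"
        return ProviderRef(self._name, external_id, self._location_id)

    def delete_vm(self, ref: ProviderRef) -> None:
        self._vms.pop(ref.external_id, None)  # deleting twice is harmless

    def start_vm(self, ref: ProviderRef) -> None:
        self._set(ref, "RUNNING")

    def stop_vm(self, ref: ProviderRef) -> None:
        self._set(ref, "STOPPED")

    def reboot_vm(self, ref: ProviderRef) -> None:
        self._set(ref, "RUNNING")

    def get_vm(self, ref: ProviderRef) -> Optional[str]:
        return self._vms.get(ref.external_id)  # None means "not found"

    def _set(self, ref: ProviderRef, state: str) -> None:
        if ref.external_id not in self._vms:
            raise KeyError(ref.external_id)
        self._vms[ref.external_id] = state
```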
7. Job & Workflow
7.1 Job Types (V1)
- provision_vm
- start_vm
- stop_vm
- reboot_vm
- delete_vm
Attach NIC/volume jobs can be added if they fall within the platform's V1 scope.
7.2 State Machine
- PENDING → RUNNING → SUCCEEDED
- PENDING → RUNNING → FAILED
- PENDING → RUNNING → RETRYING → RUNNING ...
Retries apply only to transient errors:
- provider timeout
- temporary network error
- 5xx upstream
Idempotency:
- Creating a VM must be safe to re-execute.
- The handler must check provider_ref and the actual state before creating a new resource.
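The state machine, retry rule, and idempotency check above can be sketched together. The transition table mirrors the three paths listed; `TransientError`, the error codes, and the dictionary shapes for jobs and VMs are assumptions, and a real worker would also wait (with backoff) between attempts.

```python
from typing import Callable, Dict, Set

ALLOWED: Dict[str, Set[str]] = {
    "PENDING": {"RUNNING"},
    "RUNNING": {"SUCCEEDED", "FAILED", "RETRYING"},
    "RETRYING": {"RUNNING"},
    "SUCCEEDED": set(),
    "FAILED": set(),
}


class TransientError(Exception):
    """Provider timeout, temporary network error, or 5xx upstream."""


def transition(job: dict, new_state: str) -> None:
    if new_state not in ALLOWED[job["state"]]:
        raise ValueError(f"illegal transition {job['state']} -> {new_state}")
    job["state"] = new_state


def run_job(job: dict, handler: Callable[[dict], None]) -> dict:
    """Drive one job to a terminal state, retrying only transient errors."""
    transition(job, "RUNNING")
    while True:
        job["attempt"] += 1
        try:
            handler(job)
        except TransientError as exc:
            if job["attempt"] < job["max_attempt"]:
                transition(job, "RETRYING")
                transition(job, "RUNNING")
                continue
            job["error_code"], job["error_message"] = "RETRIES_EXHAUSTED", str(exc)
            transition(job, "FAILED")
        except Exception as exc:  # permanent error: fail without retrying
            job["error_code"], job["error_message"] = "INTERNAL", str(exc)
            transition(job, "FAILED")
        else:
            transition(job, "SUCCEEDED")
        return job


def provision_vm(vm: dict, provider) -> None:
    """Idempotent create: reuse the resource if an earlier attempt made it."""
    ref = vm.get("provider_ref")
    if ref is not None and provider.get_vm(ref) is not None:
        return
    vm["provider_ref"] = provider.create_vm(vm)
```

Note that this check alone does not cover a crash between `create_vm` returning and `provider_ref` being persisted; closing that gap would need something like a provider-side tag lookup or the ListVMs-based reconciliation.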
8. Reconciliation Loop
Reconciliation runs periodically to:
- Update VMs in PENDING/RUNNING based on the provider's actual state.
- Detect drift: a VM missing at the provider but still ACTIVE in the DB.
- Flag an incident/alert (post-V1, this can integrate with an incident service).
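One pass of the reconciliation described above could look like the following. The `reconcile` function, the dictionary shape of DB rows, and the use of a plain string as the provider's actual state are assumptions; the status values are the ones the text mentions.

```python
from typing import Dict, List, Tuple


def reconcile(db_vms: List[dict],
              actual_by_ref: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Compare DB rows with provider state.

    Returns (updates, drift): updates are (vm_id, new_status) pairs for
    in-flight VMs, drift is the list of ACTIVE VM ids missing at the provider.
    """
    updates: List[Tuple[str, str]] = []
    drift: List[str] = []
    for vm in db_vms:
        actual = actual_by_ref.get(vm["provider_ref"])
        if actual is None:
            # Missing at the provider: only ACTIVE rows count as drift here.
            if vm["status"] == "ACTIVE":
                drift.append(vm["id"])
            continue
        if vm["status"] in ("PENDING", "RUNNING") and actual != vm["status"]:
            updates.append((vm["id"], actual))
    return updates, drift
```

The function is pure on purpose: the caller applies `updates` to the datastore and turns `drift` into alerts, which keeps the comparison logic easy to test.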
9. Security Design
9.1 AuthN/AuthZ
- JWT bearer token with the minimum claims: org_id, project_bindings, roles, scopes
- Tenant scopes must not grant access to ops endpoints.
- Every request must be validated against the projectId path parameter.
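The three rules above can be combined into a single check that runs after the JWT signature has been verified. The claim names come from the list; the `tenant:`/`ops:` scope prefixes, the shape of `project_bindings` as a list of project IDs, and the `authorize` function are assumptions for this sketch.

```python
from typing import Optional


class Forbidden(Exception):
    pass


def authorize(claims: dict, namespace: str, required_scope: str,
              project_id: Optional[str] = None) -> None:
    """Raise Forbidden unless the verified claims allow the request."""
    scopes = set(claims.get("scopes", []))
    if required_scope not in scopes:
        raise Forbidden(f"missing scope {required_scope}")

    # Tenant scopes never open ops endpoints (assumed 'ops:' prefix).
    if namespace == "ops" and not required_scope.startswith("ops:"):
        raise Forbidden("ops endpoints require an ops scope")

    # Tenant requests must target a project the caller is bound to.
    if namespace == "tenant":
        if project_id is None or project_id not in claims.get("project_bindings", []):
            raise Forbidden(f"not bound to project {project_id}")
```

For example, a token with `project_bindings: ["p1"]` and scope `tenant:vm.write` passes for `/projects/p1/vms` and is rejected for `/projects/p2/vms`.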
9.2 Secrets & Credentials
- Provider credentials are stored encrypted.
- Workers/agents use scoped credentials (per cluster/pool) where possible.
- Console sessions use short-lived temporary tokens.
10. Observability
- Metrics: RPS, latency, error rate, job success/fail, provider latency, retry count
- Structured logs with trace_id, job_id, vm_id
- End-to-end tracing (OpenTelemetry ready)
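Structured logs carrying the correlation fields above can be produced with the standard `logging` module. This is one possible approach, not a prescribed one: the `JsonFormatter` class and the JSON key names other than trace_id, job_id, and vm_id are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with HCP correlation fields."""

    FIELDS = ("trace_id", "job_id", "vm_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for name in self.FIELDS:
            value = getattr(record, name, None)
            if value is not None:
                payload[name] = value
        return json.dumps(payload)


def build_logger(name: str = "hcp.worker") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A worker would then log with `logger.info("job started", extra={"trace_id": t, "job_id": j, "vm_id": v})`, and fields that are not supplied are simply omitted from the output.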
11. Deployment Notes (V1)
- HCP API: stateless, autoscale-ready
- HCP Worker: scales out according to job throughput
- DB: PostgreSQL
- Queue: NATS JetStream or RabbitMQ
- Provider adapter: internal module within the worker (V1) or sidecar/agent (enterprise mode)
12. Provider Compatibility (Target)
- Proxmox driver as the first implementation
- VMware vSphere driver (post-V1 or parallel development)
- Libvirt/KVM driver
- Hyper-V driver