hephaestus-hpc-api/README.md
Othman H. Suseno 43d842311f add readme file
2025-12-30 13:11:26 +07:00

Hypervisor Control Plane (HCP)

Provider-Agnostic Compute API Gateway

Hypervisor Control Plane (HCP) is a compute control-plane service that provides a provider-agnostic API for managing the Virtual Machine (VM) lifecycle across multiple hypervisors, including:

  • Proxmox VE
  • VMware vSphere / ESXi
  • KVM / QEMU (libvirt)
  • Microsoft Hyper-V

HCP acts as a bridge between the Central API Gateway / Management Console and the hypervisor infrastructure, using a desired-state model combined with asynchronous job orchestration.


🎯 Primary Goals

  • Provide a consistent compute API across hypervisors
  • Support multi-tenant & multi-project isolation
  • Provide an enterprise-grade control plane:
    • async jobs
    • idempotency
    • retry & reconciliation
    • auditability
  • Serve as a long-term foundation for expanding compute features without a rewrite

🧠 Architecture Concepts

HCP is not merely a reverse proxy in front of hypervisor APIs.

HCP is a stateful control plane that:

  • Stores the desired state of each VM
  • Manages the lifecycle through a job system
  • Abstracts hypervisor differences behind provider adapters
  • Produces audit & metering events

Control Plane vs Data Plane

  • Control Plane: HCP API + Job orchestration + State
  • Data Plane: Hypervisor & provider-specific execution (via adapter/agent)

🧩 Main Components

1. HCP API Service

  • Northbound REST API (tenant & ops)
  • AuthN/AuthZ enforcement (JWT + RBAC)
  • Request validation & idempotency
  • Persists VM & job state
  • Publishes jobs to the queue
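
The idempotency check listed above could be sketched as follows. This is a minimal illustration, not HCP's implementation: the README does not say how idempotency is enforced, so the client-supplied key, the `IdempotencyStore` class, the `create_vm` stand-in, and the in-memory dictionary are all assumptions. A real service would presumably persist the keys alongside VM and job state.

```python
import hashlib
import json
import uuid


class IdempotencyConflict(Exception):
    """The same key was reused with a different request body."""


class IdempotencyStore:
    """Hypothetical in-memory store; a real service would persist this."""

    def __init__(self):
        self._seen = {}  # key -> (body fingerprint, stored response)

    @staticmethod
    def _fingerprint(body: dict) -> str:
        # Canonical JSON so that key order does not change the fingerprint.
        canonical = json.dumps(body, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def execute(self, key: str, body: dict, handler):
        fingerprint = self._fingerprint(body)
        if key in self._seen:
            stored_fingerprint, response = self._seen[key]
            if stored_fingerprint != fingerprint:
                raise IdempotencyConflict(key)
            return response  # replay: no second VM or job is created
        response = handler(body)
        self._seen[key] = (fingerprint, response)
        return response


def create_vm(body: dict) -> dict:
    # Stand-in for "persist desired state and publish a job".
    return {"vm_id": str(uuid.uuid4()), "job_id": str(uuid.uuid4())}
```

Retrying a request with the same key and body returns the stored response, so a client that times out can resend safely; reusing the key with a different body is rejected instead of silently creating a second resource.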

2. HCP Worker

  • Consumes jobs from the queue
  • Runs the workflow state machine
  • Calls the provider adapter
  • Updates VM & job state
  • Emits audit & metering events

3. Provider Adapter Layer

  • Per-hypervisor driver implementations
  • Mapping:
    • generic VM spec → provider API
    • provider error → HCP error taxonomy
  • Never exposes provider details to the northbound API
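
One way to picture the adapter contract and the two mappings is the sketch below. Everything named here is hypothetical: the `VMSpec` fields, the error classes and their codes, and `FakeProxmoxAdapter`, which is a toy that does not call the real Proxmox API. The actual contract is defined by the project's design documents.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSpec:
    """Generic, provider-neutral VM description (hypothetical fields)."""
    name: str
    image: str
    flavor: str
    zone: str


class HCPError(Exception):
    """Base of a hypothetical HCP error taxonomy."""
    code = "INTERNAL"


class QuotaExceeded(HCPError):
    code = "QUOTA_EXCEEDED"


class ProviderUnavailable(HCPError):
    code = "PROVIDER_UNAVAILABLE"


class ProviderAdapter(ABC):
    """Contract that every hypervisor driver would implement."""

    @abstractmethod
    def create_vm(self, spec: VMSpec) -> str:
        """Create the VM and return a provider-side reference."""


class FakeProxmoxAdapter(ProviderAdapter):
    """Toy adapter used only to illustrate the mapping."""

    # Provider-specific failure text -> HCP error class (invented examples).
    _ERROR_MAP = {
        "storage full": QuotaExceeded,
        "node offline": ProviderUnavailable,
    }

    def __init__(self, fail_with=None):
        self._fail_with = fail_with
        self._next_vmid = 100

    def create_vm(self, spec: VMSpec) -> str:
        if self._fail_with is not None:
            # Translate the provider failure into the HCP taxonomy.
            error_cls = self._ERROR_MAP.get(self._fail_with, HCPError)
            raise error_cls(self._fail_with)
        vmid = self._next_vmid
        self._next_vmid += 1
        # The provider reference stays internal; it is never returned northbound.
        return f"proxmox:{vmid}"
```

The worker only ever sees `VMSpec` going in and `HCPError` subclasses coming out, which is what lets a new hypervisor be added without touching the API layer.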

4. Datastore & Queue

  • PostgreSQL: VM state, job state, catalog, placement
  • Queue/Stream: NATS JetStream / RabbitMQ
  • Audit/Event Store: append-only

🔌 Provider-Agnostic Design

Northbound API (Stable)

The HCP API never exposes:

  • node / host hypervisor
  • vmid / moid / UUID provider
  • datastore / resource pool

Tenants interact only with:

  • image
  • flavor
  • network attachment
  • placement (zone/location)
  • metadata/tags

Southbound Providers (Pluggable)

Each hypervisor is integrated through a provider adapter with a consistent contract.


🔁 Job & Workflow Model

Every operation that affects the infrastructure runs as an async job.

Examples:

  • create VM
  • start / stop / reboot
  • delete VM
  • request console

Standard Pattern

  1. The API request is received
  2. The desired state is stored (PENDING)
  3. A job is created & published
  4. A worker executes it via the provider
  5. The state is updated (ACTIVE / ERROR)
  6. Audit & events are emitted

The API returns:

202 Accepted
{
  "resource_id": "...",
  "job_id": "..."
}
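
The six steps above can be traced in a small in-memory sketch. It assumes a lot: the `ControlPlane` class, the job statuses `QUEUED`, `SUCCEEDED` and `FAILED`, and the use of Python's `queue.Queue` in place of NATS JetStream or RabbitMQ are all illustrative choices, not part of HCP. Only the VM states `PENDING`, `ACTIVE` and `ERROR` and the `202` response shape come from this README.

```python
import queue
import uuid
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """In-memory stand-in for the API service, queue and datastore."""
    vms: dict = field(default_factory=dict)
    jobs: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)
    job_queue: queue.Queue = field(default_factory=queue.Queue)

    def create_vm(self, spec: dict):
        """Steps 1-3: accept the request, store desired state, publish a job."""
        vm_id, job_id = str(uuid.uuid4()), str(uuid.uuid4())
        self.vms[vm_id] = {"spec": spec, "state": "PENDING"}
        self.jobs[job_id] = {"vm_id": vm_id, "status": "QUEUED"}
        self.job_queue.put(job_id)
        return 202, {"resource_id": vm_id, "job_id": job_id}

    def run_worker_once(self, provider_create):
        """Steps 4-6: execute via the provider, update state, emit audit."""
        job_id = self.job_queue.get_nowait()
        job = self.jobs[job_id]
        vm = self.vms[job["vm_id"]]
        try:
            provider_create(vm["spec"])
            vm["state"], job["status"] = "ACTIVE", "SUCCEEDED"
        except Exception as exc:
            vm["state"], job["status"] = "ERROR", "FAILED"
            job["error"] = str(exc)
        self.audit.append({"job_id": job_id, "result": job["status"]})
```

Because `create_vm` returns before the provider is called, a slow or failing hypervisor never blocks the API request; the failure surfaces later on the job and on the VM state.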

🌐 API Namespace

Calling HCP through the Central API Gateway is recommended, but HCP still enforces its own guardrails.

Tenant

/api/hcp/tenant/v1/...

Operations / Provider

/api/hcp/ops/v1/...

Common (read-only)

/api/hcp/common/v1/...

🧪 Key Operation Examples

Create VM

POST /api/hcp/tenant/v1/projects/{projectId}/vms

Response:

202 Accepted
{
  "vm_id": "...",
  "job_id": "..."
}

Get Job Status

GET /api/hcp/tenant/v1/jobs/{jobId}
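
Since operations return a `job_id` rather than a finished result, a client typically polls this endpoint. The helper below is a hedged sketch of that loop: the README does not document the job response body, so the `status` field and the terminal values `SUCCEEDED` and `FAILED` are assumptions, and `fetch_job` is a caller-supplied function that performs the actual HTTP call.

```python
import time


def wait_for_job(fetch_job, job_id, timeout_s=60.0, interval_s=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status or the timeout expires.

    fetch_job(job_id) must perform GET /api/hcp/tenant/v1/jobs/{jobId}
    and return the decoded JSON body as a dict. The "status" field and
    its values are assumptions made for this sketch.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") in ("SUCCEEDED", "FAILED"):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r}")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the helper testable without real waiting.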

Request Console

POST /api/hcp/tenant/v1/projects/{projectId}/vms/{vmId}:console

Response:

{
  "type": "vnc | spice | web | rdp",
  "url": "...",
  "expires_at": "..."
}
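
A client consuming this response might validate it and check expiry before opening the console. The sketch below assumes that `type` carries exactly one of the four listed values and that `expires_at` is an ISO 8601 timestamp with a UTC offset; the README shows only `"..."` for that field, so the format is a guess.

```python
from datetime import datetime, timezone

ALLOWED_TYPES = {"vnc", "spice", "web", "rdp"}


def parse_console_session(payload: dict) -> dict:
    """Validate a console response and parse its expiry (assumed ISO 8601)."""
    if payload["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unknown console type: {payload['type']!r}")
    # Accept a trailing "Z" on Python versions older than 3.11.
    raw = payload["expires_at"].replace("Z", "+00:00")
    expires_at = datetime.fromisoformat(raw)
    if expires_at.tzinfo is None:
        raise ValueError("expires_at must carry a UTC offset")
    return {"type": payload["type"], "url": payload["url"],
            "expires_at": expires_at}


def is_expired(session: dict, now=None) -> bool:
    """Return True once the short-lived session can no longer be used."""
    now = now or datetime.now(timezone.utc)
    return now >= session["expires_at"]
```

An expired session should lead to a fresh console request rather than a retry of the old URL.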

🧠 Capability Negotiation

Each provider/cluster has capability flags, for example:

  • supports_cloud_init
  • supports_snapshot
  • supports_live_migration
  • supports_console_vnc
  • supports_gpu_passthrough

If a feature is not available, the API returns:

409 FEATURE_NOT_SUPPORTED
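
A capability check in front of each operation could look like the sketch below. The flag names are taken from the list above, while the operation names, the `REQUIRED_CAPABILITY` mapping, and the error body shape are assumptions made for illustration.

```python
class FeatureNotSupported(Exception):
    """Raised when the target provider/cluster lacks a required flag."""
    http_status = 409
    code = "FEATURE_NOT_SUPPORTED"

    def __init__(self, capability):
        super().__init__(f"provider does not support {capability}")
        self.capability = capability


# Hypothetical mapping from operations to the flag each one needs.
REQUIRED_CAPABILITY = {
    "snapshot": "supports_snapshot",
    "live_migrate": "supports_live_migration",
    "console_vnc": "supports_console_vnc",
}


def require_capability(cluster_caps: set, operation: str) -> None:
    """Reject the operation early if the cluster lacks the needed flag."""
    needed = REQUIRED_CAPABILITY.get(operation)
    if needed is not None and needed not in cluster_caps:
        raise FeatureNotSupported(needed)


def to_http_error(exc: FeatureNotSupported):
    """Translate the exception into a status code and JSON-ready body."""
    return exc.http_status, {"error": exc.code, "capability": exc.capability}
```

Checking before a job is created means an unsupported request fails immediately with `409` instead of surfacing later as a failed job.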

🔐 Security Model

  • JWT-based authentication
  • RBAC enforcement at the API level
  • Strict tenant vs ops boundary
  • Encrypted provider credentials
  • Short-lived console sessions
  • Every action is recorded in the audit log

📊 Observability

HCP provides:

  • Metrics: request rate, latency, job success/failure
  • Structured logs: trace_id, job_id, vm_id
  • Distributed tracing (OpenTelemetry-ready)
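
As an illustration of the structured-log item, the sketch below emits one JSON object per log line carrying `trace_id`, `job_id` and `vm_id` when they are present. It uses only Python's standard `logging` module; the logger name `hcp.worker` and the output field layout are assumptions, not HCP's actual log schema.

```python
import json
import logging

CONTEXT_FIELDS = ("trace_id", "job_id", "vm_id")


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "message": record.getMessage()}
        for name in CONTEXT_FIELDS:
            # Values passed via `extra=` become attributes on the record.
            value = getattr(record, name, None)
            if value is not None:
                entry[name] = value
        return json.dumps(entry)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("hcp.worker")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `logger.info("job finished", extra={"job_id": job_id, "vm_id": vm_id})` then produces a line that can be filtered by any of the three identifiers.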

🗂️ Repository Structure (Suggested)

hcp/
├── cmd/
│   ├── api/
│   └── worker/
├── internal/
│   ├── api/
│   ├── auth/
│   ├── jobs/
│   ├── providers/
│   │   ├── proxmox/
│   │   ├── vsphere/
│   │   ├── libvirt/
│   │   └── hyperv/
│   ├── reconcile/
│   └── audit/
├── pkg/
│   └── models/
├── docs/
│   ├── HCP_SRS_v1.md
│   └── HCP_SDS_v1.md
└── README.md

📄 Documentation

  • SRS: docs/HCP_SRS_v1.md
  • SDS: docs/HCP_SDS_v1.md

These documents are the authoritative reference for HCP design and implementation.


🚀 Brief Roadmap

V1

  • Proxmox provider
  • VM lifecycle
  • Async job & audit
  • Console abstraction

Post-V1

  • vSphere provider
  • Libvirt/KVM provider
  • Hyper-V provider
  • Snapshot & migration
  • Policy-based placement

🧭 Philosophy

HCP is not built to support one hypervisor.
HCP is built so that hypervisors can come and go.