Othman H. Suseno 43d842311f add readme file
2025-12-30 13:11:26 +07:00


# Hypervisor Control Plane (HCP)
## Provider-Agnostic Compute API Gateway
Hypervisor Control Plane (HCP) is a **compute control-plane service** that provides a **provider-agnostic** API for managing the lifecycle of Virtual Machines (VMs) across multiple hypervisors, including:
- Proxmox VE
- VMware vSphere / ESXi
- KVM / QEMU (libvirt)
- Microsoft Hyper-V
HCP acts as the **bridge between the Central API Gateway / Management Console** and the **hypervisor infrastructure**, using a **desired state + asynchronous job orchestration** approach.
---
## 🎯 Main Goals
- Provide a **consistent compute API** across hypervisors
- Support **multi-tenant & multi-project isolation**
- Provide an **enterprise-grade control plane**:
  - async jobs
  - idempotency
  - retry & reconciliation
  - auditability
- Serve as a **long-term foundation** for expanding compute features without a rewrite
---
## 🧠 Architecture Concepts
HCP is **not just a reverse proxy in front of hypervisor APIs**.
HCP is a **stateful control plane** that:
- Stores the VM *desired state*
- Manages the VM lifecycle through a job system
- Abstracts hypervisor differences through provider adapters
- Produces audit & metering events
### Control Plane vs Data Plane
- **Control Plane**: HCP API + Job orchestration + State
- **Data Plane**: Hypervisor & provider-specific execution (via adapter/agent)
---
## 🧩 Main Components
### 1. HCP API Service
- Northbound REST API (tenant & ops)
- AuthN/AuthZ enforcement (JWT + RBAC)
- Request validation & idempotency
- Persists VM & job state
- Publishes jobs to the queue
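Idempotency means a retried request (for example after a client timeout) returns the original job instead of creating a second VM. The sketch below is an assumption: it uses Go (the README names no language, but the suggested layout is a common Go convention), an in-memory map instead of PostgreSQL, and a client-supplied idempotency key; `IdempotencyStore`, `Reserve`, and `AcceptedResult` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// AcceptedResult mirrors the 202 response body (hypothetical type).
type AcceptedResult struct {
	ResourceID string
	JobID      string
}

// IdempotencyStore remembers the first result for each idempotency key.
// In-memory sketch only; HCP itself persists state in PostgreSQL.
type IdempotencyStore struct {
	mu      sync.Mutex
	results map[string]AcceptedResult
}

func NewIdempotencyStore() *IdempotencyStore {
	return &IdempotencyStore{results: make(map[string]AcceptedResult)}
}

// Reserve returns the stored result if the key was seen before (replayed == true).
// Otherwise it calls create exactly once, stores the result, and returns it.
// The lock is held during create so two concurrent retries cannot both create.
func (s *IdempotencyStore) Reserve(key string, create func() AcceptedResult) (res AcceptedResult, replayed bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if existing, ok := s.results[key]; ok {
		return existing, true
	}
	res = create()
	s.results[key] = res
	return res, false
}

func main() {
	store := NewIdempotencyStore()
	n := 0
	create := func() AcceptedResult {
		n++
		return AcceptedResult{ResourceID: fmt.Sprintf("vm-%d", n), JobID: fmt.Sprintf("job-%d", n)}
	}
	first, _ := store.Reserve("key-abc", create)
	second, replayed := store.Reserve("key-abc", create)
	fmt.Println(first == second, replayed) // true true
}
```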
### 2. HCP Worker
- Consumes jobs from the queue
- Runs the workflow state machine
- Calls the provider adapter
- Updates VM & job state
- Emits audit & metering events
### 3. Provider Adapter Layer
- One driver implementation per hypervisor
- Mapping:
  - generic VM spec → provider API
  - provider errors → HCP error taxonomy
- Never exposes provider details to the northbound API
### 4. Datastore & Queue
- **PostgreSQL**: VM state, job state, catalog, placement
- **Queue/Stream**: NATS JetStream / RabbitMQ
- **Audit/Event Store**: append-only
---
## 🔌 Provider-Agnostic Design
### Northbound API (Stable)
The HCP API **never** exposes:
- hypervisor nodes / hosts
- provider IDs (vmid / moid / UUID)
- datastores / resource pools
Tenants interact only with:
- image
- flavor
- network attachment
- placement (zone/location)
- metadata/tags
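One way to keep the northbound surface stable is to make the request type contain only tenant-facing concepts and reject anything else at decode time. The Go sketch below assumes JSON bodies; `CreateVMRequest`, its field names, and `decodeCreateVM` are hypothetical. `json.Decoder.DisallowUnknownFields` makes a provider hint such as `"node"` a client error instead of a silently ignored field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// CreateVMRequest is a hypothetical northbound request body. It has no
// node, vmid, or datastore field on purpose.
type CreateVMRequest struct {
	Name      string            `json:"name"`
	Image     string            `json:"image"`
	Flavor    string            `json:"flavor"`
	Networks  []string          `json:"networks"`
	Placement string            `json:"placement"` // zone/location
	Metadata  map[string]string `json:"metadata,omitempty"`
}

// decodeCreateVM rejects any field not declared in CreateVMRequest.
func decodeCreateVM(body string) (CreateVMRequest, error) {
	var req CreateVMRequest
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	err := dec.Decode(&req)
	return req, err
}

func main() {
	ok := `{"name":"web-1","image":"ubuntu","flavor":"small","networks":["net-a"],"placement":"zone-1"}`
	req, err := decodeCreateVM(ok)
	fmt.Println(req.Flavor, err) // small <nil>

	_, err = decodeCreateVM(`{"name":"web-2","node":"pve-node-03"}`)
	fmt.Println(err != nil) // true
}
```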
### Southbound Providers (Pluggable)
Each hypervisor is integrated through a **provider adapter** that implements the same contract.
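Pluggability usually comes down to a registry from provider name to adapter constructor, so adding vSphere or Hyper-V means registering a new adapter rather than changing the API. This Go sketch is an assumption; `registry`, `Register`, `Get`, and the trimmed `Adapter` interface are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Adapter is a trimmed, hypothetical provider contract.
type Adapter interface {
	Name() string
}

type proxmoxAdapter struct{}

func (proxmoxAdapter) Name() string { return "proxmox" }

// registry maps a provider name to a constructor.
type registry struct {
	factories map[string]func() Adapter
}

func newRegistry() *registry { return &registry{factories: map[string]func() Adapter{}} }

// Register adds or replaces the constructor for a provider name.
func (r *registry) Register(name string, f func() Adapter) {
	r.factories[name] = f
}

// Get builds the adapter for name, or returns an error if none is registered.
func (r *registry) Get(name string) (Adapter, error) {
	f, ok := r.factories[name]
	if !ok {
		return nil, fmt.Errorf("provider %q not registered", name)
	}
	return f(), nil
}

// Names lists registered providers in sorted order.
func (r *registry) Names() []string {
	names := make([]string, 0, len(r.factories))
	for n := range r.factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	reg := newRegistry()
	reg.Register("proxmox", func() Adapter { return proxmoxAdapter{} })
	a, err := reg.Get("proxmox")
	fmt.Println(a.Name(), err) // proxmox <nil>
	_, err = reg.Get("hyperv")
	fmt.Println(err) // provider "hyperv" not registered
	fmt.Println(reg.Names()) // [proxmox]
}
```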
---
## 🔁 Job & Workflow Model
Every operation that touches infrastructure runs as an **async job**.
Examples:
- create VM
- start / stop / reboot
- delete VM
- request console
### Standard Pattern
1. The API request is received
2. The desired state is persisted (`PENDING`)
3. A job is created & published
4. A worker executes it via the provider
5. The state is updated (`ACTIVE` / `ERROR`)
6. Audit & events are emitted
The API returns:
```
202 Accepted
{
  "resource_id": "...",
  "job_id": "..."
}
```
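The `PENDING` → `ACTIVE` / `ERROR` progression in the standard pattern can be modeled as a small transition table. This Go sketch is illustrative only: the type and function names are hypothetical, and the `ERROR` → `PENDING` retry edge is an assumption inspired by the "retry & reconciliation" goal, not something the pattern above specifies.

```go
package main

import "fmt"

// VMState values from the standard pattern.
type VMState string

const (
	StatePending VMState = "PENDING"
	StateActive  VMState = "ACTIVE"
	StateError   VMState = "ERROR"
)

// allowed lists legal transitions. PENDING -> ACTIVE/ERROR comes from the
// README; ERROR -> PENDING (retry) is a hypothetical addition.
var allowed = map[VMState][]VMState{
	StatePending: {StateActive, StateError},
	StateError:   {StatePending},
}

// transition returns the new state, or the old state plus an error if the
// move is not in the table.
func transition(from, to VMState) (VMState, error) {
	for _, s := range allowed[from] {
		if s == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal transition %s -> %s", from, to)
}

func main() {
	s, err := transition(StatePending, StateActive)
	fmt.Println(s, err) // ACTIVE <nil>
	s, err = transition(StateActive, StatePending)
	fmt.Println(s, err) // ACTIVE illegal transition ACTIVE -> PENDING
}
```

Keeping transitions in data rather than scattered `if` statements makes it easy for both the worker and a reconciler to reject stale or duplicate job results.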
---
## 🌐 API Namespace
Calls should go through the Central API Gateway, but HCP still enforces its own guardrails.
### Tenant
```
/api/hcp/tenant/v1/...
```
### Operations / Provider
```
/api/hcp/ops/v1/...
```
### Common (read-only)
```
/api/hcp/common/v1/...
```
---
## 🧪 Key Operation Examples
### Create VM
```
POST /api/hcp/tenant/v1/projects/{projectId}/vms
```
Response:
```
202 Accepted
{
  "vm_id": "...",
  "job_id": "..."
}
```
### Get Job Status
```
GET /api/hcp/tenant/v1/jobs/{jobId}
```
### Request Console
```
POST /api/hcp/tenant/v1/projects/{projectId}/vms/{vmId}:console
```
Response:
```
{
  "type": "vnc | spice | web | rdp",
  "url": "...",
  "expires_at": "..."
}
```
---
## 🧠 Capability Negotiation
Each provider/cluster advertises capability flags, for example:
- supports_cloud_init
- supports_snapshot
- supports_live_migration
- supports_console_vnc
- supports_gpu_passthrough
If a feature is not available, the API returns:
```
409 FEATURE_NOT_SUPPORTED
```
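The capability check can run before any job is created, so an unsupported request fails fast with `409 FEATURE_NOT_SUPPORTED`. In this Go sketch the flag names come from the list above, while `Capabilities`, `FeatureError`, and `require` are hypothetical; a flag that is missing from the map is treated the same as `false`.

```go
package main

import (
	"fmt"
	"net/http"
)

// Capabilities holds per-provider/cluster feature flags.
type Capabilities map[string]bool

// FeatureError is a hypothetical error for unsupported features.
type FeatureError struct {
	Status int
	Code   string
	Flag   string
}

func (e *FeatureError) Error() string {
	return fmt.Sprintf("%d %s (%s)", e.Status, e.Code, e.Flag)
}

// require returns nil if the flag is set, otherwise a 409 FeatureError.
func require(caps Capabilities, flag string) error {
	if !caps[flag] {
		return &FeatureError{Status: http.StatusConflict, Code: "FEATURE_NOT_SUPPORTED", Flag: flag}
	}
	return nil
}

func main() {
	cluster := Capabilities{"supports_cloud_init": true, "supports_snapshot": false}
	fmt.Println(require(cluster, "supports_cloud_init")) // <nil>
	fmt.Println(require(cluster, "supports_snapshot"))   // 409 FEATURE_NOT_SUPPORTED (supports_snapshot)
}
```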
---
## 🔐 Security Model
- JWT-based authentication
- RBAC enforcement at the API level
- Strict tenant vs ops boundary
- Encrypted provider credentials
- **Short-lived** console sessions
- Every action is recorded in the audit log
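Encrypting provider credentials at rest can be done with an AEAD cipher. The README does not name one, so AES-256-GCM from the Go standard library is an assumption here, as are the `sealCredential` / `openCredential` names. Where the 32-byte key lives (for example a KMS or secret manager) is out of scope for this sketch; the random nonce is stored in front of the ciphertext.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// sealCredential encrypts plaintext with AES-GCM and returns nonce||ciphertext.
func sealCredential(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openCredential reverses sealCredential and fails if the data was tampered with.
func openCredential(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed credential too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // AES-256
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := sealCredential(key, []byte("example-api-token"))
	if err != nil {
		panic(err)
	}
	plain, err := openCredential(key, sealed)
	fmt.Println(string(plain), err) // example-api-token <nil>
}
```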
---
## 📊 Observability
HCP provides:
- Metrics: request rate, latency, job success/failure
- Structured logs: trace_id, job_id, vm_id
- Distributed tracing (OpenTelemetry-ready)
---
## 🗂️ Repository Structure (Suggested)
```
hcp/
├── cmd/
│   ├── api/
│   └── worker/
├── internal/
│   ├── api/
│   ├── auth/
│   ├── jobs/
│   ├── providers/
│   │   ├── proxmox/
│   │   ├── vsphere/
│   │   ├── libvirt/
│   │   └── hyperv/
│   ├── reconcile/
│   └── audit/
├── pkg/
│   └── models/
├── docs/
│   ├── HCP_SRS_v1.md
│   └── HCP_SDS_v1.md
└── README.md
```
---
## 📄 Documentation
- **SRS**: `docs/HCP_SRS_v1.md`
- **SDS**: `docs/HCP_SDS_v1.md`
These documents are the **authoritative reference** for the design and implementation of HCP.
---
## 🚀 Roadmap Overview
**V1**
- Proxmox provider
- VM lifecycle
- Async job & audit
- Console abstraction
**Post-V1**
- vSphere provider
- Libvirt/KVM provider
- Hyper-V provider
- Snapshot & migration
- Policy-based placement
---
## 🧭 Philosophy
> HCP is not built to support one hypervisor.
> HCP is built so that hypervisors can come and go.