scrub operation + ZFS Pool CRUD

2025-12-15 01:19:44 +07:00
parent 9779b30a65
commit abd8cef10a
9 changed files with 1124 additions and 63 deletions

docs/HTTPS_TLS.md
# HTTPS/TLS Support
## Overview
AtlasOS supports HTTPS/TLS encryption for secure communication. TLS can be enabled via environment variables, and the system will automatically enforce HTTPS connections when TLS is enabled.
## Configuration
### Environment Variables
TLS is configured via environment variables:
- **`ATLAS_TLS_CERT`**: Path to TLS certificate file (PEM format)
- **`ATLAS_TLS_KEY`**: Path to TLS private key file (PEM format)
- **`ATLAS_TLS_ENABLED`**: Set to `true` to require TLS (optional; TLS is enabled automatically when both the certificate and key paths are set)
### Automatic Detection
TLS is automatically enabled if both `ATLAS_TLS_CERT` and `ATLAS_TLS_KEY` are set:
```bash
export ATLAS_TLS_CERT=/etc/atlas/tls/cert.pem
export ATLAS_TLS_KEY=/etc/atlas/tls/key.pem
./atlas-api
```
### Explicit Enable
Set `ATLAS_TLS_ENABLED=true` to make TLS mandatory. Startup then fails if the certificate or key path is missing, rather than falling back to plain HTTP:
```bash
export ATLAS_TLS_ENABLED=true
export ATLAS_TLS_CERT=/etc/atlas/tls/cert.pem
export ATLAS_TLS_KEY=/etc/atlas/tls/key.pem
./atlas-api
```
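The detection rules above can be summarized in a short Go sketch. It assumes the server is written in Go (as the reference to Go's TLS library below suggests); `tlsSettings` and `resolveTLS` are hypothetical names, not the actual AtlasOS code, and treating a lone certificate or key path as "TLS off" is an assumption the documentation does not specify.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// tlsSettings is a hypothetical holder for the resolved TLS configuration.
type tlsSettings struct {
	Enabled  bool
	CertFile string
	KeyFile  string
}

// resolveTLS applies the documented rules: TLS is enabled automatically when
// both ATLAS_TLS_CERT and ATLAS_TLS_KEY are set, and ATLAS_TLS_ENABLED=true
// turns a missing path into a startup error. getenv is injected (normally
// os.Getenv) so the rules can be exercised without touching the environment.
func resolveTLS(getenv func(string) string) (tlsSettings, error) {
	s := tlsSettings{
		CertFile: getenv("ATLAS_TLS_CERT"),
		KeyFile:  getenv("ATLAS_TLS_KEY"),
	}
	forced := strings.EqualFold(getenv("ATLAS_TLS_ENABLED"), "true")
	haveBoth := s.CertFile != "" && s.KeyFile != ""
	if forced && !haveBoth {
		return s, errors.New("ATLAS_TLS_ENABLED=true but ATLAS_TLS_CERT or ATLAS_TLS_KEY is not set")
	}
	// Assumption: a lone cert or key path without ATLAS_TLS_ENABLED leaves TLS off.
	s.Enabled = haveBoth
	return s, nil
}

func main() {
	s, err := resolveTLS(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "TLS configuration error:", err)
		os.Exit(1)
	}
	fmt.Println("TLS enabled:", s.Enabled)
}
```

Injecting `getenv` instead of calling `os.Getenv` directly keeps the decision logic testable with a plain map.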
## Certificate Requirements
### Certificate Format
- **Format**: PEM (Privacy-Enhanced Mail)
- **Certificate**: X.509 certificate
- **Key**: RSA or ECDSA private key
- **Chain**: Intermediate certificates can be appended to the certificate file after the leaf certificate
### Certificate Validation
At startup, the system validates:
- Certificate file exists
- Key file exists
- Certificate and key are valid and match
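- Certificate expiry is not checked at load time: Go's `tls.LoadX509KeyPair` ignores validity dates, so an expired certificate is only rejected by clients during the TLS handshake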
## TLS Configuration
### Supported TLS Versions
- **Minimum**: TLS 1.2
- **Maximum**: TLS 1.3
### Cipher Suites
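For TLS 1.2, the system enables only these cipher suites (Go does not allow TLS 1.3 cipher suites to be configured; all of them are AEAD suites):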
- `TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384`
- `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`
- `TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305`
- `TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305`
- `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`
- `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`
### Elliptic Curves
Preferred curves:
- `CurveP256`
- `CurveP384`
- `CurveP521`
- `X25519`
## HTTPS Enforcement
### Automatic Redirect
When TLS is enabled, HTTP requests are automatically redirected to HTTPS:
```
HTTP Request → 301 Moved Permanently → HTTPS
```
### Exceptions
HTTPS enforcement is skipped for:
- **Health checks**: `/healthz`, `/health` (allows monitoring)
- **Localhost**: Requests from `localhost`, `127.0.0.1`, `::1` (development)
### Reverse Proxy Support
The system respects `X-Forwarded-Proto` header for reverse proxy setups:
```
X-Forwarded-Proto: https
```
## Usage Examples
### Development (HTTP)
```bash
# No TLS configuration - runs on HTTP
./atlas-api
```
### Production (HTTPS)
```bash
# Enable TLS
export ATLAS_TLS_CERT=/etc/ssl/certs/atlas.crt
export ATLAS_TLS_KEY=/etc/ssl/private/atlas.key
export ATLAS_HTTP_ADDR=:8443
./atlas-api
```
### Using Let's Encrypt
```bash
# Let's Encrypt certificates
export ATLAS_TLS_CERT=/etc/letsencrypt/live/atlas.example.com/fullchain.pem
export ATLAS_TLS_KEY=/etc/letsencrypt/live/atlas.example.com/privkey.pem
./atlas-api
```
### Self-Signed Certificate (Testing)
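Generate a self-signed certificate for `localhost`. Modern browsers and Go clients require a subjectAltName, and `-addext` needs OpenSSL 1.1.1 or later:
```bash
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes \
  -subj "/CN=localhost" -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"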
```
Use it:
```bash
export ATLAS_TLS_CERT=./cert.pem
export ATLAS_TLS_KEY=./key.pem
./atlas-api
```
## Security Headers
When TLS is enabled, additional security headers are set:
### HSTS (HTTP Strict Transport Security)
```
Strict-Transport-Security: max-age=31536000; includeSubDomains
```
- **Max Age**: 1 year (31536000 seconds)
- **Include Subdomains**: Yes
- **Purpose**: Forces browsers to use HTTPS
### Content Security Policy
CSP is configured to work with HTTPS:
```
Content-Security-Policy: default-src 'self'; ...
```
## Reverse Proxy Setup
### Nginx
```nginx
server {
    listen 443 ssl;
    server_name atlas.example.com;
    ssl_certificate     /etc/ssl/certs/atlas.crt;
    ssl_certificate_key /etc/ssl/private/atlas.key;
    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
### Apache
```apache
<VirtualHost *:443>
    ServerName atlas.example.com
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/atlas.crt
    SSLCertificateKeyFile /etc/ssl/private/atlas.key
    ProxyPass / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/
    RequestHeader set X-Forwarded-Proto "https"
</VirtualHost>
```
## Troubleshooting
### Certificate Not Found
```
TLS configuration error: TLS certificate file not found: /path/to/cert.pem
```
**Solution**: Verify certificate file path and permissions.
### Certificate/Key Mismatch
```
TLS configuration error: load TLS certificate: tls: private key does not match public key
```
**Solution**: Ensure certificate and key files match.
### Certificate Expired
```
TLS handshake error: x509: certificate has expired or is not yet valid
```
**Solution**: Renew certificate or use a valid certificate.
### Port Already in Use
```
listen tcp :8443: bind: address already in use
```
**Solution**: Change port or stop conflicting service.
## Best Practices
### 1. Use Valid Certificates
- **Production**: Use certificates from trusted CAs (Let's Encrypt, commercial CAs)
- **Development**: Self-signed certificates are acceptable
- **Testing**: Use test certificates with short expiration
### 2. Certificate Renewal
- **Monitor Expiration**: Set up alerts for certificate expiration
- **Auto-Renewal**: Use tools like `certbot` for Let's Encrypt
- **Restart After Renewal**: Restart the service after renewing the certificate so the new certificate is loaded
### 3. Key Security
- **Permissions**: Restrict key file permissions (`chmod 600`)
- **Ownership**: Use dedicated user for key file
- **Storage**: Store keys securely, never commit to version control
### 4. TLS Configuration
- **Minimum Version**: TLS 1.2 or higher
- **Cipher Suites**: Use strong cipher suites only
- **HSTS**: Enable HSTS for production
### 5. Reverse Proxy
- **Terminate TLS**: Terminate TLS at reverse proxy for better performance
- **Forward Headers**: Forward `X-Forwarded-Proto` header
- **Health Checks**: Allow HTTP for health checks
## Compliance
### SRS Requirement
Per SRS section 5.3 Security:
- **HTTPS SHALL be enforced for the web UI** ✅
This implementation:
- ✅ Supports TLS/HTTPS
- ✅ Enforces HTTPS when TLS is enabled
- ✅ Provides secure cipher suites
- ✅ Includes HSTS headers
- ✅ Validates certificates
## Future Enhancements
1. **Certificate Auto-Renewal**: Automatic certificate renewal
2. **OCSP Stapling**: Online Certificate Status Protocol stapling
3. **Certificate Rotation**: Seamless certificate rotation
4. **Future TLS Versions**: Support newer TLS versions as they are standardized (TLS 1.3 is currently the latest)
5. **Client Certificate Authentication**: Mutual TLS (mTLS)
6. **Certificate Monitoring**: Certificate expiration monitoring

docs/ZFS_OPERATIONS.md
# ZFS Operations
## Overview
AtlasOS provides comprehensive ZFS pool management including pool creation, import, export, scrubbing with progress monitoring, and health status reporting.
## Pool Operations
### List Pools
**GET** `/api/v1/pools`
Returns all ZFS pools.
**Response:**
```json
[
  {
    "name": "tank",
    "status": "ONLINE",
    "size": 1099511627776,
    "allocated": 536870912000,
    "free": 562640715776,
    "health": "ONLINE",
    "created_at": "2024-01-15T10:30:00Z"
  }
]
```
### Get Pool
**GET** `/api/v1/pools/{name}`
Returns details for a specific pool.
### Create Pool
**POST** `/api/v1/pools`
Creates a new ZFS pool.
**Request Body:**
```json
{
  "name": "tank",
  "vdevs": ["sda", "sdb"],
  "options": {
    "ashift": "12"
  }
}
```
### Destroy Pool
**DELETE** `/api/v1/pools/{name}`
Destroys a ZFS pool. **Warning**: This is a destructive operation.
## Pool Import/Export
### List Available Pools
**GET** `/api/v1/pools/available`
Lists pools that can be imported (pools that exist but are not currently imported).
**Response:**
```json
{
  "pools": ["tank", "backup"]
}
```
### Import Pool
**POST** `/api/v1/pools/import`
Imports a ZFS pool.
**Request Body:**
```json
{
  "name": "tank",
  "options": {
    "readonly": "on"
  }
}
```
**Options:**
- `readonly`: Set pool to read-only mode (`on`/`off`)
- Other ZFS pool properties
**Response:**
```json
{
  "message": "pool imported",
  "name": "tank"
}
```
### Export Pool
**POST** `/api/v1/pools/{name}/export`
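Exports a ZFS pool: its datasets are unmounted and the pool is released from this system so it can be imported again later, here or on another system. Data is preserved.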
**Request Body (optional):**
```json
{
  "force": false
}
```
**Parameters:**
- `force` (boolean): Force the export even if the pool is in use (busy datasets are forcibly unmounted)
**Response:**
```json
{
  "message": "pool exported",
  "name": "tank"
}
```
## Scrub Operations
### Start Scrub
**POST** `/api/v1/pools/{name}/scrub`
Starts a scrub operation on a pool. A scrub reads all data in the pool, verifies checksums, and repairs damaged blocks where redundancy (mirror, RAID-Z, or `copies` > 1) allows.
**Response:**
```json
{
  "message": "scrub started",
  "pool": "tank"
}
```
### Get Scrub Status
**GET** `/api/v1/pools/{name}/scrub`
Returns detailed scrub status with progress information.
**Response:**
```json
{
  "status": "in_progress",
  "progress": 45.2,
  "time_elapsed": "2h15m",
  "time_remain": "30m",
  "speed": "100M/s",
  "errors": 0,
  "repaired": 0,
  "last_scrub": "2024-12-15T10:30:00Z"
}
```
**Status Values:**
- `idle`: No scrub in progress
- `in_progress`: Scrub is currently running
- `completed`: Scrub completed successfully
- `error`: Scrub encountered errors
**Progress Fields:**
- `progress`: Percentage complete (0-100)
- `time_elapsed`: Time since scrub started
- `time_remain`: Estimated time remaining
- `speed`: Current scrub speed
- `errors`: Number of errors found
- `repaired`: Number of errors repaired
- `last_scrub`: Timestamp of last completed scrub
## Usage Examples
### Import a Pool
```bash
curl -X POST http://localhost:8080/api/v1/pools/import \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "tank"
  }'
```
### Start Scrub and Monitor Progress
```bash
# Start scrub
curl -X POST http://localhost:8080/api/v1/pools/tank/scrub \
  -H "Authorization: Bearer $TOKEN"
# Check progress
curl http://localhost:8080/api/v1/pools/tank/scrub \
  -H "Authorization: Bearer $TOKEN"
```
### Export Pool
```bash
curl -X POST http://localhost:8080/api/v1/pools/tank/export \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "force": false
  }'
```
## Scrub Best Practices
### When to Scrub
- **Regular Schedule**: Monthly or quarterly
- **After Disk Failures**: After replacing failed disks
- **Before Major Operations**: Before pool upgrades or migrations
- **After Data Corruption**: If data integrity issues are suspected
### Monitoring Scrub Progress
1. **Start Scrub**: Use POST endpoint to start
2. **Monitor Progress**: Poll GET endpoint every few minutes
3. **Check Errors**: Monitor `errors` and `repaired` fields
4. **Wait for Completion**: Wait until `status` is `completed` (or `error`, in which case inspect `errors`)
### Scrub Performance
- **Impact**: Scrub operations can impact pool performance
- **Scheduling**: Schedule during low-usage periods
- **Duration**: Large pools may take hours or days
- **I/O**: Scrub generates significant I/O load
## Pool Import/Export Use Cases
### Import Use Cases
1. **System Reboot**: Pools are automatically imported on boot
2. **Manual Import**: Import pools that were exported
3. **Read-Only Import**: Import pool in read-only mode for inspection
4. **Recovery**: Import pools from backup systems
### Export Use Cases
1. **System Shutdown**: Export pools before shutdown
2. **Maintenance**: Export pools for maintenance operations
3. **Migration**: Export pools before moving to another system
4. **Backup**: Export pools before creating full backups
## Error Handling
### Pool Not Found
```json
{
  "code": "NOT_FOUND",
  "message": "pool not found"
}
```
### Scrub Already Running
```json
{
  "code": "CONFLICT",
  "message": "scrub already in progress"
}
```
### Pool in Use (Export)
```json
{
  "code": "CONFLICT",
  "message": "pool is in use, cannot export"
}
```
Use `force: true` to force export (use with caution).
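Clients can branch on the `code` field rather than matching message text. A minimal Go sketch, with hypothetical names `APIError`, `parseAPIError` and `isConflict`; the JSON shape is taken from the error examples above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// APIError mirrors the {"code": ..., "message": ...} error body.
type APIError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

func (e *APIError) Error() string { return e.Code + ": " + e.Message }

// parseAPIError decodes an error response body into an *APIError.
func parseAPIError(body []byte) error {
	var e APIError
	if err := json.Unmarshal(body, &e); err != nil {
		return fmt.Errorf("undecodable error body: %w", err)
	}
	return &e
}

// isConflict reports whether err is a CONFLICT API error (running scrub, busy pool).
func isConflict(err error) bool {
	var apiErr *APIError
	return errors.As(err, &apiErr) && apiErr.Code == "CONFLICT"
}

func main() {
	err := parseAPIError([]byte(`{"code": "CONFLICT", "message": "pool is in use, cannot export"}`))
	if isConflict(err) {
		fmt.Println("export refused:", err)
	}
}
```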
## Compliance with SRS
Per SRS section 4.2 ZFS Management:
- **List available disks**: Implemented
- **Create pools**: Implemented
- **Import pools**: Implemented (Priority 20)
- **Export pools**: Implemented (Priority 20)
- **Report pool health**: Implemented
- **Create and manage datasets**: Implemented
- **Create ZVOLs**: Implemented
- **Scrub operations**: Implemented
- **Progress monitoring**: Implemented (Priority 19)
## Future Enhancements
1. **Scrub Scheduling**: Automatic scheduled scrubs
2. **Scrub Notifications**: Alerts when scrub completes or finds errors
3. **Pool Health Alerts**: Automatic alerts for pool health issues
4. **Import History**: Track pool import/export history
5. **Pool Properties**: Manage pool properties via API
6. **VDEV Management**: Add/remove vdevs from pools
7. **Pool Upgrade**: Upgrade pool version
8. **Resilver Operations**: Monitor and manage resilver operations