diff --git a/README.md b/README.md index 2c27108..846ea2c 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ **Example applications for [dstack](https://github.com/Dstack-TEE/dstack) - Deploy containerized apps to TEEs with end-to-end security in minutes** -[Getting Started](#getting-started) • [Use Cases](#use-cases) • [Core Patterns](#core-patterns) • [Dev Tools](#dev-scaffolding) • [Starter Packs](#starter-packs) • [Other Use Cases](#other-use-cases) +[Getting Started](#getting-started) • [Use Cases](#use-cases) • [Core Patterns](#core-patterns) • [Infrastructure](#infrastructure) • [Dev Tools](#dev-scaffolding) • [Starter Packs](#starter-packs) • [Other Use Cases](#other-use-cases) @@ -152,6 +152,16 @@ Development and debugging tools. **Not for production.** --- +## Infrastructure + +Run infrastructure services inside TEEs. + +| Example | Description | +|---------|-------------| +| [k3s](./k3s) | Single-node k3s cluster in a TEE with wildcard HTTPS and remote kubectl | + +--- + ## Tech Demos Interesting demonstrations. diff --git a/k3s/README.md b/k3s/README.md new file mode 100644 index 0000000..2ee4eef --- /dev/null +++ b/k3s/README.md @@ -0,0 +1,382 @@ +# k3s on dstack + +Run a single-node Kubernetes cluster inside an Intel TDX Confidential VM with wildcard HTTPS for all your services. 
+ +## What You Get + +- k3s cluster running in a hardware-isolated TEE +- Wildcard TLS certificate (Let's Encrypt) so every service gets HTTPS automatically +- Remote `kubectl` access through the dstack gateway +- Traefik ingress controller for routing HTTP traffic to pods + +## Prerequisites + +- [Phala Cloud](https://cloud.phala.network) account (or a self-hosted dstack deployment — see [Running on Raw dstack](#running-on-raw-dstack)) +- Phala CLI installed and authenticated: + ```bash + npm install -g phala + phala auth login + ``` +- `kubectl` and `jq` installed ([kubectl install guide](https://kubernetes.io/docs/tasks/tools/)) +- A domain you control (for the wildcard certificate) +- Cloudflare API token with **Zone:Read** and **DNS:Edit** permissions (see [DNS_PROVIDERS.md](../custom-domain/dstack-ingress/DNS_PROVIDERS.md) for other DNS providers) + +## Quick Start + +The deploy script handles everything — CVM provisioning, kubeconfig extraction, certificate waiting, and a test workload: + +```bash +export CLOUDFLARE_API_TOKEN=your-cloudflare-token +export CERTBOT_EMAIL=you@example.com + +./deploy.sh k3s.example.com +``` + +Replace `k3s.example.com` with your actual domain. The script takes ~10 minutes (mostly waiting for the wildcard certificate). When done, it prints: + +``` +============================================ + k3s on dstack is ready! +============================================ + + Kubeconfig: export KUBECONFIG=/path/to/k3s.yaml + kubectl: kubectl get nodes + Test URL: https://nginx.k3s.example.com/ + Evidence: https://nginx.k3s.example.com/evidences/quote +``` + +You can then deploy your own services: + +```bash +export KUBECONFIG=k3s.yaml +kubectl run my-app --image=my-app:latest --port=8080 +kubectl expose pod my-app --port=8080 +kubectl apply -f - < +Click to expand manual steps + +### 1. 
Deploy the CVM + +```bash +phala deploy \ + -n my-k3s \ + -c docker-compose.yaml \ + -t tdx.medium \ + --disk-size 50G \ + --dev-os \ + -e "CLOUDFLARE_API_TOKEN=your-cloudflare-token" \ + -e "CERTBOT_EMAIL=you@example.com" \ + -e "CLUSTER_DOMAIN=k3s.example.com" \ + --wait +``` + +The `--dev-os` flag enables SSH access (needed to extract the kubeconfig). The `--disk-size 50G` gives enough room for k3s images and workloads. + +The deploy command outputs an **App ID** and gateway info. Save the App ID (a 40-character hex string): + +``` +App ID: a1b2c3d4e5f6... +``` + +You can also retrieve these later: + +```bash +phala cvms get my-k3s --json +``` + +### 2. Get Your Kubeconfig + +Wait 3-4 minutes for the CVM to boot, then extract the kubeconfig: + +```bash +APP_ID= +GATEWAY_DOMAIN= # e.g. dstack-pha-prod5.phala.network + +# Find the gateway domain from the CVM info if you don't have it +# phala cvms get my-k3s --json | jq -r '.gateway.base_domain' + +# Extract kubeconfig from the CVM +phala ssh "$APP_ID" -- \ + "docker exec dstack-k3s-1 cat /etc/rancher/k3s/k3s.yaml" \ + 2>/dev/null > k3s.yaml + +# Rewrite the API server URL to use the gateway TLS passthrough endpoint +sed -i "s|https://127.0.0.1:6443|https://${APP_ID}-6443s.${GATEWAY_DOMAIN}|" k3s.yaml + +export KUBECONFIG=$(pwd)/k3s.yaml +``` + +> **Note:** The `-6443s` suffix tells the dstack gateway to use TLS passthrough (the `s` means passthrough). This way `kubectl` talks directly to the k3s API server's TLS — the gateway never sees the traffic contents. + +### 3. Verify the Cluster + +```bash +kubectl get nodes +``` + +Wait until the node shows `Ready` (1-2 minutes after SSH becomes available): + +``` +NAME STATUS ROLES AGE VERSION +k3s-node Ready control-plane,master 2m v1.31.6+k3s1 +``` + +### 4. Wait for the Wildcard Certificate + +The dstack-ingress container issues a Let's Encrypt wildcard certificate via DNS-01 challenge. 
This takes 5-8 minutes after CVM boot (certbot installation + 120s DNS propagation + issuance). If your domain has existing CAA records, the first attempt may fail and auto-retry after setting the correct `issuewild` CAA record. + +Check with: + +```bash +curl -sI "https://test.k3s.example.com/" 2>&1 | head -3 +``` + +Retry until you see an HTTP response (a 404 is fine: it means TLS works but no service is routed yet): + +``` +HTTP/1.1 404 Not Found +``` + +### 5. Deploy a Test Workload + +```bash +CLUSTER_DOMAIN=k3s.example.com + +kubectl run nginx --image=nginx:alpine --port=80 +kubectl expose pod nginx --port=80 --target-port=80 --name=nginx +kubectl wait --for=condition=Ready pod/nginx --timeout=120s + +kubectl apply -f - < + +## How It Works + +### Architecture + +```mermaid +graph LR + subgraph Internet + User[Browser / curl] + Kubectl[kubectl] + end + + subgraph "dstack CVM (Intel TDX)" + Ingress[dstack-ingress<br/>TLS termination] + Traefik[Traefik<br/>HTTP routing] + K3sAPI[k3s API server<br/>port 6443] + Pod1[Pod A] + Pod2[Pod B] + end + + User -->|HTTPS| Ingress + Ingress -->|HTTP| Traefik + Traefik --> Pod1 + Traefik --> Pod2 + + Kubectl -->|TLS passthrough<br/>via gateway| K3sAPI +``` + +External HTTPS traffic hits dstack-ingress, which terminates TLS using the wildcard certificate and forwards plain HTTP to Traefik. Traefik routes requests to pods based on `Host` header matching via IngressRoutes. + +`kubectl` connects through the dstack gateway's TLS passthrough mode (port suffix `-6443s`), so the gateway forwards encrypted traffic directly to the k3s API server without inspecting it. + +### Services + +| Service | Purpose | +|---------|---------| +| `kmod-installer` | Loads kernel modules required by k3s networking (runs once at boot) | +| `k3s` | Single-node k3s server running in privileged mode | +| `dstack-ingress` | Wildcard TLS termination via Let's Encrypt DNS-01, proxies to Traefik | + +### How Wildcard HTTPS Works + +1. dstack-ingress requests a wildcard certificate for `*.k3s.example.com` from Let's Encrypt using DNS-01 validation +2. It creates a DNS TXT record via the Cloudflare API to prove domain ownership +3. After certificate issuance, all HTTPS traffic to `*.k3s.example.com` is terminated by dstack-ingress +4. The decrypted HTTP traffic is forwarded to Traefik on port 80 +5. 
Traefik matches the `Host` header against IngressRoute rules and routes to the right pod + +## Environment Variables + +| Variable | Required | Description | +|----------|----------|-------------| +| `CLOUDFLARE_API_TOKEN` | Yes | Cloudflare API token for DNS-01 certificate challenges | +| `CERTBOT_EMAIL` | Yes | Email for Let's Encrypt registration | +| `CLUSTER_DOMAIN` | Yes | Your domain for the wildcard cert (e.g., `k3s.example.com`) | +| `K3S_NODE_NAME` | No | Kubernetes node name (default: `k3s-node`) | + +The following are auto-injected by Phala Cloud and used in the compose file: + +| Variable | Description | +|----------|-------------| +| `DSTACK_APP_ID` | CVM application ID (used for k3s TLS SAN) | +| `DSTACK_GATEWAY_DOMAIN` | Gateway domain (used for k3s TLS SAN and ingress routing) | + +### Using a Different DNS Provider + +The compose file defaults to Cloudflare. To use a different provider, change `DNS_PROVIDER` and the corresponding credentials. See [DNS_PROVIDERS.md](../custom-domain/dstack-ingress/DNS_PROVIDERS.md) for supported providers (Linode, Namecheap, Route53). + +### Instance Sizing + +| Instance Type | Recommended For | +|---------------|-----------------| +| `tdx.medium` | Testing and tutorials | +| `tdx.4xlarge` | Small production workloads (5-10 pods) | +| `tdx.8xlarge` | Larger workloads | + +k3s itself uses ~500MB RAM. Budget the rest for your workloads. A 50GB disk is the recommended minimum. + +## Hardening (Optional) + +### Scoped RBAC for Programmatic Access + +The kubeconfig from step 2 uses the built-in `cluster-admin` credentials. For programmatic access, create a dedicated service account and issue it a token. As shipped, `manifests/rbac.yaml` binds this account to the `cluster-admin` ClusterRole, so edit the binding to reference a narrower role if you need limited permissions: + +```bash +kubectl apply -f manifests/rbac.yaml +kubectl create token k3s-admin --duration=8760h +``` + +Use the resulting token with the API server URL from your kubeconfig. 
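
One way to combine the token with the API server URL is to generate a second kubeconfig that carries only the bearer token. The following is a minimal sketch, not part of `deploy.sh`: the helper names `kubeconfig_field` and `token_kubeconfig` and the `k3s` cluster/context names are hypothetical, and it assumes the admin kubeconfig keeps `server:` and `certificate-authority-data:` each on a single line, as k3s normally writes them.

```shell
# Sketch only: helper names are hypothetical and not part of deploy.sh.

# Print the value of the first "key: value" line in a kubeconfig file.
# $1 = key (without the colon), $2 = kubeconfig path.
kubeconfig_field() {
  awk -v key="$1:" '$1 == key { print $2; exit }' "$2"
}

# Emit a kubeconfig that authenticates with a bearer token instead of the
# admin client certificate. $1 = admin kubeconfig path, $2 = token.
token_kubeconfig() {
  local server ca
  server="$(kubeconfig_field server "$1")"
  ca="$(kubeconfig_field certificate-authority-data "$1")"
  printf 'apiVersion: v1\nkind: Config\n'
  printf 'clusters:\n- name: k3s\n  cluster:\n'
  printf '    server: %s\n' "$server"
  printf '    certificate-authority-data: %s\n' "$ca"
  printf 'users:\n- name: k3s-admin\n  user:\n    token: %s\n' "$2"
  printf 'contexts:\n- name: k3s\n  context:\n    cluster: k3s\n    user: k3s-admin\n'
  printf 'current-context: k3s\n'
}
```

A possible invocation, assuming the admin kubeconfig from step 2 is `k3s.yaml`, would be `token_kubeconfig k3s.yaml "$(kubectl create token k3s-admin --duration=8760h)" > k3s-token.yaml`, after which `KUBECONFIG=k3s-token.yaml kubectl get pods` should authenticate as the service account rather than as the built-in admin user.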
+ +### Network Policies + +Restrict pod-to-pod traffic so pods can only receive traffic from Traefik and reach the internet (but not each other): + +```bash +kubectl apply -f manifests/network-policy.yaml +``` + +This applies three policies: +- **default-deny**: blocks all ingress and egress by default +- **allow-internet-egress**: allows outbound internet (but blocks pod-to-pod via CIDR exclusions) and DNS +- **allow-traefik-ingress**: allows inbound traffic only from Traefik in kube-system + +## Running on Raw dstack + +This tutorial targets Phala Cloud, but the same compose file works on a self-hosted dstack deployment. Key differences: + +- `DSTACK_APP_ID` and `DSTACK_GATEWAY_DOMAIN` are not auto-injected. Add a `K3S_TLS_SAN` environment variable to the k3s service manually with your gateway endpoint, and update the `--tls-san` argument. +- Deploy via the VMM web UI (port 9080) or `vmm-cli.py` instead of the Phala CLI. +- See the [dstack deployment guide](https://github.com/Dstack-TEE/dstack/blob/main/docs/deployment.md) for self-hosted setup. + +## Timing Reference + +| Phase | Duration | +|-------|----------| +| CVM provision | ~1s | +| SSH available | ~2-3 min | +| k3s node Ready | ~1 min after SSH | +| Wildcard cert issued | ~5-8 min (certbot install + DNS propagation) | +| **Total to first HTTPS 200** | **~8-10 min** | + +Measured on `tdx.medium` with 50GB disk. The wildcard cert is the bottleneck — certbot installs its dependencies on first boot, then waits 120s for DNS propagation. If your domain has existing CAA records, add another ~3 min for the auto-retry. + +## Troubleshooting + +**kubectl connection refused** +The k3s API server may not be ready yet. Wait 3-5 minutes after deploy, then retry. Check that the `--tls-san` value matches your gateway endpoint. 
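
A quick local check can rule out a kubeconfig that still points at `127.0.0.1` or at the wrong gateway host. This is a rough sketch under the assumption that the endpoint has the `https://<app-id>-6443s.<gateway-domain>` form used in step 2; the helper names `expected_api_server` and `check_api_server` are hypothetical and not part of `deploy.sh`.

```shell
# Sketch only: helper names are hypothetical and not part of deploy.sh.

# Build the API server URL that kubectl should use.
# $1 = App ID, $2 = gateway base domain.
expected_api_server() {
  printf 'https://%s-6443s.%s\n' "$1" "$2"
}

# Compare the server URL in a kubeconfig with the expected gateway endpoint.
# $1 = kubeconfig path, $2 = App ID, $3 = gateway base domain.
check_api_server() {
  local actual expected
  actual="$(awk '$1 == "server:" { print $2; exit }' "$1")"
  expected="$(expected_api_server "$2" "$3")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $actual"
  else
    echo "mismatch: kubeconfig has '$actual', expected '$expected'"
    return 1
  fi
}
```

Running `check_api_server k3s.yaml "$APP_ID" "$GATEWAY_DOMAIN"` reports a mismatch when the `sed` rewrite from step 2 did not apply. This only validates the client side; to inspect the names the API server certificate actually carries, something like `openssl s_client -connect "$host:443" -servername "$host" </dev/null | openssl x509 -noout -ext subjectAltName` (OpenSSL 1.1.1 or later, with `host` set to the gateway endpoint hostname) may help.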
+ +**Wildcard cert not issuing** +Check dstack-ingress logs: +```bash +phala ssh -- "docker logs dstack-dstack-ingress-1 2>&1 | tail -30" +``` +Common causes: wrong Cloudflare token, domain not on your Cloudflare account, DNS propagation delay, existing CAA records (auto-retried). + +**IngressRoute returns 404** +Traefik may not have picked up the route yet. Wait 10-15 seconds and retry. Verify the IngressRoute exists: +```bash +kubectl get ingressroute.traefik.io +``` + +**Node stuck in NotReady** +The kmod-installer may have failed. Check k3s logs: +```bash +phala ssh -- "docker logs dstack-k3s-1 2>&1 | tail -30" +``` + +## Files + +``` +k3s/ +├── docker-compose.yaml # k3s + kmod-installer + dstack-ingress +├── deploy.sh # One-command deploy + setup +├── README.md +└── manifests/ + ├── rbac.yaml # Optional: scoped service account + └── network-policy.yaml # Optional: restrict pod traffic +``` diff --git a/k3s/deploy.sh b/k3s/deploy.sh new file mode 100755 index 0000000..332acaa --- /dev/null +++ b/k3s/deploy.sh @@ -0,0 +1,189 @@ +#!/usr/bin/env bash +# Deploy k3s on dstack and set up kubectl access. +# +# Usage: +# export CLOUDFLARE_API_TOKEN=xxx +# export CERTBOT_EMAIL=you@example.com +# ./deploy.sh k3s.example.com +# +# Prerequisites: +# - phala CLI installed and authenticated (phala auth login) +# - kubectl and jq installed + +set -euo pipefail + +CLUSTER_DOMAIN="${1:-${CLUSTER_DOMAIN:-}}" +CVM_NAME="${CVM_NAME:-my-k3s}" +INSTANCE_TYPE="${INSTANCE_TYPE:-tdx.medium}" +DISK_SIZE="${DISK_SIZE:-50G}" +KUBECONFIG_FILE="${KUBECONFIG_FILE:-k3s.yaml}" + +if [[ -z "$CLUSTER_DOMAIN" ]]; then + echo "Usage: $0 " + echo " e.g. 
$0 k3s.example.com" + echo "" + echo "Required env vars:" + echo " CLOUDFLARE_API_TOKEN Cloudflare API token (Zone:Read + DNS:Edit)" + echo " CERTBOT_EMAIL Email for Let's Encrypt registration" + echo "" + echo "Optional env vars:" + echo " CVM_NAME CVM name (default: my-k3s)" + echo " INSTANCE_TYPE Instance type (default: tdx.medium)" + echo " DISK_SIZE Disk size (default: 50G)" + echo " KUBECONFIG_FILE Output kubeconfig path (default: k3s.yaml)" + exit 1 +fi + +: "${CLOUDFLARE_API_TOKEN:?CLOUDFLARE_API_TOKEN is required}" +: "${CERTBOT_EMAIL:?CERTBOT_EMAIL is required}" + +for cmd in phala kubectl jq; do + command -v "$cmd" >/dev/null 2>&1 || { echo "Error: $cmd is required but not found"; exit 1; } +done + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# ---------- Step 1: Deploy the CVM ---------- + +echo "==> Deploying CVM '${CVM_NAME}'..." +phala deploy \ + -n "$CVM_NAME" \ + -c "${SCRIPT_DIR}/docker-compose.yaml" \ + -t "$INSTANCE_TYPE" \ + --disk-size "$DISK_SIZE" \ + --dev-os \ + -e "CLOUDFLARE_API_TOKEN=${CLOUDFLARE_API_TOKEN}" \ + -e "CERTBOT_EMAIL=${CERTBOT_EMAIL}" \ + -e "CLUSTER_DOMAIN=${CLUSTER_DOMAIN}" \ + --wait + +# ---------- Step 2: Extract APP_ID and GATEWAY_DOMAIN ---------- + +echo "==> Fetching CVM info..." +CVM_JSON=$(phala cvms get "$CVM_NAME" --json 2>/dev/null) +APP_ID=$(echo "$CVM_JSON" | jq -r '.app_id') +GATEWAY_DOMAIN=$(echo "$CVM_JSON" | jq -r '.gateway.base_domain') + +echo " App ID: ${APP_ID}" +echo " Gateway domain: ${GATEWAY_DOMAIN}" + +# ---------- Step 3: Wait for SSH and extract kubeconfig ---------- + +echo "==> Waiting for CVM to boot (this takes 2-3 minutes)..." +for i in $(seq 1 30); do + if phala ssh "$APP_ID" -- "echo ok" >/dev/null 2>&1; then + break + fi + if [[ $i -eq 30 ]]; then + echo "Error: SSH not available after 5 minutes" + exit 1 + fi + sleep 10 +done + +echo "==> Extracting kubeconfig..." 
+for i in $(seq 1 12); do + if phala ssh "$APP_ID" -- \ + "docker exec dstack-k3s-1 cat /etc/rancher/k3s/k3s.yaml" \ + 2>/dev/null > "$KUBECONFIG_FILE" && [[ -s "$KUBECONFIG_FILE" ]]; then + break + fi + if [[ $i -eq 12 ]]; then + echo "Error: could not extract kubeconfig after 2 minutes" + exit 1 + fi + sleep 10 +done + +# Rewrite API server URL to use the gateway TLS passthrough endpoint +sed -i "s|https://127.0.0.1:6443|https://${APP_ID}-6443s.${GATEWAY_DOMAIN}|" "$KUBECONFIG_FILE" + +export KUBECONFIG="${KUBECONFIG_FILE}" + +# ---------- Step 4: Wait for node Ready ---------- + +echo "==> Waiting for k3s node to become Ready..." +for i in $(seq 1 30); do + STATUS=$(kubectl get nodes -o jsonpath='{.items[0].status.conditions[?(@.type=="Ready")].status}' 2>/dev/null || echo "") + if [[ "$STATUS" == "True" ]]; then + break + fi + if [[ $i -eq 30 ]]; then + echo "Error: node not Ready after 5 minutes" + exit 1 + fi + sleep 10 +done + +echo "==> Node is Ready:" +kubectl get nodes + +# ---------- Step 5: Wait for wildcard certificate ---------- + +echo "==> Waiting for wildcard TLS certificate (this takes 5-8 minutes)..." +CERT_TEST_URL="https://test.${CLUSTER_DOMAIN}/" +for i in $(seq 1 60); do + HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$CERT_TEST_URL" 2>/dev/null || echo "000") + if [[ "$HTTP_CODE" != "000" ]]; then + echo " Certificate is ready (got HTTP ${HTTP_CODE})" + break + fi + if [[ $i -eq 60 ]]; then + echo "Warning: certificate not ready after 10 minutes, continuing anyway" + break + fi + sleep 10 +done + +# ---------- Step 6: Deploy test workload ---------- + +echo "==> Deploying nginx test workload..." +NGINX_HOST="nginx.${CLUSTER_DOMAIN}" + +kubectl run nginx --image=nginx:alpine --port=80 --restart=Never 2>/dev/null || true +kubectl expose pod nginx --port=80 --target-port=80 --name=nginx 2>/dev/null || true + +kubectl apply -f - < Smoke test..." 
+HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://${NGINX_HOST}/" 2>/dev/null || echo "000") +if [[ "$HTTP_CODE" == "200" ]]; then + echo " PASS: https://${NGINX_HOST}/ returned 200" +else + echo " WARN: https://${NGINX_HOST}/ returned ${HTTP_CODE} (cert may still be propagating)" +fi + +# ---------- Done ---------- + +echo "" +echo "============================================" +echo " k3s on dstack is ready!" +echo "============================================" +echo "" +echo " Kubeconfig: export KUBECONFIG=$(pwd)/${KUBECONFIG_FILE}" +echo " kubectl: kubectl get nodes" +echo " Test URL: https://${NGINX_HOST}/" +echo " Evidence: https://${NGINX_HOST}/evidences/quote" +echo "" +echo " To clean up:" +echo " kubectl delete ingressroute.traefik.io nginx && kubectl delete svc nginx && kubectl delete pod nginx" +echo " echo y | phala cvms delete ${CVM_NAME} && rm ${KUBECONFIG_FILE}" diff --git a/k3s/docker-compose.yaml b/k3s/docker-compose.yaml new file mode 100644 index 0000000..71df601 --- /dev/null +++ b/k3s/docker-compose.yaml @@ -0,0 +1,64 @@ +services: + kmod-installer: + image: kvin/dstack-extra-kmods:0.5.8 + privileged: true + pid: host + volumes: + - /:/host + restart: "no" + + k3s: + image: rancher/k3s:v1.31.6-k3s1 + command: + - server + - --node-name=${K3S_NODE_NAME:-k3s-node} + - --tls-san=0.0.0.0 + - --tls-san=${DSTACK_APP_ID}-6443s.${DSTACK_GATEWAY_DOMAIN} + - --kubelet-arg=max-pods=512 + privileged: true + ports: + - "6443:6443" # Kubernetes API (TLS passthrough via gateway) + - "80:80" # Traefik HTTP entrypoint + - "8443:443" # Traefik HTTPS entrypoint (internal) + environment: + - K3S_KUBECONFIG_MODE=644 + tmpfs: + - /run + - /var/run + volumes: + - k3s-data:/var/lib/rancher/k3s + - k3s-kubelet:/var/lib/kubelet + - k3s-etc:/etc/rancher + - k3s-log:/var/log + depends_on: + kmod-installer: + condition: service_completed_successfully + restart: unless-stopped + + dstack-ingress: + image: 
dstacktee/dstack-ingress:2.1@sha256:36894662bdd252d53e8492be147f43dd7d91a5732a78a2a85f39e55c1460b4d0 + ports: + - "443:443" + environment: + - DNS_PROVIDER=cloudflare + - CLOUDFLARE_API_TOKEN=${CLOUDFLARE_API_TOKEN} + - DOMAIN=*.${CLUSTER_DOMAIN} + - GATEWAY_DOMAIN=_.${DSTACK_GATEWAY_DOMAIN} + - CERTBOT_EMAIL=${CERTBOT_EMAIL} + - SET_CAA=true + - TARGET_ENDPOINT=k3s:80 + - EVIDENCE_PORT=8080 + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + - /var/run/tappd.sock:/var/run/tappd.sock + - cert-data:/etc/letsencrypt + depends_on: + - k3s + restart: unless-stopped + +volumes: + k3s-data: + k3s-kubelet: + k3s-etc: + k3s-log: + cert-data: diff --git a/k3s/manifests/network-policy.yaml b/k3s/manifests/network-policy.yaml new file mode 100644 index 0000000..4c93e5c --- /dev/null +++ b/k3s/manifests/network-policy.yaml @@ -0,0 +1,53 @@ +# Default deny all inter-pod traffic +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: default-deny + namespace: default +spec: + podSelector: {} + policyTypes: + - Ingress + - Egress +--- +# Allow egress to internet (block pod-to-pod and service CIDRs) + DNS +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-internet-egress + namespace: default +spec: + podSelector: {} + policyTypes: + - Egress + egress: + - to: + - ipBlock: + cidr: 0.0.0.0/0 + except: + - 10.42.0.0/16 # k3s pod CIDR + - 10.43.0.0/16 # k3s service CIDR + - ports: + - port: 53 + protocol: UDP + - port: 53 + protocol: TCP +--- +# Allow ingress from Traefik (kube-system namespace) only +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-traefik-ingress + namespace: default +spec: + podSelector: {} + policyTypes: + - Ingress + ingress: + - from: + - namespaceSelector: + matchLabels: + kubernetes.io/metadata.name: kube-system + podSelector: + matchLabels: + app.kubernetes.io/name: traefik diff --git a/k3s/manifests/rbac.yaml b/k3s/manifests/rbac.yaml new file mode 100644 index 
0000000..8bd3554 --- /dev/null +++ b/k3s/manifests/rbac.yaml @@ -0,0 +1,18 @@ +apiVersion: v1 +kind: ServiceAccount +metadata: + name: k3s-admin + namespace: default +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: k3s-admin +subjects: + - kind: ServiceAccount + name: k3s-admin + namespace: default +roleRef: + kind: ClusterRole + name: cluster-admin + apiGroup: rbac.authorization.k8s.io