diff --git a/content/patterns/coco-pattern/_index.adoc b/content/patterns/coco-pattern/_index.adoc index 6e6f4cf28..27733c667 100644 --- a/content/patterns/coco-pattern/_index.adoc +++ b/content/patterns/coco-pattern/_index.adoc @@ -27,7 +27,7 @@ include::modules/comm-attributes.adoc[] = About the Confidential Containers pattern Confidential computing is a technology for securing data in use. It uses a https://en.wikipedia.org/wiki/Trusted_execution_environment[Trusted Execution Environment] (TEE) provided within the hardware of the processor to prevent access from others who have access to the system, including cluster administrators and hypervisor operators. -https://confidentialcontainers.org/[Confidential containers] is a project to standardize the consumption of confidential computing by making the security boundary for confidential computing a Kubernetes pod. https://katacontainers.io/[Kata containers] is used to establish the boundary via a shim VM. +https://confidentialcontainers.org/[Confidential containers] is a project to standardize the consumption of confidential computing by making the security boundary for confidential computing a Kubernetes pod. https://katacontainers.io/[Kata containers] is used to establish the boundary through a shim VM. A core goal of confidential computing is to use this technology to isolate the workload from both Kubernetes and hypervisor administrators. In practice this means that even a `kubeadmin` user cannot `exec` into a running confidential container or inspect its memory. @@ -36,7 +36,7 @@ image::coco-pattern/isolation.png[Schematic describing the isolation of confiden This pattern deploys and configures https://docs.redhat.com/en/documentation/openshift_sandboxed_containers/1.12/html/deploying_confidential_containers/cc-overview[Red Hat OpenShift Sandboxed Containers] for confidential computing workloads on both cloud (Microsoft Azure) and bare metal infrastructure. 
-**Cloud deployments** use "peer pods" — confidential VMs provisioned directly on the Azure hypervisor rather than nested inside OpenShift worker nodes. Azure offers https://learn.microsoft.com/en-us/azure/confidential-computing/virtual-machine-options[multiple confidential VM families]; this pattern defaults to the `Standard_DCas_v5` family but can be configured to use other families via `values-global.yaml`. +**Cloud deployments** use "peer pods", which are confidential VMs provisioned directly on the Azure hypervisor rather than nested inside OpenShift worker nodes. Azure offers https://learn.microsoft.com/en-us/azure/confidential-computing/virtual-machine-options[multiple confidential VM families]; this pattern defaults to the `Standard_DCas_v5` family but can be configured to use other families by modifying `values-global.yaml`. **Bare metal deployments** support Intel TDX (Trusted Domain Extensions) and AMD SEV-SNP (Secure Encrypted Virtualization - Secure Nested Paging) hardware TEEs, with optional **Technology Preview** NVIDIA confidential GPU support (H100, H200, B100, B200) for protected GPU workloads. 
@@ -46,10 +46,10 @@ The pattern includes sample applications demonstrating security boundaries and s The pattern supports four deployment topologies, selected by setting `main.clusterGroupName` in `values-global.yaml`: -- **`simple`** — Single-cluster Azure deployment with all components (Trustee, Vault, ACM, sandboxed containers, workloads) on one cluster -- **`trusted-hub` + `spoke`** — Multi-cluster Azure deployment separating the trusted zone (hub with Trustee/Vault/ACM) from the untrusted workload zone (spoke) -- **`baremetal`** — Single-cluster bare metal with Intel TDX or AMD SEV-SNP support -- **`baremetal-gpu`** — **Technology Preview:** Bare metal with Intel TDX or AMD SEV-SNP and NVIDIA confidential GPU support (H100, H200, B100, B200) +- **`simple`**: Single-cluster Azure deployment with all components (Trustee, Vault, ACM, sandboxed containers, workloads) on one cluster +- **`trusted-hub` + `spoke`**: Multi-cluster Azure deployment separating the trusted zone (hub with Trustee/Vault/ACM) from the untrusted workload zone (spoke) +- **`baremetal`**: Single-cluster bare metal with Intel TDX or AMD SEV-SNP support +- **`baremetal-gpu`**: **Technology Preview:** Bare metal with Intel TDX or AMD SEV-SNP and NVIDIA confidential GPU support (H100, H200, B100, B200) == Requirements @@ -76,7 +76,7 @@ The pattern supports four deployment topologies, selected by setting `main.clust **This pattern is a demonstration only and contains configurations that are not best practice** -- The pattern supports both single-cluster (`simple` clusterGroup) and multi-cluster (`trusted-hub` + `spoke`) topologies. The default is single-cluster, which breaks the RACI separation expected in a remote attestation architecture. In the single-cluster topology, the Key Broker Service and the workloads it protects run on the same cluster, meaning a compromised cluster could affect both. 
The multi-cluster topology addresses this by separating the trusted zone (Trustee, Vault, ACM on the hub) from the untrusted workload zone (spoke). The https://www.ietf.org/archive/id/draft-ietf-rats-architecture-22.html[RATS] architecture mandates that the Key Broker Service (e.g. https://github.com/confidential-containers/trustee[Trustee]) is in a trusted security zone. +- The pattern supports both single-cluster (`simple` clusterGroup) and multi-cluster (`trusted-hub` + `spoke`) topologies. The default is single-cluster, which breaks the RACI separation expected in a remote attestation architecture. In the single-cluster topology, the Key Broker Service and the workloads it protects run on the same cluster, meaning a compromised cluster could affect both. The multi-cluster topology addresses this by separating the trusted zone (Trustee, Vault, ACM on the hub) from the untrusted workload zone (spoke). The https://www.ietf.org/archive/id/draft-ietf-rats-architecture-22.html[RATS] architecture mandates that the Key Broker Service (for example, https://github.com/confidential-containers/trustee[Trustee]) is in a trusted security zone. - The https://github.com/confidential-containers/trustee/tree/main/attestation-service[Attestation Service] ships with permissive default policies that accept all container images without verification. This allows quick testing but is unsuitable for production. The threat model assumes that without image signature verification, an attacker with access to the container registry could substitute malicious images that would still receive secrets from the KBS. @@ -97,10 +97,13 @@ kbs: + [source,bash] ---- -# Generate cosign key pair cosign generate-key-pair - -# Add the public key content to values-secret-coco-pattern.yaml +---- ++ +Then add the public key content to `values-secret-coco-pattern.yaml`: ++ +[source,yaml] +---- kbs: cosignPublicKeys: - | @@ -122,7 +125,7 @@ cosign verify --key cosign.pub your-registry.io/your-image:tag 4. 
**Configure reference values for PCR measurements**: For hardware-backed attestation, configure expected PCR values in the policy. These are automatically retrieved by `scripts/get-pcr.sh` but should be reviewed and locked down in production. See link:./coco-pattern-getting-started/#_updating_pcr_measurements[Updating PCR measurements] for the workflow when peer-pod images change. -Without these hardening steps, the attestation service will approve any workload requesting secrets, defeating the confidentiality guarantees of the TEE. +Without these hardening steps, the attestation service approves any workload requesting secrets, defeating the confidentiality guarantees of the TEE. == Future work @@ -137,21 +140,21 @@ Confidential Containers architecture separates two security zones: - **Trusted zone**: Runs the Key Broker Service (Trustee), attestation service, and secrets management (Vault). This zone verifies TEE evidence and releases secrets only to authenticated confidential workloads. - **Untrusted zone**: Runs the sandboxed containers operator, confidential workload pods, and the Kyverno policy engine. Workloads in this zone must attest to Trustee before receiving secrets. -The pattern supports both single-cluster and multi-cluster topologies. In single-cluster topologies (`simple`, `baremetal`, `baremetal-gpu`), all components run on one cluster. In the multi-cluster topology, the `trusted-hub` clusterGroup runs on the hub cluster and the `spoke` clusterGroup runs on managed clusters imported via ACM. +The pattern supports both single-cluster and multi-cluster topologies. In single-cluster topologies (`simple`, `baremetal`, `baremetal-gpu`), all components run on one cluster. In the multi-cluster topology, the `trusted-hub` clusterGroup runs on the hub cluster and the `spoke` clusterGroup runs on managed clusters imported through ACM. 
-**Kyverno's role**: The pattern uses Kyverno to dynamically inject attestation agent configuration (`cc_init_data`) into confidential pods at admission time. An imperative job generates ConfigMaps containing the KBS TLS certificate and policy files. Kyverno propagates these ConfigMaps to workload namespaces and injects them as pod annotations, ensuring pods have the correct configuration for attestation without manual annotation management. +**The role of Kyverno**: The pattern uses Kyverno to dynamically inject attestation agent configuration (`cc_init_data`) into confidential pods at admission time. An imperative job generates ConfigMaps containing the KBS TLS certificate and policy files. Kyverno propagates these ConfigMaps to workload namespaces and injects them as pod annotations, ensuring pods have the correct configuration for attestation without manual annotation management. image::coco-pattern/overview-schematic.png[Schematic describing the high level architecture of confidential containers] === Key components - **Red Hat Build of Trustee 1.1**: The Key Broker Service (KBS) and attestation service. Trustee verifies that workloads are running in a genuine TEE before releasing secrets. Certificates for Trustee are managed by cert-manager using self-signed CAs. -- **HashiCorp Vault**: Secrets backend for the Validated Patterns framework. Stores KBS keys, attestation policies, and PCR measurements. +- **{hashicorp-vault}**: Secrets backend for the {solution-name-upstream} framework. Stores KBS keys, attestation policies, and PCR measurements. - **OpenShift Sandboxed Containers 1.12**: Deploys and manages confidential container infrastructure. On Azure, provisions peer-pod VMs; on bare metal, configures Kata runtimes for TDX/SEV-SNP. Operator subscriptions are pinned to specific CSV versions with manual install plan approval to ensure version consistency. - **Kyverno**: Policy engine that dynamically injects `cc_init_data` annotations into confidential pods. 
Manages the distribution of attestation agent configuration (KBS TLS certificates, policy files) from centralized ConfigMaps to workload namespaces. -- **Red Hat Advanced Cluster Management (ACM)**: Manages the spoke cluster in multi-cluster deployments. Policies and applications are deployed to the spoke via ACM's application lifecycle management. +- **{rh-rhacm-first}**: Manages the spoke cluster in multi-cluster deployments. Policies and applications are deployed to the spoke through ACM application lifecycle management. - **Node Feature Discovery (NFD)** _(bare metal only)_: Detects Intel TDX and AMD SEV-SNP hardware capabilities and labels nodes accordingly for runtime class scheduling. -- **Intel DCAP** _(bare metal with Intel TDX)_: Provisioning Certificate Caching Service (PCCS) and Quote Generation Service (QGS) for Intel TDX remote attestation via the Intel PCS API. +- **Intel DCAP** _(bare metal with Intel TDX)_: Provisioning Certificate Caching Service (PCCS) and Quote Generation Service (QGS) for Intel TDX remote attestation through the Intel PCS API. - **NVIDIA GPU Operator** _(GPU topology only, Technology Preview)_: Manages NVIDIA confidential GPUs (H100, H200, B100, B200) with CC Manager, VFIO passthrough, and Kata device plugins for GPU-enabled confidential workloads. 
@@ -162,7 +165,7 @@ Intel Trusted Domain Extensions (TDX) is a hardware-based TEE technology that is **Key features:** - **Automatic hardware detection**: Node Feature Discovery (NFD) detects TDX-capable CPUs and labels nodes with `intel.feature.node.kubernetes.io/tdx=true` -- **Remote attestation**: Intel DCAP components (PCCS and QGS) enable quote generation and verification via the Intel PCS API +- **Remote attestation**: Intel DCAP components (PCCS and QGS) enable quote generation and verification through the Intel PCS API - **Transparent runtime selection**: The `kata-cc` RuntimeClass automatically uses the TDX handler (`kata-tdx`) on labeled nodes - **MachineConfig automation**: Kernel parameters (`kvm_intel.tdx=1`) and vsock modules are applied automatically @@ -172,16 +175,16 @@ Intel Trusted Domain Extensions (TDX) is a hardware-based TEE technology that is - BIOS/firmware with TDX enabled - Intel PCS API key (obtainable from https://api.portal.trustedservices.intel.com[Intel Trusted Services]) -The pattern's Intel DCAP chart deploys PCCS as a centralized caching service and QGS as a DaemonSet on TDX nodes. Quote generation happens within the TEE, with PCCS providing attestation collateral to Trustee for verification. +The pattern Intel DCAP chart deploys PCCS as a centralized caching service and QGS as a DaemonSet on TDX nodes. Quote generation happens within the TEE, with PCCS providing attestation collateral to Trustee for verification. == AMD SEV-SNP support -AMD Secure Encrypted Virtualization - Secure Nested Paging (SEV-SNP) is a hardware-based TEE technology that provides VM isolation through memory encryption and integrity protection. SEV-SNP extends AMD's SEV technology with secure nested paging to protect against additional attack vectors. The pattern provides full AMD SEV-SNP support on bare metal deployments. 
+AMD Secure Encrypted Virtualization - Secure Nested Paging (SEV-SNP) is a hardware-based TEE technology that provides VM isolation through memory encryption and integrity protection. SEV-SNP extends AMD SEV technology with secure nested paging to protect against additional attack vectors. The pattern provides full AMD SEV-SNP support on bare metal deployments. **Key features:** - **Automatic hardware detection**: Node Feature Discovery (NFD) detects SEV-SNP-capable processors and labels nodes with `amd.feature.node.kubernetes.io/snp=true` -- **Certificate chain-based attestation**: AMD SEV-SNP uses a certificate chain model for attestation verification, eliminating the need for a collateral caching service like Intel's PCCS +- **Certificate chain-based attestation**: AMD SEV-SNP uses a certificate chain model for attestation verification, eliminating the need for a collateral caching service like Intel PCCS - **Transparent runtime selection**: The `kata-cc` RuntimeClass automatically uses the SEV-SNP handler (`kata-snp`) on labeled nodes - **MachineConfig automation**: Kernel parameters for SEV-SNP enablement and vsock modules are applied automatically @@ -191,17 +194,17 @@ AMD Secure Encrypted Virtualization - Secure Nested Paging (SEV-SNP) is a hardwa - BIOS/firmware with SEV-SNP enabled - No external attestation service required (certificate chain-based model) -AMD SEV-SNP's certificate chain approach simplifies the attestation infrastructure compared to Intel TDX, as the full certificate chain is embedded in the attestation evidence sent to Trustee for verification. +The AMD SEV-SNP certificate chain approach simplifies the attestation infrastructure compared to Intel TDX, as the full certificate chain is embedded in the attestation evidence sent to Trustee for verification. 
== NVIDIA confidential GPU support (**Technology Preview**) -NVIDIA confidential GPUs with confidential computing firmware enable GPU-accelerated workloads to run inside TEEs with hardware-enforced memory encryption and attestation. The pattern's `baremetal-gpu` topology provides support for NVIDIA confidential GPUs (H100, H200, B100, B200) on bare metal with either Intel TDX or AMD SEV-SNP as the host TEE platform. +NVIDIA confidential GPUs with confidential computing firmware enable GPU-accelerated workloads to run inside TEEs with hardware-enforced memory encryption and attestation. The pattern `baremetal-gpu` topology provides support for NVIDIA confidential GPUs (H100, H200, B100, B200) on bare metal with either Intel TDX or AMD SEV-SNP as the host TEE platform. **Key features:** -- **GPU passthrough via VFIO**: GPUs are passed through to Kata confidential VMs using IOMMU and VFIO, providing native GPU performance +- **GPU passthrough through VFIO**: GPUs are passed through to Kata confidential VMs by using IOMMU and VFIO, providing native GPU performance - **Confidential Computing Manager**: NVIDIA CC Manager enforces confidential mode at the GPU firmware level -- **GPU attestation**: The GPU's attestation evidence is included in the TEE's attestation report to Trustee +- **GPU attestation**: The GPU attestation evidence is included in the TEE attestation report to Trustee - **Kata device plugin**: The NVIDIA Kata sandbox device plugin exposes GPUs as schedulable resources (`nvidia.com/pgpu`) - **Multi-platform support**: Works with both Intel TDX and AMD SEV-SNP host TEE platforms @@ -209,7 +212,7 @@ NVIDIA confidential GPUs with confidential computing firmware enable GPU-acceler - NVIDIA GPUs with confidential computing firmware (H100, H200, B100, B200) - Intel TDX or AMD SEV-SNP enabled bare metal host -- IOMMU-capable system (kernel parameters applied via MachineConfig: `intel_iommu=on` or `amd_iommu=on`) +- IOMMU-capable system (kernel parameters applied 
by using MachineConfig: `intel_iommu=on` or `amd_iommu=on`) - NVIDIA GPU Operator v26.3.0+ The pattern includes a sample CUDA workload (`gpu-vectoradd`) that demonstrates GPU-accelerated computation within a confidential container, verifying both GPU functionality and attestation integration. Testing has been performed with Intel TDX + H100; AMD SEV-SNP + GPU configurations are expected to work but have not been fully validated. @@ -249,5 +252,5 @@ The pattern includes a sample CUDA workload (`gpu-vectoradd`) that demonstrates **Related patterns:** -- link:../multicloud-gitops-sgx/[Intel SGX protected Vault for Multicloud GitOps] — Uses Intel SGX enclaves (Gramine) for application-level confidential computing, complementary to CoCo's VM-based TEE approach -- link:../layered-zero-trust/[Layered Zero Trust] — Demonstrates workload identity (SPIFFE/SPIRE), secrets management (Vault/ESO), and zero-trust principles that complement CoCo's TEE isolation +- link:../multicloud-gitops-sgx/[Intel SGX protected Vault for Multicloud GitOps]: Uses Intel SGX enclaves (Gramine) for application-level confidential computing, complementary to the CoCo VM-based TEE approach +- link:../layered-zero-trust/[Layered Zero Trust]: Demonstrates workload identity (SPIFFE/SPIRE), secrets management (Vault/ESO), and zero-trust principles that complement CoCo TEE isolation diff --git a/content/patterns/coco-pattern/coco-pattern-azure-requirements.adoc b/content/patterns/coco-pattern/coco-pattern-azure-requirements.adoc index 964ee3e08..888c27fa2 100644 --- a/content/patterns/coco-pattern/coco-pattern-azure-requirements.adoc +++ b/content/patterns/coco-pattern/coco-pattern-azure-requirements.adoc @@ -8,7 +8,7 @@ aliases: /coco-pattern/coco-pattern-azure-requirements/ :_content-type: ASSEMBLY include::modules/comm-attributes.adoc[] -:imagesdir: ../../../images +:imagesdir: /images = Azure requirements This pattern has been tested on Microsoft Azure using self-managed OpenShift 4.19.28+ clusters 
provisioned with `openshift-install`. @@ -37,7 +37,7 @@ This means that access is required to Azure https://learn.microsoft.com/en-us/az These confidential VMs are *NOT* available in all regions. Check https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/[Azure products by region] to confirm availability of your chosen VM family in your target region. -Users will also need to request quota increases for their chosen confidential VM family (e.g., `Standard_DC2as_v5`, `Standard_DC4as_v5`, `Standard_DC8as_v5`, `Standard_DC16as_v5` for the DCas_v5 family) in their target region. By default, Azure subscriptions may have zero quota for confidential computing VM sizes. +Users also need to request quota increases for their chosen confidential VM family (for example, `Standard_DC2as_v5`, `Standard_DC4as_v5`, `Standard_DC8as_v5`, `Standard_DC16as_v5` for the DCas_v5 family) in their target region. By default, Azure subscriptions may have zero quota for confidential computing VM sizes. image::coco-pattern/peer_pods.png[Schematic diagram of peer pods vs standard kata containers] @@ -70,4 +70,4 @@ global: clusterRegion: '' ---- -The `clusterResGroup`, `clusterSubnet`, and `clusterNSG` values can be found in the Azure portal after the cluster has been provisioned, or via `openshift-install` metadata. The `DNSResGroup` and `hostedZoneName` correspond to the Azure DNS zone used for the cluster's base domain. +The `clusterResGroup`, `clusterSubnet`, and `clusterNSG` values can be found in the Azure portal after the cluster has been provisioned, or by using `openshift-install` metadata. The `DNSResGroup` and `hostedZoneName` correspond to the Azure DNS zone used for the cluster's base domain. 
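The lookup described above can be partly scripted. The following is a minimal sketch, assuming an installer-provisioned cluster that kept the default Azure resource names derived from the infrastructure ID (`<infraID>-rg`, `<infraID>-worker-subnet`, `<infraID>-nsg`). The helper name `derive_azure_names` and the sample infrastructure ID are hypothetical and not part of the pattern. Whether the pattern expects bare names or full resource IDs is not stated here, so treat the output as a starting point and confirm it in the Azure portal.

```shell
#!/usr/bin/env bash
# Sketch only: derive candidate Azure resource names from an OpenShift
# infrastructure ID.
# Assumption: the cluster uses the installer's default naming convention.

derive_azure_names() {
  local infra_id="$1"
  if [ -z "$infra_id" ]; then
    echo "usage: derive_azure_names <infraID>" >&2
    return 1
  fi
  # Keys mirror the values named in the text; suffixes are the assumed defaults.
  printf 'clusterResGroup=%s-rg\n' "$infra_id"
  printf 'clusterSubnet=%s-worker-subnet\n' "$infra_id"
  printf 'clusterNSG=%s-nsg\n' "$infra_id"
}

# Example with a made-up infrastructure ID.
derive_azure_names "mycluster-x7k2p"
```

On a live cluster, the infrastructure ID can be read with `oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}'` and passed to the helper.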
diff --git a/content/patterns/coco-pattern/coco-pattern-getting-started.adoc b/content/patterns/coco-pattern/coco-pattern-getting-started.adoc index 4ac6f3ddd..0e04c90d9 100644 --- a/content/patterns/coco-pattern/coco-pattern-getting-started.adoc +++ b/content/patterns/coco-pattern/coco-pattern-getting-started.adoc @@ -1,11 +1,11 @@ --- title: Getting started -weight: 10 +weight: 20 aliases: /coco-pattern/coco-pattern-getting-started/ --- :toc: - +:imagesdir: /images :_content-type: ASSEMBLY include::modules/comm-attributes.adoc[] @@ -22,11 +22,11 @@ include::modules/comm-attributes.adoc[] 3. Fork the repository and clone it locally. ArgoCD reconciles against your fork, so all configuration changes must be committed and pushed. -4. Run `bash scripts/gen-secrets.sh` to generate KBS key pairs, attestation policy seeds, and copy the values-secret template to `~/values-secret-coco-pattern.yaml`. This script will not overwrite existing secrets. +4. Run `bash scripts/gen-secrets.sh` to generate KBS key pairs, attestation policy seeds, and copy the values-secret template to `~/values-secret-coco-pattern.yaml`. This script does not overwrite existing secrets. 5. Run `bash scripts/get-pcr.sh` to retrieve PCR measurements from the peer-pod VM image. This stores the measurements at `~/.coco-pattern/measurements.json`, which are loaded into Vault and used by the attestation service. Requires `podman`, `skopeo`, and a pull secret at `~/pull-secret.json`. -6. Review and customise `~/values-secret-coco-pattern.yaml`. This file controls what secrets are loaded into Vault, including attestation policies, KBS key material, and PCR measurements. See the comments in `values-secret.yaml.template` for details on each field. +6. Review and customize `~/values-secret-coco-pattern.yaml`. This file controls what secrets are loaded into Vault, including attestation policies, KBS key material, and PCR measurements. See the comments in `values-secret.yaml.template` for details on each field. 
==== Bare metal deployments @@ -42,17 +42,17 @@ include::modules/comm-attributes.adoc[] 6. For bare metal, PCR measurements must be collected manually after the first boot. See the link:../coco-pattern-tested-environments/[tested environments] page for guidance on PCR collection for bare metal. Store the measurements at `~/.coco-pattern/measurements.json`. -7. Review and customise `~/values-secret-coco-pattern.yaml`. For Intel TDX, uncomment the PCCS secrets section and provide your Intel PCS API key. See the comments in `values-secret.yaml.template` for details on each field. +7. Review and customize `~/values-secret-coco-pattern.yaml`. For Intel TDX, uncomment the PCCS secrets section and provide your Intel PCS API key. See the comments in `values-secret.yaml.template` for details on each field. === Single cluster deployment -The single-cluster topology uses the `simple` clusterGroup. All components — Trustee, Vault, ACM, sandboxed containers, and workloads — are deployed on one cluster. +The single-cluster topology uses the `simple` clusterGroup. All components, including Trustee, Vault, ACM, sandboxed containers, and workloads, are deployed on one cluster. 1. Ensure `main.clusterGroupName: simple` is set in `values-global.yaml` 2. `./pattern.sh make install` -3. Wait for the cluster to reboot all nodes. The sandboxed containers operator applies a MachineConfig update that triggers a rolling reboot. Monitor progress via the ArgoCD UI or `oc get nodes`. +3. Wait for the cluster to reboot all nodes. The sandboxed containers operator applies a MachineConfig update that triggers a rolling reboot. Monitor progress in the ArgoCD UI or by running `oc get nodes`. 4. If the services do not come up, use the ArgoCD UI to triage potential timeouts. Peer-pod VMs may need to be restarted if they time out during initial provisioning. @@ -68,9 +68,9 @@ The multi-cluster topology separates the trusted zone (hub) from the untrusted w 4. 
Provision a second OpenShift 4.17+ cluster on Azure for the spoke -5. Import the spoke cluster into ACM with the label `clusterGroup=spoke` (see https://validatedpatterns.io/learn/importing-a-cluster/[importing a cluster]). ACM will automatically deploy the `spoke` clusterGroup applications to the imported cluster. +5. Import the spoke cluster into ACM with the label `clusterGroup=spoke` (see https://validatedpatterns.io/learn/importing-a-cluster/[importing a cluster]). ACM automatically deploys the `spoke` clusterGroup applications to the imported cluster. -6. The spoke cluster will install the sandboxed containers operator, deploy peer-pod infrastructure, and launch the sample workloads. Monitor progress in the ACM console or via ArgoCD on the spoke. +6. The spoke cluster installs the sandboxed containers operator, deploys peer-pod infrastructure, and launches the sample workloads. Monitor progress in the ACM console or in ArgoCD on the spoke. === Bare metal deployment @@ -87,7 +87,7 @@ The bare metal topology uses the `baremetal` clusterGroup for Intel TDX or AMD S 5. `./pattern.sh make install` 6. Wait for the cluster to reboot nodes. 
The pattern applies MachineConfig updates for: - - TDX/SEV-SNP kernel parameters (e.g., `kvm_intel.tdx=1` for Intel TDX, `kvm_amd.sev=1` for AMD SEV-SNP) + - TDX/SEV-SNP kernel parameters (for example, `kvm_intel.tdx=1` for Intel TDX, `kvm_amd.sev=1` for AMD SEV-SNP) - `nohibernate` kernel argument - vsock-loopback kernel module configuration + @@ -109,8 +109,8 @@ The pattern automatically detects and configures your TEE hardware: - AMD SEV-SNP: `amd.feature.node.kubernetes.io/snp=true` - **HostPath Provisioner (HPP)** provides persistent storage for bare metal deployments - **RuntimeClass**: The `kata-cc` RuntimeClass is created automatically, using `kata-tdx` or `kata-snp` handler based on detected hardware -- Both `kata-tdx` and `kata-snp` RuntimeClasses are deployed; only the one matching your hardware will have schedulable nodes -- **Intel DCAP components** (PCCS and QGS) deploy unconditionally but DaemonSets only schedule on Intel TDX nodes via NFD label selectors +- Both `kata-tdx` and `kata-snp` RuntimeClasses are deployed; only the one matching your hardware has schedulable nodes +- **Intel DCAP components** (PCCS and QGS) deploy unconditionally but DaemonSets only schedule on Intel TDX nodes by using NFD label selectors **Optional PCCS node pinning:** For Intel TDX deployments, you can pin the PCCS service to a specific node by running `bash scripts/get-pccs-node.sh` and setting `baremetal.pccs.nodeSelector` in the baremetal chart values. @@ -130,7 +130,7 @@ The `baremetal-gpu` topology extends the bare metal deployment with NVIDIA confi - Intel: `intel_iommu=on` - AMD: `amd_iommu=on` + - All nodes will reboot to apply these kernel parameters. + All nodes reboot to apply these kernel parameters. + **Note**: MCO-driven reboots may cause Vault secret loading to time out. If needed, re-run `./pattern.sh make upgrade` after nodes finish rebooting. 
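To see which TEE hardware NFD detected, the node labels listed above can be used as selectors. The sketch below is illustrative only: the helper names `tee_node_selector` and `list_tee_nodes` are hypothetical, and `list_tee_nodes` assumes a logged-in `oc` session.

```shell
#!/usr/bin/env bash
# Sketch only: map a TEE type to the NFD label named in this section.

tee_node_selector() {
  case "$1" in
    tdx) echo 'intel.feature.node.kubernetes.io/tdx=true' ;;
    snp) echo 'amd.feature.node.kubernetes.io/snp=true' ;;
    *)
      echo "unknown TEE type: $1 (expected tdx or snp)" >&2
      return 1
      ;;
  esac
}

# List the nodes carrying the label for the given TEE type.
# Not called automatically because it needs cluster access.
list_tee_nodes() {
  local selector
  selector="$(tee_node_selector "$1")" || return 1
  oc get nodes -l "$selector" -o name
}

# Print both selectors so they can be copied into other commands.
tee_node_selector tdx
tee_node_selector snp
```

Running `list_tee_nodes tdx` or `list_tee_nodes snp` against the cluster should return a non-empty list only for the TEE type that matches the hardware.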
@@ -151,7 +151,7 @@ The `baremetal-gpu` topology extends the bare metal deployment with NVIDIA confi [NOTE] ==== -The `baremetal-gpu` topology applies IOMMU MachineConfig to all nodes and triggers reboots even on clusters without GPUs. If you do not have GPUs, use the `baremetal` topology instead. The GPU workload (`gpu-vectoradd`) will remain in `Pending` state on systems without GPUs but is otherwise harmless. +The `baremetal-gpu` topology applies IOMMU MachineConfig to all nodes and triggers reboots even on clusters without GPUs. If you do not have GPUs, use the `baremetal` topology instead. The GPU workload (`gpu-vectoradd`) remains in `Pending` state on systems without GPUs but is otherwise harmless. ==== **GPU-specific components:** @@ -171,7 +171,7 @@ The pattern deploys a `gpu-vectoradd` sample workload that runs a CUDA vector ad oc logs -n gpu-workload deployment/gpu-vectoradd ---- -Expected output should show successful CUDA execution and GPU device detection. +The expected output shows successful CUDA execution and GPU device detection. == Updating PCR measurements @@ -197,7 +197,7 @@ bash scripts/get-pcr.sh + This fetches the current peer-pod image from your cluster's registry, extracts the measurements, and stores them at `~/.coco-pattern/measurements.json`. -2. **Update Vault secrets**: The measurements are loaded into Vault via `values-secret-coco-pattern.yaml`. If you previously deployed the pattern, update the `pcrMeasurements` field in your values-secret file with the new content from `~/.coco-pattern/measurements.json`. +2. **Update Vault secrets**: The measurements are loaded into Vault by using `values-secret-coco-pattern.yaml`. If you previously deployed the pattern, update the `pcrMeasurements` field in your values-secret file with the new content from `~/.coco-pattern/measurements.json`. 3. 
**Sync to the cluster**: Push the updated values-secret file to refresh Vault: + @@ -261,7 +261,7 @@ If the PCR values still do not match after following these steps, the peer-pod V == Simple Confidential container tests The pattern deploys some simple tests of CoCo with this pattern. -A "Hello Openshift" (e.g. `curl` to return "Hello Openshift!") application has been deployed in three configurations: +A "Hello Openshift" application (one that answers `curl` with "Hello Openshift!") has been deployed in three configurations: 1. A vanilla kubernetes pod: `oc get pods -n hello-openshift standard` 2. A confidential container with a strict policy: `oc get pods -n hello-openshift secure` @@ -270,13 +270,13 @@ A "Hello Openshift" (e.g. `curl` to return "Hello Openshift!") application has b In this case the insecure policy is designed to allow a user to be able to exec into the confidential container. Typically this is disabled by an immutable policy established at pod creation time. -Doing `oc get pod -n hello-openshift secure -o yaml` for either of the pods running a confidential container should show: +Running `oc get pod -n hello-openshift secure -o yaml` for either of the pods running a confidential container shows: - **Azure deployments**: `runtimeClassName: kata-remote` (peer-pod provisioned on Azure hypervisor) - **Bare metal deployments**: `runtimeClassName: kata-cc` (Kata container running on TDX/SEV-SNP hardware) - **Bare metal GPU deployments**: `runtimeClassName: kata-cc-nvidia-gpu` (GPU-enabled Kata container) -**Azure-specific verification:** Logging into the Azure portal once the pods have been provisioned will show that each confidential pod has its own `Standard_DC2as_v5` virtual machine. These VMs are visible under the cluster's resource group. +**Azure-specific verification:** Logging into the Azure portal after the pods have been provisioned shows that each confidential pod has its own `Standard_DC2as_v5` virtual machine. 
These VMs are visible under the cluster's resource group. === `oc exec` testing @@ -286,37 +286,37 @@ However: 1. Cluster admins can always circumvent this capability 2. Anyone logged into the node directly can also circumvent this capability -Confidential containers enforce this boundary at the hardware level, independent of RBAC. Running: `oc exec -n hello-openshift -it secure -- bash` will result in a denial of access, irrespective of the user undertaking the action, including `kubeadmin`. The policy is baked into the pod at creation time and cannot be modified at runtime. +Confidential containers enforce this boundary at the hardware level, independent of RBAC. Running `oc exec -n hello-openshift -it secure -- bash` results in a denial of access, irrespective of the user undertaking the action, including `kubeadmin`. The policy is baked into the pod at creation time and cannot be modified at runtime. -For comparison, `oc exec -n hello-openshift -it standard -- bash` (the standard pod) and `oc exec -n hello-openshift -it insecure-policy -- bash` (the CoCo pod with a relaxed policy) will both allow shell access. +For comparison, `oc exec -n hello-openshift -it standard -- bash` (the standard pod) and `oc exec -n hello-openshift -it insecure-policy -- bash` (the CoCo pod with a relaxed policy) both allow shell access. === Confidential Data Hub testing -Part of the CoCo VM is a component called the Confidential Data Hub (CDH), which simplifies access to the Trustee Key Broker Service (KBS) for end applications. The CDH runs inside the confidential VM and handles attestation transparently — applications simply make HTTP requests to a localhost endpoint. +Part of the CoCo VM is a component called the Confidential Data Hub (CDH), which simplifies access to the Trustee Key Broker Service (KBS) for end applications. The CDH runs inside the confidential VM and handles attestation transparently. Applications simply make HTTP requests to a localhost endpoint. 
Find out more about how the CDH and Trustee work together https://www.redhat.com/en/blog/introducing-confidential-containers-trustee-attestation-services-solution-overview-and-use-cases[here]. image::coco-pattern/trustee.png[] -The CDH presents to containers within the pod (only), via a localhost URL. The CoCo container with an insecure policy can be used for testing the behaviour, since it allows `oc exec`. +The CDH presents to containers within the pod (only), through a localhost URL. The CoCo container with an insecure policy can be used for testing the behaviour, because it allows `oc exec`. - `oc exec -n hello-openshift -it insecure-policy -- bash` to get a shell into a confidential container -- https://github.com/validatedpatterns/trustee-chart/[Trustee's configuration] specifies the list of secrets which the KBS can access with the `kbsSecretResources` attribute. These are mapped to Vault paths (e.g. `secret/data/hub/kbsres1`). +- The https://github.com/validatedpatterns/trustee-chart/[Trustee configuration] specifies the list of secrets that the KBS can access with the `kbsSecretResources` attribute. These are mapped to Vault paths (for example, `secret/data/hub/kbsres1`). - Secrets within the CDH can be accessed (by default) at `http://127.0.0.1:8006/cdh/resource/default/$K8S_SECRET/$K8S_SECRET_KEY`. -- In this case `http://127.0.0.1:8006/cdh/resource/default/passphrase/passphrase` by default will return a string which was randomly generated when the pattern was deployed. +- In this case `http://127.0.0.1:8006/cdh/resource/default/passphrase/passphrase` by default returns a string that was randomly generated when the pattern was deployed. -- To verify, compare the CDH output against the Vault-backed secret: `oc get secrets -n trustee-operator-system passphrase -o yaml | yq '.data.passphrase' | base64 -d`. The values should match. 
+- To verify, compare the CDH output against the Vault-backed secret: `oc get secrets -n trustee-operator-system passphrase -o yaml | yq '.data.passphrase' | base64 -d`. The values match. -- Tailing the logs for the KBS container (e.g. `oc logs -n trustee-operator-system -l app=kbs -f`) shows the attestation evidence flowing from the CDH to the KBS, including TEE evidence validation. +- Tailing the logs for the KBS container (for example, `oc logs -n trustee-operator-system -l app=kbs -f`) shows the attestation evidence flowing from the CDH to the KBS, including TEE evidence validation. === kbs-access application -The `kbs-access` application is a web service deployed in the `kbs-access` namespace. It retrieves secrets from Trustee via the CDH and presents them through a web interface. This provides a convenient way to verify that the full attestation pipeline is working end-to-end without needing to exec into a pod. +The `kbs-access` application is a web service deployed in the `kbs-access` namespace. It retrieves secrets from Trustee through the CDH and presents them in a web interface. This provides a convenient way to verify that the full attestation pipeline is working end-to-end without needing to exec into a pod. -Access the application via its OpenShift route: `oc get route -n kbs-access`. +Access the application by using its OpenShift route: `oc get route -n kbs-access`. 
diff --git a/content/patterns/coco-pattern/coco-pattern-tested-environments.adoc b/content/patterns/coco-pattern/coco-pattern-tested-environments.adoc index fc2242e90..bbcc93e99 100644 --- a/content/patterns/coco-pattern/coco-pattern-tested-environments.adoc +++ b/content/patterns/coco-pattern/coco-pattern-tested-environments.adoc @@ -1,14 +1,15 @@ --- title: CoCo pattern tested environments -weight: 10 +weight: 30 aliases: /coco-pattern/coco-pattern-tested-environments/ --- +:toc: :_content-type: ASSEMBLY include::modules/comm-attributes.adoc[] -:imagesdir: ../../../images +:imagesdir: /images = Tested environments @@ -19,17 +20,17 @@ Version 5 introduces Kyverno-based `cc_init_data` injection, bare metal support === Supported components - OpenShift Sandboxed Containers Operator 1.12 - Red Hat Build of Trustee 1.1 -- OpenShift Container Platform 4.19.28+ +- {ocp} 4.19.28+ - Kyverno 3.7.* - cert-manager operator (stable-v1 channel) -- Red Hat Advanced Cluster Management (for multi-cluster topology) -- HashiCorp Vault (secrets management) +- {rh-rhacm-first} (for multi-cluster topology) +- {hashicorp-vault} (secrets management) - Node Feature Discovery Operator (for bare metal) - Intel Device Plugins Operator (for Intel TDX) - NVIDIA GPU Operator v26.3.0+ (for GPU topology) === Azure single cluster -Tested on Azure with the `simple` clusterGroup using self-managed OpenShift 4.19.28+ provisioned via `openshift-install`. In this topology all components — Trustee, Vault, ACM, sandboxed containers operator, Kyverno, and sample workloads — are deployed on a single cluster. +Tested on Azure with the `simple` clusterGroup using self-managed OpenShift 4.19.28+ provisioned by using `openshift-install`. In this topology all components, including Trustee, Vault, ACM, sandboxed containers operator, Kyverno, and sample workloads, are deployed on a single cluster. Worker nodes use `Standard_D8s_v5` or larger. 
Peer-pod VMs for confidential containers default to `Standard_DC2as_v5` from the Azure confidential computing VM family, but other Azure https://learn.microsoft.com/en-us/azure/confidential-computing/virtual-machine-options[confidential VM families] can be configured in `values-global.yaml`. Azure DNS is required for the cluster's hosted zone. @@ -41,7 +42,7 @@ Tested with `trusted-hub` + `spoke` clusterGroups on Azure, both using self-mana - `trusted-hub`: Vault, ACM, Trustee (KBS + attestation service), cert-manager, Kyverno. This cluster acts as the trust anchor and ACM hub. - `spoke`: Sandboxed containers operator, Kyverno, peer-pod infrastructure, and sample workloads (hello-openshift, kbs-access). Imported into ACM with the `clusterGroup=spoke` label. -The spoke cluster connects back to the hub's Trustee instance for attestation and secret retrieval. Secrets are synchronised from the hub's Vault to the spoke via the External Secrets operator. +The spoke cluster connects back to the hub Trustee instance for attestation and secret retrieval. Secrets are synchronized from the hub Vault to the spoke through the External Secrets operator. === Bare metal single cluster (Intel TDX) Tested on Single Node OpenShift (SNO) with Intel TDX hardware using the `baremetal` clusterGroup. All components run on a single node. diff --git a/content/patterns/coco-pattern/coco-pattern-troubleshooting.adoc b/content/patterns/coco-pattern/coco-pattern-troubleshooting.adoc index c9db467c0..ec6afaa0c 100644 --- a/content/patterns/coco-pattern/coco-pattern-troubleshooting.adoc +++ b/content/patterns/coco-pattern/coco-pattern-troubleshooting.adoc @@ -37,7 +37,7 @@ Verify the KataConfig is ready: oc get kataconfig -n openshift-sandboxed-containers-operator ---- + -The status should show `InProgress: False` and the RuntimeClasses should be created (`kata-remote` for Azure, `kata-cc` for bare metal). 
+Verify the status shows `InProgress: False` and the RuntimeClasses are created (`kata-remote` for Azure, `kata-cc` for bare metal). + If the KataConfig is stuck, check the operator logs: + @@ -118,7 +118,7 @@ Problem:: `oc exec` denied unexpectedly into a confidential container Solution:: This is expected behavior for containers with strict policies. Verify which policy the pod is using. + -Check the pod's initdata annotation: +Check the pod initdata annotation: + [source,terminal] ---- @@ -148,7 +148,7 @@ If the ConfigMap is missing, check if Kyverno propagated it: oc get configmap -n imperative -l coco.io/type=initdata ---- + -The source ConfigMap should exist in the `imperative` namespace. If missing, the `init-data-gzipper` job may have failed: +The source ConfigMap must exist in the `imperative` namespace. If missing, the `init-data-gzipper` job may have failed: + [source,terminal] ---- @@ -178,7 +178,7 @@ Check Kyverno pods are healthy: oc get pods -n kyverno ---- + -All Kyverno pods should be `Running`. If not, check logs: +All Kyverno pods must be `Running`. If not, check logs: + [source,terminal] ---- @@ -239,7 +239,7 @@ oc describe policyreport -n ''' Problem:: CoCo pods not picking up new initdata after cert rotation or KBS TLS changes -Solution:: Kyverno's autogen is disabled by design to ensure rollout restarts pick up new initdata. You must manually restart deployments. +Solution:: Kyverno autogen is disabled by design to ensure rollout restarts pick up new initdata. You must manually restart deployments. + Rollout restart the deployment to pick up new initdata: + @@ -256,7 +256,7 @@ oc get pod -n -o yaml | \ grep io.katacontainers.config.hypervisor.cc_init_data ---- + -The annotation value should be a long base64-encoded string. If it matches the old value, the ConfigMap may not have been updated. Check the source ConfigMap in the `imperative` namespace. +The annotation value is a long base64-encoded string. 
If it matches the old value, the ConfigMap may not have been updated. Check the source ConfigMap in the `imperative` namespace. == Bare metal issues @@ -276,7 +276,7 @@ Verify the TDX kernel module is loaded: oc debug node/ -- chroot /host lsmod | grep tdx ---- + -Expected output should include `kvm_intel` with TDX support. +The expected output includes `kvm_intel` with TDX support. + Check NFD worker logs: + @@ -319,7 +319,7 @@ Verify the PCCS secret exists and contains the API key: oc get secret -n intel-dcap pccs-api-key -o yaml ---- + -The secret should have `PCCS_API_KEY` field (base64 encoded). +The secret must have a `PCCS_API_KEY` field (base64 encoded). + If the secret is missing or incorrect, update `~/values-secret-coco-pattern.yaml` with your Intel PCS API key and re-run: + @@ -340,7 +340,7 @@ Check node labels: oc get nodes --show-labels | grep tdx ---- + -Nodes with TDX should have `intel.feature.node.kubernetes.io/tdx=true`. +Nodes with TDX have the label `intel.feature.node.kubernetes.io/tdx=true`. + If labels are missing, NFD may not have detected TDX. See "NFD not detecting TDX" troubleshooting above. + @@ -418,7 +418,7 @@ oc patch installplan -n nvidia-gpu-operator \ ''' Problem:: `kata-cc-nvidia-gpu` RuntimeClass missing -Solution:: This is often a timing issue. The GPU reconciliation job should trigger RuntimeClass creation. +Solution:: This is often a timing issue. The GPU reconciliation job triggers RuntimeClass creation. + Check if the `reconcile-kataconfig-gpu` job has run: + @@ -434,14 +434,14 @@ Check job logs: oc logs -n imperative jobs/reconcile-kataconfig-gpu ---- + -If the job hasn't run, it may be waiting for GPU nodes to be labeled. Verify GPU Operator labeled the nodes: +If the job has not run, it may be waiting for GPU nodes to be labeled. Verify GPU Operator labeled the nodes: + [source,terminal] ---- oc get nodes --show-labels | grep nvidia ---- + -Nodes with GPUs should have `nvidia.com/gpu.present=true`. 
+Nodes with GPUs have the label `nvidia.com/gpu.present=true`. + Manually trigger KataConfig reconciliation: + @@ -490,7 +490,7 @@ Nodes must reboot for IOMMU kernel parameters to take effect. oc debug node/ -- chroot /host lspci -nnk -d 10de: ---- + -GPUs should show `Kernel driver in use: vfio-pci`. If not, check VFIO manager logs: +GPUs show `Kernel driver in use: vfio-pci`. If not, check VFIO manager logs: + [source,terminal] ---- @@ -630,7 +630,7 @@ Wait for all nodes to finish rebooting: oc get mcp ---- + -All MachineConfigPools should show `UPDATED=True` and `DEGRADED=False`. +All MachineConfigPools show `UPDATED=True` and `DEGRADED=False`. + Re-trigger secret loading: + @@ -661,7 +661,7 @@ Delete the pod to trigger recreation with correct annotations: oc delete pod -n ---- + -The deployment will recreate the pod, and Kyverno will inject the `cc_init_data` annotation during admission. +The deployment recreates the pod, and Kyverno injects the `cc_init_data` annotation during admission. + Verify the new pod has the annotation: + @@ -684,7 +684,7 @@ Common BIOS settings to check: - TDX enablement (must be re-enabled after SGX reset) - TME (Total Memory Encryption) settings + -Without an SGX reset, the platform's attestation evidence will not match expected values and Trustee will reject attestation requests. +Without an SGX reset, the platform attestation evidence does not match expected values and Trustee rejects attestation requests. ''' Problem:: Confidential containers failing due to TEE not enabled in BIOS @@ -717,4 +717,4 @@ oc debug node/ -- chroot /host dmesg | grep -i sev + Expected: Messages indicating SEV-SNP initialization succeeded. + -If TEE capabilities are not detected at the kernel level, Node Feature Discovery (NFD) will not label nodes, and confidential runtime classes will not be schedulable. Fix the BIOS configuration before proceeding with pattern deployment. 
+If TEE capabilities are not detected at the kernel level, Node Feature Discovery (NFD) does not label nodes, and confidential runtime classes are not schedulable. Fix the BIOS configuration before proceeding with pattern deployment.