diff --git a/README.md b/README.md
index 96e1168..db0a117 100644
--- a/README.md
+++ b/README.md
@@ -127,7 +127,7 @@ See [tests/README.md](tests/README.md) for the full test matrix.
 
 ## Security
 
-Device Connect supports encryption in transit (TLS/mTLS), JWT/NKey authentication for NATS, device commissioning with PIN validation, and per-device ACLs. This is an area of active development to be further expanded in upcoming releases.
+Device Connect supports encryption in transit (TLS/mTLS), JWT/NKey authentication for NATS, device commissioning with PIN validation, and per-device ACLs. See [SECURITY.md](SECURITY.md) for the threat model, deployment tradeoffs, and vulnerability reporting.
 
 ## License
 
diff --git a/SECURITY.md b/SECURITY.md
index c7ddba4..520dbf2 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -27,10 +27,233 @@ This policy covers all packages in the Device Connect monorepo:
 
 | Package | Scope |
 |---------|-------|
-| `device-connect-edge` | Messaging clients, device runtime, credential handling |
-| `device-connect-server` | Registry service, JWT/TLS security, state store, CLIs |
+| `device-connect-edge` | Messaging clients, device runtime, credential handling, D2D discovery |
+| `device-connect-server` | Registry service, security/commissioning, state store, portal, CLIs |
 | `device-connect-agent-tools` | Agent connection, MCP bridge, tool invocation |
 
 ## Supported Versions
 
 Security fixes are applied to the latest release on `main`. We do not backport fixes to older versions.
+
+---
+
+## Threat model
+
+This section describes what Device Connect is designed to protect, where trust is assumed, and which risks remain by deployment mode. It is intended for operators, integrators, and security reviewers—not a formal certification artifact.
+
+### Design goals
+
+1. **Authenticated messaging** — only principals with valid credentials can publish or subscribe on tenant-scoped subjects (when NATS JWT or Zenoh mTLS is enabled).
+2. **Tenant isolation** — workloads in tenant A must not read or invoke tenant B’s devices when broker-enforced isolation is configured.
+3. **Provisioned identity** — devices receive credentials through commissioning or admin tooling, not self-asserted IDs alone.
+4. **Defense in depth** — subject namespacing, registry storage paths, portal API scopes, and optional application ACLs layer on top of transport security.
+
+Device Connect is **not** a zero-trust network overlay. It assumes the messaging broker and etcd (or equivalent registry backend) are operated by a trusted party or hardened like any other control-plane infrastructure.
+
+### Assets
+
+| Asset | Why it matters |
+|-------|----------------|
+| Device credentials (JWT+NKey, mTLS keys) | Full participation in a tenant’s mesh; invoke RPC, emit events, register |
+| Registry / etcd data | Device inventory, capabilities, status, portal tokens |
+| Portal sessions & agent API tokens (`dcp_…`) | Provision devices, download credentials, invoke on behalf of users |
+| Factory commissioning PIN | One-time bootstrap before operational credentials exist |
+| State store keys | Orchestration locks, workflow state |
+| Message payloads (RPC, events, streams) | Operational data, potentially PII or safety-critical commands |
+
+### Trust boundaries
+
+```mermaid
+flowchart TB
+  subgraph untrusted["Often untrusted or hostile"]
+    LAN["LAN / site network"]
+    Internet["Internet / multi-site"]
+  end
+
+  subgraph edge["Edge tier"]
+    Dev["Devices — device-connect-edge"]
+    Agent["Agents — device-connect-agent-tools"]
+  end
+
+  subgraph infra["Infrastructure tier — operator trust"]
+    Broker["Messaging broker — NATS / Zenoh router / MQTT"]
+    Reg["Registry + etcd"]
+    Portal["Portal + CLIs"]
+    State["State store"]
+  end
+
+  LAN --> Dev
+  LAN --> Agent
+  Dev --> Broker
+  Agent --> Broker
+  Dev --> Reg
+  Agent --> Reg
+  Portal --> Reg
+  Portal --> Broker
+  Portal --> State
+  Internet --> Portal
+```
+
+**Inside the boundary (trusted if hardened):** NATS/Zenoh/MQTT brokers, etcd, registry service, portal host, TLS CAs used for mTLS.
+
+**Outside or partially trusted:** factory floors, home LANs, developer laptops, CI runners, any network where `DEVICE_CONNECT_ALLOW_INSECURE=true` is used.
+
+### Threat actors
+
+| Actor | Typical goals |
+|-------|----------------|
+| **Anonymous LAN participant** | Subscribe to presence, spoof discovery, inject multicast traffic (D2D / scouting) |
+| **Compromised device** | Lateral movement within tenant subject space; abuse issued JWT or cert |
+| **Compromised agent / CI token** | Invoke devices, provision credentials, read events per granted scopes |
+| **Tenant A insider** | Access tenant B data (cross-tenant isolation is a primary control) |
+| **Messaging broker operator** | Read all traffic if TLS is off or terminated; deny service |
+| **Registry/etcd operator** | Tamper with device records, replay stale registrations |
+| **Portal user** | Escalate via stolen session, mint over-privileged agent tokens |
+
+### Attack surfaces
+
+| Surface | Exposure | Notes |
+|---------|----------|--------|
+| Messaging transport | High | All RPC, events, heartbeats, registration |
+| Registry RPC (`device-connect.{tenant}.registry`) | Medium | List/register devices; privileged callers see more |
+| Device commissioning HTTP (TCP, default 5540) | Medium | Active only before operational credentials; PIN-gated |
+| Portal browser UI + cookies | Medium | Human admins; session hijack risk |
+| Portal agent API (`/api/agent/v1/*`, Bearer `dcp_…`) | Medium | Scoped tokens; secret shown once at mint |
+| D2D presence (`device-connect.{tenant}.*.presence`) | High on open LAN | No broker ACL; metadata visible to LAN peers |
+| State store API | Medium | Depends on deployment; often co-located with server |
+| Credential files on disk | High | `.creds.json`, TLS key material on device/agent hosts |
+
+---
+
+## Security controls
+
+How major controls map to threats:
+
+| Control | Mitigates | Package / location |
+|---------|-----------|-------------------|
+| **NATS JWT subject permissions** | Cross-tenant publish/subscribe, credential forgery without account key | Broker + `security_infra/` |
+| **TLS / mTLS on messaging** | Eavesdropping, impersonation at transport layer | Edge adapters, broker config |
+| **Per-device JWT or client cert** | Stolen single device ≠ whole account (when per-user creds used) | `gen_creds.sh`, `generate_tls_certs.sh` |
+| **Tenant subject prefix** `device-connect.{tenant}.>` | Accidental cross-tenant traffic; pairs with JWT | Edge, server, registry |
+| **etcd path isolation** `/device-connect/{tenant}/…` | Registry list leakage across tenants | Registry service |
+| **Commissioning PIN (bcrypt)** | Unauthorized credential install on uninitialized device | `security/commissioning.py` |
+| **Portal agent token scopes** | Over-powered automation (`devices:invoke`, `devices:credentials`, etc.) | `portal/services/tokens.py` |
+| **Token secret hashing (SHA-256)** | etcd backup disclosure ≠ live API access | Portal tokens |
+| **`DEVICE_CONNECT_ALLOW_INSECURE` gate** | Accidental production deploy without auth (must be explicit) | Edge `DeviceRuntime` |
+| **Application ACL models** | Device-to-device visibility and RPC policy (when enforced by callers) | `security/acl.py` |
+| **Attestation field on registration** | Hook for future supply-chain / identity binding (not a standalone guarantee) | Registry API |
+
+Operational detail for NATS JWT and multi-tenant setup: [packages/device-connect-server/security_infra/README.md](packages/device-connect-server/security_infra/README.md).
+
+---
+
+## Deployment modes and tradeoffs
+
+Choose a mode deliberately; mixing modes on one LAN often creates the weakest link.
+
+### Full infrastructure (NATS + JWT + registry) — recommended for multi-tenant production
+
+| Benefit | Tradeoff |
+|---------|----------|
+| Broker-enforced tenant isolation | Requires NATS JWT operator/account lifecycle (`nsc`, `security_infra`) |
+| Central discovery and TTL | Registry and etcd are SPOF unless clustered |
+| Auditable subject space | All metadata and payloads cross the broker |
+
+**Residual risk:** A device JWT scoped to `device-connect.{tenant}.>` can still invoke any peer in that tenant unless application ACLs are applied. Compromised registry credentials (privileged `registry` / `devctl` users) see all tenants.
+
+### Zenoh with mTLS (router or peer)
+
+| Benefit | Tradeoff |
+|---------|----------|
+| Strong device identity via client certificates | **No broker-level ACL** — isolation is naming + app logic only |
+| Good for high-frequency streaming | Tenant boundaries are not cryptographically enforced on the wire |
+| Optional D2D multicast scouting | LAN participants can discover peers without registry |
+
+**Residual risk:** Any holder of a valid client cert trusted by the router may access subjects you publish. Use separate CAs or routers per tenant for strict isolation.
+
+### Device-to-device (D2D) mode — dev / small closed LAN
+
+Activated when discovery mode is D2D or infrastructure is unavailable (Zenoh multicast scouting, presence subjects).
+
+| Benefit | Tradeoff |
+|---------|----------|
+| No registry/etcd required | **No tenant enforcement** beyond convention |
+| Fast local iteration | Presence broadcasts capabilities to the LAN |
+| | No persistent offline inventory |
+
+**Do not use D2D on untrusted networks.** Treat it like open mDNS: any neighbor can observe and potentially interact if auth is weak or disabled.
+
+### Development / integration (`DEVICE_CONNECT_ALLOW_INSECURE=true`)
+
+| Benefit | Tradeoff |
+|---------|----------|
+| Simplifies Docker-based tests and local demos | **No authentication** on messaging connect path |
+| | Credential files may still be loaded but broker may not enforce them |
+
+**Never enable in production.** CI and `docker-compose-itest.yml` use this mode by design.
+
+### Portal + coding agents
+
+| Benefit | Tradeoff |
+|---------|----------|
+| Scoped Bearer tokens for automation | Tokens are bearer secrets; leakage = API access until revocation |
+| Browser session for humans | Cookie theft, CSRF surface (portal middleware) |
+| CLI login flow with user approval | Social engineering on approval step |
+
+Agent tokens are stored in etcd by hash only; the plaintext `dcp_{id}_{secret}` is shown once. Prefer least-privilege scopes (`devices:read` vs `devices:invoke` vs `admin:*`).
+
+### MQTT backend
+
+Supported for IoT-style deployments. Security is typically username/password + TLS at the broker. Device Connect does not add MQTT-specific tenant ACLs in-tree; rely on broker configuration and subject design.
+
+---
+
+## Known limitations (current `main`)
+
+These are intentional gaps or active development areas—not bugs to report without context:
+
+1. **Zenoh has no JWT-style subject ACL** — multi-tenant production should prefer NATS with JWT unless you operate separate Zenoh realms per tenant.
+2. **Application ACLs** (`DeviceACL`, `FunctionACL`, `EventACL`) are modeled in code; not all RPC/event paths may enforce them yet—verify for your integration.
+3. **D2D discovery** does not authenticate peer presence; pairing with mTLS or a closed VLAN is required.
+4. **Registry registration** trusts valid messaging credentials; attestation is optional metadata, not hardware-rooted trust by default.
+5. **Commissioning server** is a local HTTP endpoint; protect the factory network during provisioning.
+6. **Agent-tools** inherit the security of whichever messaging URL and credential file you configure; auto-discovery of `security_infra/` paths is convenient but increases risk on shared developer machines.
+
+---
+
+## Hardening checklist
+
+**Production messaging**
+
+- [ ] Disable `DEVICE_CONNECT_ALLOW_INSECURE`
+- [ ] Enable TLS; use mTLS for Zenoh devices where possible
+- [ ] Per-device credentials (`gen_creds.sh --user …` or `--tenant …`), not shared operator creds on devices
+- [ ] Run `verify_tenants.sh` after JWT topology changes
+
+**Infrastructure**
+
+- [ ] Cluster etcd; restrict network access to registry and etcd ports
+- [ ] Separate privileged (`registry`, `devctl`) credentials from device credentials
+- [ ] Rotate portal agent tokens; use minimal scopes and labels
+
+**Edge**
+
+- [ ] Protect credential and key files (filesystem permissions, no images in public registries)
+- [ ] Complete commissioning once; disable factory PIN exposure
+- [ ] Avoid D2D on networks with untrusted hosts
+
+**Agents / CI**
+
+- [ ] Store `dcp_…` tokens in secret managers, not repos
+- [ ] Do not mount production `.creds.json` into untrusted CI without isolation
+
+---
+
+## Related documentation
+
+| Document | Contents |
+|----------|----------|
+| [security_infra/README.md](packages/device-connect-server/security_infra/README.md) | NATS JWT setup, tenant isolation scripts |
+| [device-connect-server/README.md](packages/device-connect-server/README.md#device-commissioning-flow) | Commissioning flow |
+| [device-connect-edge/README.md](packages/device-connect-edge/README.md#credentials) | Device credential consumption |
+| [portal/README.md](packages/device-connect-server/device_connect_server/portal/README.md) | Portal and agent API auth |
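
The SECURITY.md hunk above states that agent tokens are stored in etcd by hash only (SHA-256) and that the plaintext `dcp_{id}_{secret}` is shown once. A minimal Python sketch of that general pattern follows. It is not the portal's actual implementation: the function names `mint_token` and `verify_token`, the in-memory dict standing in for etcd, and the hex encoding of the id and secret are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

# Prefix taken from the `dcp_{id}_{secret}` format described in SECURITY.md.
TOKEN_PREFIX = "dcp"


def mint_token() -> tuple[str, str, str]:
    """Hypothetical helper: return (token_id, plaintext_token, secret_hash).

    Only token_id and secret_hash would be persisted; the plaintext is
    handed to the caller once and never stored.
    """
    token_id = secrets.token_hex(4)   # hex output never contains "_"
    secret = secrets.token_hex(16)
    plaintext = f"{TOKEN_PREFIX}_{token_id}_{secret}"
    secret_hash = hashlib.sha256(secret.encode()).hexdigest()
    return token_id, plaintext, secret_hash


def verify_token(presented: str, stored_hashes: dict[str, str]) -> bool:
    """Hypothetical helper: check a presented token against stored hashes.

    stored_hashes maps token_id -> SHA-256 hex digest of the secret and
    stands in for the registry backend.
    """
    parts = presented.split("_", 2)
    if len(parts) != 3 or parts[0] != TOKEN_PREFIX:
        return False
    _, token_id, secret = parts
    expected = stored_hashes.get(token_id)
    if expected is None:
        return False
    candidate = hashlib.sha256(secret.encode()).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, expected)


if __name__ == "__main__":
    token_id, plaintext, secret_hash = mint_token()
    store = {token_id: secret_hash}  # what a backup of the store would contain
    print("valid token accepted:", verify_token(plaintext, store))
    print("tampered token accepted:", verify_token(plaintext + "x", store))
```

Because the store holds only digests, someone who obtains a backup cannot present a working Bearer token without also recovering the secret, which is the property the "Token secret hashing" row in the controls table describes. A leaked plaintext token still grants access until it is revoked.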