feat: restart RAG container on config change#343
Conversation
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 0 |
TIP This summary will be updated as you push new changes. Give us feedback
📝 WalkthroughWalkthroughThe pull request refines Docker service failure handling to prevent premature exit when scaling services to zero, and adds SHA-256-based config versioning to RAG services by injecting a Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/internal/orchestrator/swarm/service_spec.go`:
- Around line 43-51: The current ragConfigHash(config map[string]any) marshals
the entire config causing unrelated changes to trigger restarts; change it to
marshal only the pipelines entry: extract config["pipelines"] (safely handle
missing or nil by using an empty value), json.Marshal that pipelines value (not
the whole map) to produce b, then compute the sha256 on b and return the short
hex as before; ensure deterministic JSON by relying on json.Marshal of the
pipelines sub-structure so only pipeline changes affect the resultant hash.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6b3c3124-c4fb-4854-9193-a604644fe59b
📒 Files selected for processing (3)
server/internal/docker/docker.goserver/internal/orchestrator/swarm/service_spec.goserver/internal/orchestrator/swarm/service_spec_test.go
| // ragConfigHash returns a short hex digest of the RAG service configuration. | ||
| // It is embedded in the container spec as PGEDGE_CONFIG_VERSION so that Docker | ||
| // Swarm detects a TaskTemplate change and restarts the container whenever the | ||
| // pipeline configuration or API keys change. | ||
| func ragConfigHash(config map[string]any) string { | ||
| b, _ := json.Marshal(config) | ||
| sum := sha256.Sum256(b) | ||
| return fmt.Sprintf("%x", sum[:8]) | ||
| } |
There was a problem hiding this comment.
Hash input is broader than the pipelines-only requirement.
On Line 48, json.Marshal(config) hashes the entire service config, so unrelated config edits will also force RAG restarts. If the intended behavior is “restart on config.pipelines changes,” scope the hash input to config["pipelines"].
♻️ Proposed fix
func ragConfigHash(config map[string]any) string {
- b, _ := json.Marshal(config)
+ b, _ := json.Marshal(config["pipelines"])
sum := sha256.Sum256(b)
return fmt.Sprintf("%x", sum[:8])
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/internal/orchestrator/swarm/service_spec.go` around lines 43 - 51, The
current ragConfigHash(config map[string]any) marshals the entire config causing
unrelated changes to trigger restarts; change it to marshal only the pipelines
entry: extract config["pipelines"] (safely handle missing or nil by using an
empty value), json.Marshal that pipelines value (not the whole map) to produce
b, then compute the sha256 on b and return the short hex as before; ensure
deterministic JSON by relying on json.Marshal of the pipelines sub-structure so
only pipeline changes affect the resultant hash.
There was a problem hiding this comment.
Not applying the suggestion — the full config hash is intentional. defaults (token budget, top_n, search weights) also lands in the generated YAML
There was a problem hiding this comment.
@tsivaprasad, understood! Since all config fields — including defaults like token budget, top_n, and search weights — feed into the generated YAML, hashing the full config is the correct approach to ensure the container restarts on any meaningful change. I'll note this for future reviews.
✏️ Learnings added
Learnt from: tsivaprasad
Repo: pgEdge/control-plane PR: 343
File: server/internal/orchestrator/swarm/service_spec.go:43-51
Timestamp: 2026-04-13T15:46:51.867Z
Learning: In `server/internal/orchestrator/swarm/service_spec.go` (Go), `ragConfigHash` intentionally marshals the entire `config map[string]any` (not just `config["pipelines"]`) because all config fields — including defaults such as token budget, top_n, and search weights — are written into the generated YAML. Hashing the full config ensures the RAG container restarts whenever any config value changes. Do not suggest scoping the hash to `config["pipelines"]` only.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: tsivaprasad
Repo: pgEdge/control-plane PR: 302
File: server/internal/orchestrator/swarm/orchestrator.go:686-704
Timestamp: 2026-04-04T13:55:26.501Z
Learning: In `server/internal/orchestrator/swarm/orchestrator.go` (Go), the `generateRAGInstanceResources` function intentionally omits the `ServiceInstanceSpecResource`/`ServiceInstanceResource` pair (and the corresponding `KeysPath` mount wiring). The deployable service spec/instance resources for RAG are deferred to a subsequent ticket. Do not flag the absence of `ServiceInstanceSpecResource`/`ServiceInstanceResource` in `generateRAGInstanceResources` as a missing implementation during code review.
Learnt from: rshoemaker
Repo: pgEdge/control-plane PR: 280
File: server/internal/orchestrator/swarm/mcp_config_resource.go:90-94
Timestamp: 2026-03-02T20:15:14.804Z
Learning: In MCP config resources (e.g., server/internal/orchestrator/swarm/mcp_config_resource.go and similar tokens.yaml/users.yaml), Refresh should only verify the existence of config.yaml. Tokens.yaml and users.yaml are owned and managed by the MCP server at runtime and should not be rewritten on Update. Do not trigger recreation that would overwrite runtime state. This ownership model is consistent with the comment indicating Do NOT touch tokens.yaml or users.yaml. Apply this guidance to all MCP config resource files in the swarm directory.
Learnt from: rshoemaker
Repo: pgEdge/control-plane PR: 287
File: server/internal/orchestrator/swarm/service_user_role.go:59-60
Timestamp: 2026-03-11T16:05:27.239Z
Learning: In server/internal/orchestrator/swarm/service_user_role.go, treat a ResourceVersion bump (e.g., 2 -> 3) as a documentation/safety marker only. When the resource Identifier key changes (e.g., ServiceInstanceID to ServiceID), the system will automatically recreate resources due to the Type/ID key mismatch. Do not flag a ResourceVersion bump as a missing recreate trigger if an Identifier change is already causing a full recreate for that resource. This guidance is file-specific and should be applied when reviewing changes to service_user_role.go.
Learnt from: rshoemaker
Repo: pgEdge/control-plane PR: 266
File: server/internal/workflows/update_database.go:122-153
Timestamp: 2026-02-09T21:47:46.904Z
Learning: In server/internal/workflows/update_database.go and related Go workflow files (e.g., provision_services.go), implement idempotent provisioning semantics: re-running ProvisionServices on already-provisioned service instances should be safe and non-destructive. Upsert operations (e.g., StoreServiceInstance) may temporarily set state to 'creating' but should recover back to 'running' via monitoring. Use SQL-level idempotence for user/service roles (e.g., CREATE ROLE IF NOT EXISTS / ALTER ROLE) and ensure deployment diffs compare desired vs. current state. These patterns support recovery when a user retries with corrected specs after a failure, and should be considered in similar Go workflow files across the repository.
Learnt from: rshoemaker
Repo: pgEdge/control-plane PR: 280
File: server/internal/orchestrator/swarm/mcp_config.go:95-97
Timestamp: 2026-03-02T20:17:36.231Z
Learning: In the pgEdge/control-plane repository, internal (unexported) functions that are called through controlled paths by the resource framework should assume validated input from their invocation boundaries. Do not add defensive nil guards inside such internal helpers; rely on validation at the resource construction/invocation boundaries (e.g., MCPConfigResource.Create/Update) to guarantee data integrity. This guideline applies to all Go files under server/internal where internal helpers serve as implementation details behind controlled APIs.
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 293
File: server/internal/orchestrator/common/paths.go:79-157
Timestamp: 2026-03-24T13:24:39.229Z
Learning: In pgEdge/control-plane, when reviewing Go packages under server/ that use the samber/do injector, apply the `Provide()` requirement only to packages that expose singleton components which are consumed as dependencies by other singleton components via the injector. Do NOT require `Provide()` for pure utility/value-type packages (e.g., path helpers, data structures) that are not injected as singleton dependencies.
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 312
File: server/internal/workflows/backend/etcd/store.go:52-58
Timestamp: 2026-03-24T19:00:21.971Z
Learning: In pgEdge/control-plane, treat errors returned from application startup/initialization functions as fatal: a startup failure should trigger the normal application shutdown path, which then calls each component’s `Shutdown`/`Stop` for cleanup. Therefore, when reviewing code in startup/initialization paths (e.g., cache/etcd initialization like `StartCaches`), do not require explicit rollback/cleanup of partially-initialized resources inside the startup function based on its error return. Avoid review suggestions that push intermediate rollback patterns for these fatal startup errors, since the existing shutdown sequence is responsible for teardown.
Summary
This PR restart the RAG container when
config.pipelineschanges via a ControlPlane update, so new pipeline settings and API keys take effect without
manual intervention.
Changes
PGEDGE_CONFIG_VERSION=<hash>as an env var in the RAG containerspec. Docker Swarm only restarts a service when its
TaskTemplatechanges; a config-only update writes a new YAML to the bind mount but
the container would otherwise keep running with the old config in memory.
The hash (SHA256 of
config.pipelines, truncated to 8 bytes) changeswhenever pipelines or API keys change, making the update visible to Swarm.
Testing
Verification:
Verified hash
Verified hash
Checklist
Notes for Reviewers
The hot-reload acceptance criteria (in-flight query continuity, no downtime) are for the RAG server repo in a parallel ticket. This PR covers the control-plane side: ensuring the container is restarted with the updated config file on disk.