feat(deploy): refactor deployments workflow to remove direct gitops pushing (DT-164)#62
Conversation
lucasmundim
left a comment
There was a problem hiding this comment.
Workflow change is clean — dropping paths-ignore: 'deployment/**' is correct now that there are no bot commits to ignore. One rollout concern specific to this chart inline.
Verified safe: CI tags with VERSION=${{ github.sha }} (full SHA), matches gitops#130's ^[0-9a-f]{40}$ regex. Lab is covered via the existing clusters: {} AppSet generator — image-updater annotations propagate to the -lab Application, so the removed explicit deploy-lab step is replaced functionally (no regression there).
|
|
||
| permissions: | ||
| contents: write | ||
| contents: read |
There was a problem hiding this comment.
Heads-up — this chart has a claimerImage.tag (in values.yaml, overrides/<env>.yaml, and read by templates/deployment.yaml's CLAIMER_IMAGE env) that the current image-policy-applier flow keeps in lockstep with image.tag via two $imagepolicy markers per overrides file.
Image-updater (per gitops#130) will only touch image.tag, so after the first rollout the controller pods run the new SHA but the claimer init container used by batch pipeline jobs will pin to the last image-policy-applier write and drift further behind on every subsequent deploy.
Cheapest fix is a chart change to default claimerImage.tag to .Values.image.tag:
value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Values.image.tag | default .Chart.AppVersion }}"and drop the claimerImage.tag line + $imagepolicy marker from each overrides/<env>.yaml.
There was a problem hiding this comment.
Brilliant catch! To resolve this cleanly without re-introducing old GHA-side image-policy-applier pushes, I've:
-
Updated
templates/deployment.yamlto dynamically defaultclaimerImage.tagtoimage.tagso they stay in lockstep automatically:
-
Surgically stripped the static
claimerImage.tagoverrides from all environment overrides:
-
Configured the companion
gitopsside of the Image Updater configuration to write overrides back to gitops.git:
https://github.com/PlainsightAI/gitops/blob/eda08abf3ba024fd23f329ecc9a8eaebfc36b15f/controllers/openfilter-pipelines-controller/openfilter-pipelines-controller-appset.yaml#L16-L21
820f461 to
cdd9f39
Compare
lucasmundim
left a comment
There was a problem hiding this comment.
Prior claimerImage.tag drift concern is resolved — template fallback to image.tag plus removal of the $imagepolicy markers in overrides means image-updater's rewrite of image.tag propagates to the claimer init container. LGTM.
| {{- end }} | ||
| - name: CLAIMER_IMAGE | ||
| value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Chart.AppVersion }}" | ||
| value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Values.image.tag | default .Chart.AppVersion }}" |
There was a problem hiding this comment.
Nit (doc-only, non-blocking): this template change makes the surrounding documentation slightly stale. values.yaml:18 comment and README.md:44 both say claimerImage.tag defaults to the chart appVersion — with this change it actually defaults to .Values.image.tag first (which itself falls back to .Chart.AppVersion). Suggest something like: # Overrides the image tag, which defaults to the controller image.tag (and ultimately the chart appVersion) in values.yaml, then re-run helm-docs to regenerate the README.
|
That is a fantastic catch! The claimer tag would indeed have drifted on subsequent deploys. To resolve this cleanly without re-introducing old GHA-side
This keeps both images in perfect lockstep automatically, with zero GHA-side tag-bump pipelines. Force-pushed and squashed! |
shingonoide
left a comment
There was a problem hiding this comment.
Overall the workflow simplification is correct and the DT-164 scope is fully covered: deploy jobs gone, GH_BOT_USER_PAT dropped, contents: write replaced with contents: read, and paths-ignore: deployment/** removed now that there are no bot commits to ignore in this repo. The claimer image tag fallback in the template (claimerImage.tag | default .Values.image.tag | default .Chart.AppVersion) is the right fix for the drift issue flagged in review.
Two things worth confirming before this moves to ready-for-review:
Merge ordering (gitops#130 first). Once this PR lands, the build job will publish images to GAR but nothing will deploy them. ArgoCD Image Updater (installed and configured in gitops#130) is the replacement. gitops#130 needs to be merged and reconciled in the cluster before any of the three DT-164 sibling PRs (plainsight-api#796, plainsight-deployment-agent#59, this one) are merged. Is there a tracking note or merge checklist somewhere to enforce that order?
Simultaneous rollout across all environments. The previous workflow had an explicit deploy step followed by deploy-lab, giving a natural sequencing point between production and lab. The new setup writes a single shared image-tag.yaml via the clusters: {} AppSet generator, so all environments reconcile to the new tag in the same image-updater cycle. That may be fine for this service, but it is a behavioral change worth acknowledging intentionally.
Minor nit: .github/workflows/deployments.yml:6 says "avoiding automated plainsight-bot commits to this repository." Since image-updater writes back to gitops.git, not this repo, rephrasing to "image-updater commits land in PlainsightAI/gitops instead of this repository" would be slightly more precise.
lucasmundim
left a comment
There was a problem hiding this comment.
Review
The chart change here is load-bearing and probably needs to ship faster than the rest of this PR. Detail below.
Status vs current main (post-#71)
#71 merged yesterday and rewrote .github/workflows/deployments.yml → .github/workflows/docker-publish.yml (Docker Hub publish, chart-OCI on tags, deploy/deploy-lab kept). So this PR collides with main on 5 files:
.github/workflows/deployments.yml— modify/delete (PR modifies, main deleted). The PR'sbuildjob here pushes to GAR, which is the opposite direction from #71 (Docker Hub). Merging as-is would re-introduce GAR push and break the OSS-publish guarantee.deployment/openfilter-pipelines-controller/overrides/{development,lab,production,staging}.yaml— content conflict. PR dropstag:/$imagepolicymarkers; main has newrepository: plainsightai/*from #71. Rebase needs to keep #71'srepositoryand take PR's deletions.
The load-bearing piece: claimer-tag drift
deployment/openfilter-pipelines-controller/templates/deployment.yaml:121 — adding the default .Values.image.tag fallback to the claimer's tag is the right fix, and it's plugging a real drift hole that opens the moment gitops#130 lands. Today, image-policy-applier bumps both image.tag and claimerImage.tag (both carry $imagepolicy markers). Once Image Updater takes over via gitops#130, it only writes image.tag (that's all the image-list annotation covers) — claimerImage.tag would freeze at whatever SHA it was last set to while the controller advances. Controller + claimer drift, and you'd notice it the next time the claimer hits a removed flag or API change.
The two halves of the fix (template fallback + override deletions of claimerImage.tag/marker) are independent of the workflow change in this PR and would land cleanly against current main. Strongly recommend hoisting them to a separate, immediate PR so the drift can't bite during the gitops#130 rollout. Happy to drive that — let me know.
Sequencing suggestion
- Now: ship just the claimer-tag fallback + 4 override deletions (cherry-pick, or I open a separate PR).
- After gitops#130 is reconciled across clusters: revisit dropping
deploy+deploy-labfromdocker-publish.yml. That removal should be done against main (not by revivingdeployments.yml). - Separate effort: the semver-in-overrides direction. This PR doesn't move the
image.taglines to0.5.x; that's still a follow-up.
Inline notes on the three load-bearing hunks.
| # to this repository. | ||
| name: Deployments | ||
|
|
||
| permissions: |
There was a problem hiding this comment.
This file no longer exists on main — #71 deleted it and replaced it with .github/workflows/docker-publish.yml (Docker Hub publish + chart-OCI on tag + preserved deploy/deploy-lab). Modifying it here would resurrect a deleted file on rebase and re-introduce a GAR build job, which is the opposite direction from #71. Recommend dropping this hunk entirely from the PR — the deploy-removal you're after should be a separate PR against docker-publish.yml once gitops#130 is reconciled in clusters.
There was a problem hiding this comment.
Agreed — deployments.yml is dropped on this rebase (file was deleted in main by #71). The deploy/deploy-lab removal will be a separate PR against docker-publish.yml once gitops#130 is reconciled.
| {{- end }} | ||
| - name: CLAIMER_IMAGE | ||
| value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Chart.AppVersion }}" | ||
| value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Values.image.tag | default .Chart.AppVersion }}" |
There was a problem hiding this comment.
This is the load-bearing change. The default .Values.image.tag fallback closes a real drift hole that opens the moment gitops#130 lands: Image Updater only rewrites image.tag, so without this fallback claimerImage.tag would freeze at its last value while the controller advances. Strongly suggest hoisting this one-line change + the 4 override claimerImage.tag deletions to an immediate PR against current main so the fix lands before gitops#130 reconciles. Happy to drive that if you'd prefer to keep #62 focused on the larger refactor.
There was a problem hiding this comment.
Agreed on urgency. The fallback was already in main before this rebase — it landed with the first commit on this branch and is now in main at deployment/openfilter-pipelines-controller/templates/deployment.yaml L121-L126.
| @@ -3,7 +3,6 @@ image: | |||
| tag: "2949155e09bb97b57eca9803761199defc527704" # {"$imagepolicy": "production:openfilter-pipelines-controller:tag"} | |||
There was a problem hiding this comment.
When rebasing, keep #71's repository: plainsightai/openfilter-pipelines-controller and repository: plainsightai/openfilter-pipelines-claimer (these were flipped from GAR to Docker Hub in #71). The PR's deletions of claimerImage.tag + $imagepolicy marker stay, but the GAR us-west1-docker.pkg.dev/... repository strings must not return.
There was a problem hiding this comment.
Done — all four env overrides now carry repository: plainsightai/openfilter-pipelines-controller / plainsightai/openfilter-pipelines-claimer and tag: "0.5.1". The $imagepolicy annotations are dropped; Image Updater will be rewired with a semver filter in gitops#130. See deployment/openfilter-pipelines-controller/overrides/production.yaml L1-L5 (same pattern in development/lab/staging).
Hoists the load-bearing piece of #62 (DT-164) so the drift hole closes before gitops#130 (ArgoCD Image Updater) reconciles in clusters. Today both `image.tag` and `claimerImage.tag` carry `$imagepolicy` markers and image-policy-applier bumps both — controller and claimer stay in lockstep. Once gitops#130 lands, Image Updater takes over and only rewrites `image.tag` (the `image-list` annotation covers `openfilter-pipelines-controller`, not the claimer). `claimerImage.tag` would freeze at its last set value while the controller advances, silently drifting until the next claimer-side flag or API change makes the gap user-visible. Fix: - `templates/deployment.yaml:121` — `CLAIMER_IMAGE` rendering gains a `default .Values.image.tag` fallback so the claimer follows the controller's tag whenever its own is unset. `Chart.AppVersion` stays as the last-resort floor for fresh installs. - 4 overrides drop the static `claimerImage.tag: <sha>` line (with the `$imagepolicy` marker). `claimerImage.repository: plainsightai/*` stays — that's the post-#71 Docker Hub path. With no `tag:` set, the template's fallback chain takes over. Validated by `helm template` against `overrides/production.yaml`: `CLAIMER_IMAGE=plainsightai/openfilter-pipelines-claimer:<image.tag>` as expected. Cross-refs: #71 (Docker Hub publish that introduced the drift hole), #62 (Sam's larger DT-164 refactor that originally bundled this fix), gitops#130 (the Image Updater rollout this guards against).
918fd62 to
65c315b
Compare
lucasmundim
left a comment
There was a problem hiding this comment.
Approving — config-only diff. v0.5.1 is published on Docker Hub, claimer falls through to image.tag correctly, and dropping the $imagepolicy markers is consistent with the ArgoCD Image Updater rewire in gitops#130. Confirm gitops#130 lands in lockstep so the tag-rotation gap doesn't widen.
…pfc#71) Replace GAR repository paths and SHA pins with Docker Hub image coordinates and the current semver tag (0.5.1). Drop $imagepolicy annotations — Image Updater will be rewired to Docker Hub with a semver filter in gitops#130.
65c315b to
67921b0
Compare
Depends on plainsightai/gitops#130
Need help on this PR? Tag
@codesmithwith what you need. Autofix is disabled.