Skip to content

feat(deploy): refactor deployments workflow to remove direct gitops pushing (DT-164)#62

Merged
stwilt merged 1 commit into
mainfrom
feat/dt-164-refactor-deployments-workflow
Jun 16, 2026
Merged

feat(deploy): refactor deployments workflow to remove direct gitops pushing (DT-164)#62
stwilt merged 1 commit into
mainfrom
feat/dt-164-refactor-deployments-workflow

Conversation

@stwilt

@stwilt stwilt commented May 24, 2026

Copy link
Copy Markdown
Contributor

Depends on plainsightai/gitops#130


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag @codesmith with what you need. Autofix is disabled.

@stwilt stwilt requested a review from lucasmundim May 24, 2026 08:12

@lucasmundim lucasmundim left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workflow change is clean — dropping paths-ignore: 'deployment/**' is correct now that there are no bot commits to ignore. One rollout concern specific to this chart inline.

Verified safe: CI tags with VERSION=${{ github.sha }} (full SHA), matches gitops#130's ^[0-9a-f]{40}$ regex. Lab is covered via the existing clusters: {} AppSet generator — image-updater annotations propagate to the -lab Application, so the removed explicit deploy-lab step is replaced functionally (no regression there).

Comment thread .github/workflows/deployments.yml Outdated

permissions:
contents: write
contents: read

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads-up — this chart has a claimerImage.tag (in values.yaml, overrides/<env>.yaml, and read by templates/deployment.yaml's CLAIMER_IMAGE env) that the current image-policy-applier flow keeps in lockstep with image.tag via two $imagepolicy markers per overrides file.

Image-updater (per gitops#130) will only touch image.tag, so after the first rollout the controller pods run the new SHA but the claimer init container used by batch pipeline jobs will pin to the last image-policy-applier write and drift further behind on every subsequent deploy.

Cheapest fix is a chart change to default claimerImage.tag to .Values.image.tag:

value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Values.image.tag | default .Chart.AppVersion }}"

and drop the claimerImage.tag line + $imagepolicy marker from each overrides/<env>.yaml.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant catch! To resolve this cleanly without re-introducing old GHA-side image-policy-applier pushes, I've:

  1. Updated templates/deployment.yaml to dynamically default claimerImage.tag to image.tag so they stay in lockstep automatically:

    - name: CLAIMER_IMAGE
    value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Values.image.tag | default .Chart.AppVersion }}"

  2. Surgically stripped the static claimerImage.tag overrides from all environment overrides:

    claimerImage:
    repository: us-west1-docker.pkg.dev/plainsightai-prod/oci/openfilter-pipelines-claimer

  3. Configured the companion gitops side of the Image Updater configuration to write overrides back to gitops.git:
    https://github.com/PlainsightAI/gitops/blob/eda08abf3ba024fd23f329ecc9a8eaebfc36b15f/controllers/openfilter-pipelines-controller/openfilter-pipelines-controller-appset.yaml#L16-L21

@stwilt stwilt force-pushed the feat/dt-164-refactor-deployments-workflow branch from 820f461 to cdd9f39 Compare May 24, 2026 21:19

@lucasmundim lucasmundim left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior claimerImage.tag drift concern is resolved — template fallback to image.tag plus removal of the $imagepolicy markers in overrides means image-updater's rewrite of image.tag propagates to the claimer init container. LGTM.

{{- end }}
- name: CLAIMER_IMAGE
value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Chart.AppVersion }}"
value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Values.image.tag | default .Chart.AppVersion }}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (doc-only, non-blocking): this template change makes the surrounding documentation slightly stale. values.yaml:18 comment and README.md:44 both say claimerImage.tag defaults to the chart appVersion — with this change it actually defaults to .Values.image.tag first (which itself falls back to .Chart.AppVersion). Suggest something like: # Overrides the image tag, which defaults to the controller image.tag (and ultimately the chart appVersion) in values.yaml, then re-run helm-docs to regenerate the README.

@stwilt

stwilt commented May 24, 2026

Copy link
Copy Markdown
Contributor Author

That is a fantastic catch! The claimer tag would indeed have drifted on subsequent deploys.

To resolve this cleanly without re-introducing old GHA-side image-policy-applier pushes, I've:

  1. Updated templates/deployment.yaml to dynamically default claimerImage.tag to image.tag:
  2. Surgically stripped the static claimerImage.tag overrides and legacy $imagepolicy markers from all four environment overrides:
  3. Configured the companion gitops side of the Image Updater configuration to write overrides back to gitops.git: openfilter-pipelines-controller-appset.yaml#L16-L21.

This keeps both images in perfect lockstep automatically, with zero GHA-side tag-bump pipelines. Force-pushed and squashed!

@stwilt stwilt requested a review from lucasmundim May 24, 2026 21:41

@shingonoide shingonoide left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the workflow simplification is correct and the DT-164 scope is fully covered: deploy jobs gone, GH_BOT_USER_PAT dropped, contents: write replaced with contents: read, and paths-ignore: deployment/** removed now that there are no bot commits to ignore in this repo. The claimer image tag fallback in the template (claimerImage.tag | default .Values.image.tag | default .Chart.AppVersion) is the right fix for the drift issue flagged in review.

Two things worth confirming before this moves to ready-for-review:

Merge ordering (gitops#130 first). Once this PR lands, the build job will publish images to GAR but nothing will deploy them. ArgoCD Image Updater (installed and configured in gitops#130) is the replacement. gitops#130 needs to be merged and reconciled in the cluster before any of the three DT-164 sibling PRs (plainsight-api#796, plainsight-deployment-agent#59, this one) are merged. Is there a tracking note or merge checklist somewhere to enforce that order?

Simultaneous rollout across all environments. The previous workflow had an explicit deploy step followed by deploy-lab, giving a natural sequencing point between production and lab. The new setup writes a single shared image-tag.yaml via the clusters: {} AppSet generator, so all environments reconcile to the new tag in the same image-updater cycle. That may be fine for this service, but it is a behavioral change worth acknowledging intentionally.

Minor nit: .github/workflows/deployments.yml:6 says "avoiding automated plainsight-bot commits to this repository." Since image-updater writes back to gitops.git, not this repo, rephrasing to "image-updater commits land in PlainsightAI/gitops instead of this repository" would be slightly more precise.

@lucasmundim lucasmundim left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The chart change here is load-bearing and probably needs to ship faster than the rest of this PR. Detail below.

Status vs current main (post-#71)

#71 merged yesterday and rewrote .github/workflows/deployments.yml.github/workflows/docker-publish.yml (Docker Hub publish, chart-OCI on tags, deploy/deploy-lab kept). So this PR collides with main on 5 files:

  • .github/workflows/deployments.yml — modify/delete (PR modifies, main deleted). The PR's build job here pushes to GAR, which is the opposite direction from #71 (Docker Hub). Merging as-is would re-introduce GAR push and break the OSS-publish guarantee.
  • deployment/openfilter-pipelines-controller/overrides/{development,lab,production,staging}.yaml — content conflict. PR drops tag:/$imagepolicy markers; main has new repository: plainsightai/* from #71. Rebase needs to keep #71's repository and take PR's deletions.

The load-bearing piece: claimer-tag drift

deployment/openfilter-pipelines-controller/templates/deployment.yaml:121 — adding the default .Values.image.tag fallback to the claimer's tag is the right fix, and it's plugging a real drift hole that opens the moment gitops#130 lands. Today, image-policy-applier bumps both image.tag and claimerImage.tag (both carry $imagepolicy markers). Once Image Updater takes over via gitops#130, it only writes image.tag (that's all the image-list annotation covers) — claimerImage.tag would freeze at whatever SHA it was last set to while the controller advances. Controller + claimer drift, and you'd notice it the next time the claimer hits a removed flag or API change.

The two halves of the fix (template fallback + override deletions of claimerImage.tag/marker) are independent of the workflow change in this PR and would land cleanly against current main. Strongly recommend hoisting them to a separate, immediate PR so the drift can't bite during the gitops#130 rollout. Happy to drive that — let me know.

Sequencing suggestion

  • Now: ship just the claimer-tag fallback + 4 override deletions (cherry-pick, or I open a separate PR).
  • After gitops#130 is reconciled across clusters: revisit dropping deploy + deploy-lab from docker-publish.yml. That removal should be done against main (not by reviving deployments.yml).
  • Separate effort: the semver-in-overrides direction. This PR doesn't move the image.tag lines to 0.5.x; that's still a follow-up.

Inline notes on the three load-bearing hunks.

Comment thread .github/workflows/deployments.yml Outdated
# to this repository.
name: Deployments

permissions:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file no longer exists on main — #71 deleted it and replaced it with .github/workflows/docker-publish.yml (Docker Hub publish + chart-OCI on tag + preserved deploy/deploy-lab). Modifying it here would resurrect a deleted file on rebase and re-introduce a GAR build job, which is the opposite direction from #71. Recommend dropping this hunk entirely from the PR — the deploy-removal you're after should be a separate PR against docker-publish.yml once gitops#130 is reconciled in clusters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — deployments.yml is dropped on this rebase (file was deleted in main by #71). The deploy/deploy-lab removal will be a separate PR against docker-publish.yml once gitops#130 is reconciled.

{{- end }}
- name: CLAIMER_IMAGE
value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Chart.AppVersion }}"
value: "{{ .Values.claimerImage.repository }}:{{ .Values.claimerImage.tag | default .Values.image.tag | default .Chart.AppVersion }}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the load-bearing change. The default .Values.image.tag fallback closes a real drift hole that opens the moment gitops#130 lands: Image Updater only rewrites image.tag, so without this fallback claimerImage.tag would freeze at its last value while the controller advances. Strongly suggest hoisting this one-line change + the 4 override claimerImage.tag deletions to an immediate PR against current main so the fix lands before gitops#130 reconciles. Happy to drive that if you'd prefer to keep #62 focused on the larger refactor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on urgency. The fallback was already in main before this rebase — it landed with the first commit on this branch and is now in main at deployment/openfilter-pipelines-controller/templates/deployment.yaml L121-L126.

@@ -3,7 +3,6 @@ image:
tag: "2949155e09bb97b57eca9803761199defc527704" # {"$imagepolicy": "production:openfilter-pipelines-controller:tag"}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When rebasing, keep #71's repository: plainsightai/openfilter-pipelines-controller and repository: plainsightai/openfilter-pipelines-claimer (these were flipped from GAR to Docker Hub in #71). The PR's deletions of claimerImage.tag + $imagepolicy marker stay, but the GAR us-west1-docker.pkg.dev/... repository strings must not return.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — all four env overrides now carry repository: plainsightai/openfilter-pipelines-controller / plainsightai/openfilter-pipelines-claimer and tag: "0.5.1". The $imagepolicy annotations are dropped; Image Updater will be rewired with a semver filter in gitops#130. See deployment/openfilter-pipelines-controller/overrides/production.yaml L1-L5 (same pattern in development/lab/staging).

lucasmundim added a commit that referenced this pull request Jun 7, 2026
Hoists the load-bearing piece of #62 (DT-164) so the drift hole closes
before gitops#130 (ArgoCD Image Updater) reconciles in clusters.

Today both `image.tag` and `claimerImage.tag` carry `$imagepolicy`
markers and image-policy-applier bumps both — controller and claimer
stay in lockstep. Once gitops#130 lands, Image Updater takes over and
only rewrites `image.tag` (the `image-list` annotation covers
`openfilter-pipelines-controller`, not the claimer). `claimerImage.tag`
would freeze at its last set value while the controller advances,
silently drifting until the next claimer-side flag or API change makes
the gap user-visible.

Fix:
- `templates/deployment.yaml:121` — `CLAIMER_IMAGE` rendering gains a
  `default .Values.image.tag` fallback so the claimer follows the
  controller's tag whenever its own is unset. `Chart.AppVersion` stays
  as the last-resort floor for fresh installs.
- 4 overrides drop the static `claimerImage.tag: <sha>` line (with the
  `$imagepolicy` marker). `claimerImage.repository: plainsightai/*`
  stays — that's the post-#71 Docker Hub path. With no `tag:` set, the
  template's fallback chain takes over.

Validated by `helm template` against `overrides/production.yaml`:
`CLAIMER_IMAGE=plainsightai/openfilter-pipelines-claimer:<image.tag>`
as expected.

Cross-refs: #71 (Docker Hub publish that introduced the drift hole),
#62 (Sam's larger DT-164 refactor that originally bundled this fix),
gitops#130 (the Image Updater rollout this guards against).
@stwilt stwilt force-pushed the feat/dt-164-refactor-deployments-workflow branch from 918fd62 to 65c315b Compare June 7, 2026 02:17
@stwilt stwilt requested review from lucasmundim and shingonoide June 7, 2026 02:29

@lucasmundim lucasmundim left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — config-only diff. v0.5.1 is published on Docker Hub, claimer falls through to image.tag correctly, and dropping the $imagepolicy markers is consistent with the ArgoCD Image Updater rewire in gitops#130. Confirm gitops#130 lands in lockstep so the tag-rotation gap doesn't widen.

…pfc#71)

Replace GAR repository paths and SHA pins with Docker Hub image coordinates
and the current semver tag (0.5.1). Drop $imagepolicy annotations — Image
Updater will be rewired to Docker Hub with a semver filter in gitops#130.
@stwilt stwilt force-pushed the feat/dt-164-refactor-deployments-workflow branch from 65c315b to 67921b0 Compare June 16, 2026 19:47
@stwilt stwilt merged commit 1f57079 into main Jun 16, 2026
@stwilt stwilt deleted the feat/dt-164-refactor-deployments-workflow branch June 16, 2026 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants