Skip to content

[codex] Apply generated cpflow GitHub Actions flow#732

Draft
justin808 wants to merge 7 commits intomasterfrom
codex/cpflow-github-actions-flow
Draft

[codex] Apply generated cpflow GitHub Actions flow#732
justin808 wants to merge 7 commits intomasterfrom
codex/cpflow-github-actions-flow

Conversation

@justin808
Copy link
Copy Markdown
Member

@justin808 justin808 commented Apr 30, 2026

Summary

  • Replace the older hand-written Control Plane GitHub Actions with the generated cpflow-* action and workflow set from cpflow generate-github-actions.
  • Keep review apps opt-in, staging deploys branch-gated, production promotion manual, and nightly review-app cleanup in the generated flow.
  • Update Control Plane docs links so they point at the new cpflow-* action/workflow names.

Verification

  • BUNDLE_GEMFILE=/tmp/cpflow-pr-278/Gemfile bin/conductor-exec bundle exec ruby /tmp/cpflow-pr-278/bin/cpflow github-flow-readiness — no blocking readiness issues detected.
  • bin/conductor-exec ruby -e 'require "yaml"; paths = Dir[".github/**/*.yml"] + Dir[".github/**/*.yaml"] + Dir[".controlplane/**/*.yml"]; paths.sort.each { |path| YAML.load_file(path, aliases: true); puts path }'
  • bin/conductor-exec git diff --cached --check
  • bin/conductor-exec bundle exec rubocop

Not Run / Blocked

  • bin/conductor-exec bundle exec rspec could not complete locally because PostgreSQL was not running at /tmp/.s.PGSQL.5432; after the first Rails load failure, Coveralls setup also reported repeated coverage initialization errors.

Note

Medium Risk
Replaces and rewires deployment automation (review apps, staging deploys, and production promotion), so misconfiguration could break deployments or cleanup even though changes are mostly workflow/script-level.

Overview
Switches Control Plane CI/CD from the older hand-written GitHub Actions to the generated cpflow-* composite actions and workflows, removing the previous setup-environment, build-docker-image, review-app deploy/delete, staging deploy, promotion, and nightly cleanup workflows.

Adds a full generated flow: opt-in review apps via /deploy-review-app (with fork safeguards), automated redeploys once created, gated staging deploys, a scheduled stale review-app cleanup, and a more robust manual promotion workflow (env-var name parity check, health checks, multi-workload rollback support, and GitHub release creation).

Updates Control Plane docs/internal notes to reference the new cpflow-* workflow names and clarifies required repo secrets/vars (including REVIEW_APP_PREFIX naming expectations and optional Docker SSH build settings).

Reviewed by Cursor Bugbot for commit 9a390c1. Bugbot is set up for automated code reviews on this repo. Configure here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 77ce357b-520f-4ca0-b5a6-a08141c26fa1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/cpflow-github-actions-flow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Upstream token not passed to copy-image command
    • Added the required -t upstream token argument to the cpflow copy-image-from-upstream command while keeping the token sourced from secrets via the environment.

Create PR

Or push these changes by commenting:

@cursor push 3606e16506
Preview (3606e16506)
diff --git a/.github/workflows/cpflow-promote-staging-to-production.yml b/.github/workflows/cpflow-promote-staging-to-production.yml
--- a/.github/workflows/cpflow-promote-staging-to-production.yml
+++ b/.github/workflows/cpflow-promote-staging-to-production.yml
@@ -203,14 +203,13 @@
 
       - name: Copy image from staging
         env:
-          # Pass the upstream token via env rather than `-t` so it doesn't appear in /proc/<pid>/cmdline.
           CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }}
           PRODUCTION_APP_NAME: ${{ vars.PRODUCTION_APP_NAME }}
           CPLN_ORG_PRODUCTION: ${{ vars.CPLN_ORG_PRODUCTION }}
         shell: bash
         run: |
           set -euo pipefail
-          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}"
+          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" -t "${CPLN_UPSTREAM_TOKEN}" --org "${CPLN_ORG_PRODUCTION}"
 
       - name: Deploy image to production
         env:

You can send follow-ups to the cloud agent here.

shell: bash
run: |
set -euo pipefail
cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upstream token not passed to copy-image command

High Severity

The copy-image-from-upstream step sets CPLN_UPSTREAM_TOKEN as an environment variable but never passes the upstream token to the cpflow command. The cpflow documentation states this command requires --upstream-token or -t flag — there is no documented support for a CPLN_UPSTREAM_TOKEN env var in cpflow 4.2.0 (the version installed by cpflow-setup-environment). The old workflow correctly used -t ${{ secrets.CPLN_TOKEN_STAGING }}. Without the flag, the command cannot authenticate with the staging org and production promotion will fail at the image copy step.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit eb1dbb2. Configure here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this unchanged. cpflow copy-image-from-upstream in cpflow 4.2.0 supports CPLN_UPSTREAM_TOKEN as an env var, so passing -t here would reintroduce the token-in-process-argv exposure this generated flow is intentionally avoiding. Reference: https://github.com/shakacode/control-plane-flow/blob/add-github-flow-generator/lib/command/copy_image_from_upstream.rb#L34-L36

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: cpflow GitHub Actions Migration

This is a solid migration from hand-written to generated cpflow-* workflows. The overall architecture is a clear improvement. Here's a summary of what stands out.


Security — meaningful fixes

  • Token exposure fixed: The old setup-environment action passed the token as --token "$TOKEN" (visible in /proc/<pid>/cmdline and ps). The new action correctly uses CPLN_TOKEN env var. Same for CPLN_UPSTREAM_TOKEN in the promote workflow.
  • Critical: /deploy-review-app now requires author_association. The old workflow had no author_association check, meaning any GitHub user who could comment on a PR could trigger a deployment. The new check ("OWNER","MEMBER","COLLABORATOR") correctly gates this.
  • Exact command match (== '/deploy-review-app') replaces contains(). Prevents accidental triggers from comments like "I ran /deploy-review-app earlier and it worked".
  • SSH key isolation: The cpflow-build-docker-image action scopes DOCKER_BUILD_SSH_KEY to a dedicated step so it never appears in the build step's environment — even when ACTIONS_STEP_DEBUG=true is set. The EXIT trap cleans up the key file and kills the agent. Well designed.
  • Fork PR handling: Fork PRs correctly skip auto-deploy (which can't access secrets) while still allowing approved collaborators to deploy via /deploy-review-app. The cpflow-review-app-help.yml is explicit about why pull_request_target is used there (no code checkout = safe for write access from forks).

Bugs fixed

  • cancel-in-progress: false on all deploy workflows. The old staging workflow used cancel-in-progress: true, which could interrupt a cpflow deploy-image mid-rollout and leave the workload partially updated. Now fixed across all three deployment workflows.
  • Delete on PR close now uses pull_request_target. The old delete-review-app.yml used pull_request: types: [closed], which runs in a restricted context without secret access when a fork PR closes. The new workflow correctly uses pull_request_target for the close event.
  • --run-release-phase is now conditional. The old staging deploy hardcoded it unconditionally; the new workflows detect release_script: presence at runtime before deciding.
  • Delete safety uses prefix allowlist, not blocklist. The old delete-app.sh used grep -iqE '(production|staging)' to block "dangerous" names — fragile against future app naming. The new delete-app.sh requires the app name to start with REVIEW_APP_PREFIX, which is a stronger guarantee.

Potential issues

  1. cpflow exists in the deploy workflow doesn't distinguish auth errors from "not found" (see inline comment). The delete-app.sh carefully pattern-matches error output to surface API/auth failures, but the deploy workflow's check-app step treats all non-zero exits as exists=false. An intermittent auth error on a push event would silently skip the deploy.

  2. staging deploy job has no needs: validate-branch: The deploy job correctly declares needs: [validate-branch, build] and has an if: guard, but if validate-branch produces no output (e.g., first run where the variable step fails), the if: could evaluate against an empty string. In practice the validation step exits non-zero, so build never runs and deploy is skipped by the needs: dependency. This is fine but the if: guard is slightly redundant with the needs: dependency on validate-branch.

  3. Help command has no contents: read permission: cpflow-help-command.yml declares only issues: write and pull-requests: write. There's no explicit contents: read but GitHub's default GITHUB_TOKEN grants read on public repos, so this works — just worth noting that it relies on the repo's default permissions.


Nits

  • cpflow-deploy-staging.yml uses GITHUB_REF_NAME (simpler), whereas the old workflow used ${GITHUB_REF#refs/heads/} (strip-prefix approach that would silently behave oddly for tag refs). Small but cleaner.
  • The shakacode-team.md link fix (-info org URL corrected to shakacode-open-source-examples-production) is a separate correctness fix bundled in cleanly.

Overall this is well-structured with meaningful security and reliability improvements. The one item I'd prioritize is the cpflow exists error handling consistency noted in the inline comment.

if cpflow exists -a "${APP_NAME}" --org "${CPLN_ORG}"; then
echo "exists=true" >> "$GITHUB_OUTPUT"
else
echo "exists=false" >> "$GITHUB_OUTPUT"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if/else around cpflow exists conflates two different failure modes: "app does not exist" (expected non-zero) and "API/auth error" (unexpected non-zero). Both land in the else branch and set exists=false.

Compare delete-app.sh in this same PR, which carefully pattern-matches the output ("Double check your org", "Unknown API token format", "ERROR", etc.) to distinguish a real failure from a genuine not-found and surface it as a hard error.

If there's a transient auth issue when a PR is pushed, this step sets exists=false and the pull_request path silently writes a step-summary saying the app doesn't exist yet — masking the real problem until the deploy is manually retried.

Suggested approach: mirror delete-app.sh's pattern — capture the output, check exit code, inspect the text for known error tokens, and exit non-zero on a real failure rather than treating it as "not found".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b9095c6 by capturing cpflow exists output and failing hard on auth/API error tokens instead of treating every non-zero exit as app-not-found.


if [[ -f "${HOME}/.ssh/cpflow_build_key" ]]; then
eval "$(ssh-agent -s)"
trap 'ssh-agent -k >/dev/null; rm -f "${HOME}/.ssh/cpflow_build_key"' EXIT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EXIT trap kills the SSH agent and removes cpflow_build_key, but the ~/.ssh/known_hosts file written in the previous step is not cleaned up here. On ephemeral GitHub-hosted runners this is harmless (the VM is discarded), but if this action is ever adopted on self-hosted runners with a persistent ~/.ssh, the custom known_hosts entries would accumulate across runs.

Consider adding rm -f "${HOME}/.ssh/known_hosts" to the trap, or using a temp file under $RUNNER_TEMP instead of ~/.ssh for the known_hosts.

default: "3.3.1"
cpflow_version:
description: cpflow gem version
required: false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI version is pinned here as a default, which is good for reproducibility. Worth noting: since sudo npm install -g fetches from the npm registry at runtime, a compromised release of @controlplane/cli@3.3.1 would silently affect all runs. Consider adding a SHA/integrity check (e.g. via npm install --integrity) or at minimum locking to a digest-pinned download, especially since this step runs in workflows that access staging/production tokens.


current_image="${selected_image:-${first_image}}"
current_version="${selected_version:-${first_version}}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The staging env-var parity check only compares names, not values. It will catch a missing DATABASE_URL in production, but it won't catch a case where production has the right key names but wrong or stale values (e.g. a staging secret was rotated but the production counterpart wasn't updated). This is expected behaviour for a name-only check — just worth documenting explicitly so it's clear what the check does and doesn't guarantee.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented in b9095c6: the production promotion workflow checks that production has the same environment variable names as staging, but it does not compare secret values.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: Replace hand-written CI/CD Actions with generated cpflow- flow*

Well-considered overhaul that hardens security, safety, and operational reliability. No blocking issues found.

Security improvements

Token no longer passed via --token flag — old setup-environment exposed it in /proc/cmdline and ps; now uses CPLN_TOKEN env var only. SSH key scoped to a dedicated step so ACTIONS_STEP_DEBUG cannot leak it from the build step. Delete command now gated on author_association (old workflow had no permission check). Fork PR protection blocks syncs before any secret is consumed. Old contains() command matching replaced with exact match.

Safety improvements

cancel-in-progress changed from true to false everywhere — old staging deploy risked partial GVC deployments mid-rollout. Prefix-based delete guard replaces the old substring check for production/staging names. Production rollback logic added with per-workload state capture and restore — old workflow had no rollback at all. Health checks now configurable (HEALTH_CHECK_RETRIES, HEALTH_CHECK_INTERVAL, HEALTH_CHECK_ACCEPTED_STATUSES).

Issue 1: Fork-PR /deploy-review-app comment silently fails with no PR feedback

When a collaborator comments /deploy-review-app on a fork PR, the job condition passes but Validate review app deployment source exits 1 with only a runner log message. No PR comment explains the rejection. Suggest posting an issues.createComment before exit 1, similar to how the unconfigured-secrets path writes to GITHUB_STEP_SUMMARY.

Issue 2: cpflow exists error detection is string-matching fragile

delete-app.sh uses case matching over stderr to distinguish app-not-found from auth/network errors. The inline TODO acknowledges this. If cpflow changes an error message, the pattern will not match and a real failure will be silently treated as app not found, causing delete to return success vacuously. The TODO to switch to a structured exit code is the right long-term fix.

Issue 3: gem install / npm install without integrity pinning

cpflow-setup-environment/action.yml installs @controlplane/cli@3.3.1 and cpflow 4.2.0 without hash verification. A re-tagged or compromised release would install silently. For production-adjacent CI, consider verifying SHA256 after install or using lockfiles with integrity entries.

Issue 4: Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs on cron with no failure notification. Silent failures accumulate stale apps over days. Consider a step gated on if: failure() that posts a GitHub issue or annotation.

Minor observations

APP_NAME / PR_NUMBER double-assignment is intentional (workflow-level env needed early for concurrency.group, step overwrites with resolved values) but a short comment would clarify intent. cpflow-review-app-help.yml using pull_request_target is correct and well-documented. deploy job in cpflow-deploy-staging.yml re-runs cpflow-setup-environment unnecessarily (only needs cpflow deploy-image) — minor redundancy, acceptable for simplicity.

echo "🗑️ Deleting application: $APP_NAME"
cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes

echo "✅ Successfully deleted application: $APP_NAME"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case pattern matching over stderr is the right approach given cpflow exists doesn't expose a structured signal, and the inline TODO is the right way to track this. One additional defensive measure worth considering: log the full exists_output to the runner even when the "not found" path is taken, so there's an audit trail if unexpected output silently passed through the filter. Right now the output is printed only if non-empty, but it's easy to miss in logs.

Also worth noting: the pattern list doesn't include "timeout", "connection refused", or "ENOTFOUND" — all plausible network failure messages. If cpflow's Ruby HTTP client times out, it may surface Errno::ECONNREFUSED or Net::ReadTimeout, which would match "Net::", so that's covered. But curl-style timeouts would not. Low risk given this is a Ruby gem, but worth keeping in mind when bumping the cpflow version.

shell: bash
run: |
set -euo pipefail
cpflow cleanup-stale-apps -a "${{ vars.REVIEW_APP_PREFIX }}" --org "${{ vars.CPLN_ORG_STAGING }}" --yes
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No failure notification exists for this cron job. If cpflow cleanup-stale-apps fails (auth error, API hiccup, etc.), stale review apps accumulate silently across days.

Consider appending a step:

      - name: Notify on failure
        if: failure()
        run: |
          echo "::error::Stale review app cleanup failed. Check the workflow logs."

Or, if the team monitors GitHub issues, use gh issue create to open a tracking issue on failure so it's not lost in workflow history.

} >> "$GITHUB_STEP_SUMMARY"
exit 0
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a collaborator comments /deploy-review-app on a fork PR, execution reaches this exit 1 with only a runner log message — no feedback is posted to the PR thread. The collaborator sees a red workflow run with no explanation in the PR.

Consider posting a comment before exiting. The GH_TOKEN is available from the Resolve PR ref and commit step but not inherited here; you'd need to add GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} to this step's env block. Something like:

gh api "repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/comments" \
  --method POST \
  --field body="Fork PRs cannot use /deploy-review-app because this workflow uses repository secrets. Ask a maintainer to deploy from a branch in the base repository." \
  --silent || true

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1e46bfe: fork PR /deploy-review-app rejections now post a PR comment before exiting, while still keeping the deployment blocked because the workflow uses repository secrets.

run: |
set -euo pipefail

sudo npm install -g @controlplane/cli@${{ inputs.cpln_cli_version }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version-pinned but not hash-verified. A malicious actor who gains write access to the @controlplane/cli npm package (or re-tags a release) could substitute a compromised binary. Same applies to gem install cpflow below.

For production-adjacent pipelines, consider adding a post-install checksum check, or at minimum document the expected SHA256 for each pinned version in a comment so drift is detectable during future version bumps.

run: |
# Best-effort rollback: try every workload, aggregate failures, exit non-zero at the end
# if any failed. A single cpln hiccup shouldn't leave other workloads mid-promotion.
set -uo pipefail
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intentional omission of -e here (using set -uo pipefail instead of set -euo pipefail) is the correct design — it lets rollback continue across all workloads even if one cpln call fails, with rollback_failures tracking the aggregate. Worth a brief comment explaining the omission so a future reader doesn't add -e thinking it was forgotten:

Suggested change
set -uo pipefail
# Intentionally no -e: we want every workload to attempt rollback even if earlier ones fail.
set -uo pipefail

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1e46bfe with a comment explaining that set -e is intentionally omitted so rollback can attempt every workload and aggregate failures.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Token not persisted for subsequent workflow steps
    • The composite action now writes CPLN_TOKEN to GITHUB_ENV so subsequent cpflow and cpln workflow steps inherit authentication without passing the token on the command line.

Create PR

Or push these changes by commenting:

@cursor push f1eee564c7
Preview (f1eee564c7)
diff --git a/.github/actions/cpflow-setup-environment/action.yml b/.github/actions/cpflow-setup-environment/action.yml
--- a/.github/actions/cpflow-setup-environment/action.yml
+++ b/.github/actions/cpflow-setup-environment/action.yml
@@ -63,6 +63,12 @@
           exit 1
         fi
 
+        {
+          echo "CPLN_TOKEN<<__CPLN_TOKEN__"
+          echo "$CPLN_TOKEN"
+          echo "__CPLN_TOKEN__"
+        } >> "$GITHUB_ENV"
+
         create_output=""
         if ! create_output="$(cpln profile create default --org "$ORG" 2>&1)"; then
           if ! echo "$create_output" | grep -qi "already exists"; then

You can send follow-ups to the cloud agent here.

Comment thread .github/actions/cpflow-setup-environment/action.yml
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review — PR #732: Generated cpflow GitHub Actions Flow

Overview

This is a well-structured replacement of hand-written CI/CD workflows with the generated cpflow-* composite actions and workflows. The new flows are meaningfully safer and more correct than the old ones across several dimensions. The PR is in draft but the code quality is high enough to call out specifics.


Significant Improvements ✅

  • Secret handling: Old code passed CPLN_TOKEN via --token (visible in /proc/<pid>/cmdline and ps output). New code passes it exclusively via environment — good security hygiene across all three cpln/cpflow call sites.
  • Prefix guard on deletion: delete-app.sh now validates that the app name starts with $REVIEW_APP_PREFIX- before deleting, which is a much stronger guard than the old "reject names containing 'production' or 'staging'" heuristic. The old approach would have blocked deletion of any app with those words in the name (too broad) while leaving other non-review apps unprotected (too narrow).
  • Fork PR protection: The deploy workflow explicitly detects fork PRs and blocks them from accessing repository secrets — the old workflow had no such guard.
  • Author association gating: Slash commands now check author_association is OWNER, MEMBER, or COLLABORATOR. The old delete workflow had no collaborator check at all.
  • Non-cancelling concurrency (cancel-in-progress: false): Prevents partial deployments and half-deleted apps — this was cancel-in-progress: true in the old staging workflow.
  • Multi-workload rollback: The promote workflow rolls back all workloads independently on failure, aggregating errors rather than stopping at the first failure.
  • Env-var parity check: Pre-promotion check that production has all staging env var names prevents silent nil variable errors after promotion.
  • Health checks with configurable accepted statuses: The default 200 301 302 is well-reasoned for auth-gated Rails apps; HEALTH_CHECK_ACCEPTED_STATUSES makes it overridable without editing the file.
  • Randomized heredoc delimiter in capture-current: EOF_$(openssl rand -hex 8) correctly prevents heredoc injection if rollback state ever contained a literal EOF line.

Issues

Bug: deploy job may run even when build fails in cpflow-deploy-staging.yml

In GitHub Actions, an explicit if: on a job replaces the implicit success() guard. The deploy job condition is:

if: needs.validate-branch.outputs.is_deployable == 'true'

Because this doesn't check needs.build.result, the deploy job will run even if the build job fails (as long as the branch is deployable). This burns runner minutes and produces a confusing deploy attempt against a missing image.

Fix: add && needs.build.result == 'success' to the condition.

Risk: String-matching on cpflow exists error output is fragile

delete-app.sh (and the inline version in cpflow-deploy-review-app.yml) detect "real" errors by checking for specific strings like "Double check your org", "ERROR", "Traceback", "Net::". The file already has a # TODO comment acknowledging this, but the consequence of missing a new error pattern is silently treating a real failure as "app not found" — which could skip deletion without any alert.

This is acceptable given the stated cpflow limitation, but the heuristic string list should be kept in sync with cpflow's actual output. An alternative interim approach: treat any non-zero exit with non-empty stderr as a failure unless it matches a known "not found" pattern, rather than enumerating known error patterns.

Minor: Workflow-level APP_NAME env var is redundant and overridden in cpflow-deploy-review-app.yml

The top-level env: block in the deploy review app workflow sets APP_NAME from the raw event payload, but the Resolve PR ref and commit step later overwrites it in $GITHUB_ENV with the normalized PR number. The step-level value wins, making the workflow-level declaration dead configuration. It's also the only workflow in this set that has this pattern — the others don't duplicate APP_NAME this way.


Nitpicks

  • cpflow-setup-environment/action.yml: gem install cpflow -v ... without --no-document is slower than needed on CI runners.
  • cpflow-cleanup-stale-review-apps.yml: The Validate required secrets and variables step runs as a plain shell: bash step but has no id: — this is fine, but if you ever want to condition a downstream step on it, you'll need to add one.
  • The old nightly-remove-stale-review-apps.yml hardcoded the app name (qa-react-webpack-rails-tutorial) rather than using ${{ vars.REVIEW_APP_PREFIX }}. The new workflow correctly uses the variable — worth calling out as a subtle bug fix.

Summary

The core logic is solid and the security posture is materially improved. The one real bug (deploy running after build failure) should be fixed before merge. The string-matching fragility is an acknowledged limitation with a TODO already in place.

fi
deploy_args+=(--org "${CPLN_ORG}" --verbose)

cpflow deploy-image "${deploy_args[@]}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: deploy runs even when build fails.

When a job has an explicit if:, GitHub Actions drops the implicit success() guard. Because this condition only checks validate-branch's output, the deploy job will start even if the build job failed, wasting runner minutes and attempting to deploy a missing image.

Suggested change
cpflow deploy-image "${deploy_args[@]}"
if: needs.validate-branch.outputs.is_deployable == 'true' && needs.build.result == 'success'

echo "🗑️ Deleting application: $APP_NAME"
cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes

echo "✅ Successfully deleted application: $APP_NAME"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string-matching heuristic on cpflow exists error output (lines 43–54) is the main reliability risk in this file. The consequence of an unmatched error pattern is silently exiting 0 ("app not found") when the API actually returned an error, meaning the delete is skipped without any visible failure.

An alternative that inverts the logic: treat any non-zero exit with non-empty output as a failure unless it matches a known "not found" pattern:

# instead of matching known error strings, match known "not found" patterns
case "$exists_output" in
  *"not found"*|*"does not exist"*)
    echo "⚠️ Application does not exist: $APP_NAME"
    exit 0
    ;;
  *)
    echo "❌ ERROR: failed to determine whether application exists" >&2
    printf '%s\n' "$exists_output" >&2
    exit 1
    ;;
esac

This is safer by default — unknown output is treated as an error rather than a no-op. The tradeoff is that you'd need to know the exact "not found" wording from cpflow exists, which has the same coupling problem in the other direction. The TODO comment already captures this; just flagging it for awareness.

echo "ready=false" >> "$GITHUB_OUTPUT"
{
echo "Control Plane review app automation is not configured yet."
echo
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This top-level APP_NAME is computed from the raw event payload and is later overwritten in $GITHUB_ENV by the Resolve PR ref and commit step (echo "APP_NAME=..." >> "$GITHUB_ENV"). The step-level write wins, so this declaration is dead configuration and can be confusing. Consider removing it and relying solely on the step-computed value, which is already the authoritative one (it handles all three event types correctly).

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: cpflow Generated GitHub Actions Flow

This is a meaningful security and reliability upgrade over the handwritten workflows. Several issues from the old setup are fixed here; I've also flagged a few new items worth addressing.


✅ Notable Improvements

  • Token passed via env, not --token flag — prevents leakage in /proc/<pid>/cmdline and ps output (old setup-environment used cpln profile update default --org "$ORG" --token "$TOKEN")
  • Author association gating on /deploy-review-app and /delete-review-app — the old delete workflow accepted the command from any commenter
  • Fork PR protection in the deploy workflow — old workflow ran on all pull_request events without a fork check, allowing fork PRs to attempt using repo secrets
  • Prefix guard in delete-app.sh — prevents accidental deletion of non-review-app resources
  • SSH key isolated to a dedicated step — keeps it out of the main build step's env where ACTIONS_STEP_DEBUG=true would expose it
  • Randomized heredoc delimiters in the promotion workflow — prevents content injection if rollback state contains EOF
  • Old staging workflow fixed — the old deploy-to-control-plane-staging.yml triggered on branches: ['*'] (every push), burning runners on every feature branch; new workflow correctly filters to main/master
  • cancel-in-progress: false on all deployment concurrency groups — prevents partial rollout state from mid-cancelled deploys

🔴 Issues

1. cpflow-delete-review-app.yml uses pull_request_target without an explanation comment

The delete workflow triggers on pull_request_target: [closed]. This event runs with write permissions in the base repo context and can post PR comments on fork PRs. It's the correct choice here (since no PR code is checked out), but it's a well-known footgun — future maintainers may switch it to pull_request thinking it's "safer" and break fork PR cleanup. cpflow-review-app-help.yml has a good comment explaining this; the delete workflow should have one too.

2. cpflow exists error-pattern matching is duplicated

The same fragile string-matching logic ("Double check your org", "ERROR", "Traceback", "Net::", etc.) appears both in delete-app.sh and inline in the Check if review app exists step of cpflow-deploy-review-app.yml. The delete-app.sh has a TODO explaining the fragility; the workflow copy does not. If cpflow changes its error output, both need updating in sync. Consider either extracting this into a shared shell script, or at minimum adding the same TODO comment in the workflow.


🟡 Warnings

3. CPLN_TOKEN is written to GITHUB_ENV and persists to all subsequent steps

In cpflow-setup-environment/action.yml (lines 67–72), the token is written to GITHUB_ENV so later workflow steps can authenticate. This is necessary for cpflow to work, but it means the token is present in the environment of every subsequent step — including any step that might env | sort or log environment state. Consider documenting this tradeoff in a comment.

4. Production health check accepts 301/302 by default

HEALTH_CHECK_ACCEPTED_STATUSES defaults to '200 301 302'. A redirect to a login page or a broken URL returns a redirect, so the health check passes even if the app is misconfigured post-deploy. The docs note the recommended mitigation (set a dedicated /health endpoint and override to "200"). Consider making the default stricter, or adding a warning to the step summary when a redirect is accepted.

5. Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs at midnight UTC. If it fails (bad token, cpflow error, etc.), there is no alert — stale apps silently accumulate cost. Consider adding a Slack/email notification on failure, or at minimum ensure the repo has GitHub workflow failure notifications enabled for scheduled runs.


ℹ️ Informational

6. /help command has no author_association check

cpflow-help-command.yml responds to /help from any commenter (no association guard). This is probably intentional — help is informational — but on a high-volume repo the bot will reply to every /help comment from any user. Noting for awareness.

7. ruby/setup-ruby@v1 and actions/github-script@v7 use floating major versions

These will pick up any non-breaking updates automatically, which is generally fine. However, actions/checkout@v4 is also a floating major — all three could be pinned to a specific SHA for a more locked-down supply chain posture if that's a concern.

name: Delete Review App

on:
pull_request_target:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull_request_target deserves an explanation comment here.

pull_request_target runs in the context of the base repo (write permissions, access to secrets) and is required for fork PR cleanup — pull_request can't post comments on fork PRs because the token is read-only. This is safe here because no PR code is ever checked out (the actions/checkout@v4 below fetches base-branch code only).

cpflow-review-app-help.yml has a good comment explaining this exact pattern. Without a parallel comment here, a future maintainer may "fix" this to pull_request thinking it's safer, silently breaking fork PR cleanup.

Suggested change
pull_request_target:
# pull_request_target is intentional: it has write permission to comment on PRs from
# forks, where `pull_request` would be read-only. This is safe because no PR code is
# checked out — the job only calls cpflow with base-repo credentials.
# Do not switch this to `pull_request` or add a checkout of the PR ref without re-evaluating.
pull_request_target:

Comment on lines +66 to +72
# Make the token available to later workflow steps without putting it on argv.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing CPLN_TOKEN to GITHUB_ENV propagates it to every subsequent step in the calling workflow — not just the steps that explicitly use it. This is a necessary tradeoff (cpflow picks it up from the environment automatically), but it's worth documenting so future maintainers don't inadvertently add steps that log environment state.

Suggested change
# Make the token available to later workflow steps without putting it on argv.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"
# Persist CPLN_TOKEN to GITHUB_ENV so all subsequent steps can authenticate
# with cpflow/cpln without putting the token on the command line.
# Trade-off: the token is present in every subsequent step's environment.
# Do not add steps that dump env vars (e.g., `env | sort`) after this point.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"

Comment on lines +28 to +35
# TODO: replace this string-matching with a structured signal once `cpflow exists` exposes one
# (e.g. a distinct exit code for "not found" vs. API/auth errors, or `cpflow exists --json`).
# Until then, keep this list in sync if `cpflow exists` starts emitting new error patterns —
# any unmatched error string would otherwise be silently treated as "app not found".
exists_output=""
if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
case "$exists_output" in
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same cpflow exists error-pattern matching appears verbatim in the Check if review app exists step in cpflow-deploy-review-app.yml (line ~195). The TODO comment here is accurate but only covers this file — the workflow copy is uncovered.

If cpflow adds a new error pattern (e.g. "rate limit" or "timeout"), both locations need updating in sync. Consider extracting this into a shared script (e.g. .github/scripts/cpflow-app-exists.sh) that both the action and the workflow call, so the pattern list is maintained in one place.

# repo variable to tighten this for apps that expose a dedicated health endpoint
# (e.g. "200" for a plain /health, or "200 401 403" for apps that auth-gate / without
# redirecting).
HEALTH_CHECK_ACCEPTED_STATUSES: ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting 301/302 as "healthy" means a redirect to a login page, maintenance page, or even a misconfigured CDN passes the health check. The comment correctly explains why (curl without -L, Rails auth-gating /), but a redirect can pass even when the destination is broken.

Consider adding a step summary warning when a non-200 status is accepted, so operators reviewing the promotion log have visibility:

if [[ "${http_status}" == "${accepted}" ]]; then
  if [[ "${http_status}" != "200" ]]; then
    echo "::warning::Health check passed with HTTP ${http_status} (not 200) — verify the redirect destination is healthy."
  fi
  echo "healthy=true" >> "$GITHUB_OUTPUT"
  exit 0
fi

Also worth documenting in .controlplane/shakacode-team.md that setting HEALTH_CHECK_ACCEPTED_STATUSES=200 and adding a /health endpoint is the recommended production posture.

on:
workflow_dispatch:
schedule:
- cron: "0 0 * * *"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scheduled workflows that fail silently are easy to miss — GitHub does send an email for failed scheduled runs but only to the repo owner by default, and only after 7 days of consecutive failures (as of current GitHub behavior).

Consider adding a failure notification step, or at minimum documenting that the team should verify GitHub's "Notify me of failed workflow runs" notification setting is enabled. Stale review apps that aren't cleaned up accrue Control Plane costs.

Suggested change
- cron: "0 0 * * *"
- cron: "0 3 * * *" # 3 AM UTC — avoids exact midnight when many repos run scheduled jobs simultaneously

(The cron timing is a minor suggestion — staggering off midnight reduces GitHub's scheduler contention, making the run more likely to start on time.)

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: Replace hand-written CI/CD with generated cpflow-* GitHub Actions flow

This PR replaces a set of hand-crafted GitHub Actions workflows with the generated cpflow-* composite action and workflow set. Overall the migration is a meaningful improvement in security, robustness, and maintainability. Below is a detailed assessment.


What Changed

Old New
setup-environment action cpflow-setup-environment action
build-docker-image action cpflow-build-docker-image action
delete-control-plane-app action (no prefix guard) cpflow-delete-control-plane-app action (prefix-guarded)
deploy-to-control-plane-review-app.yml cpflow-deploy-review-app.yml
delete-review-app.yml (pull_request closed) cpflow-delete-review-app.yml (pull_request_target closed)
deploy-to-control-plane-staging.yml (triggered on all branches) cpflow-deploy-staging.yml (main/master only)
nightly-remove-stale-review-apps.yml cpflow-cleanup-stale-review-apps.yml
promote-staging-to-production.yml cpflow-promote-staging-to-production.yml
help-command.yml cpflow-help-command.yml + cpflow-review-app-help.yml

Security Improvements ✅

  • Token never on argv: cpflow-setup-environment passes CPLN_TOKEN via environment variable rather than --token on the command line, avoiding /proc/<pid>/cmdline and ps leakage. The old setup-environment used cpln profile update default --org "$ORG" --token "$TOKEN".
  • SSH key isolation: cpflow-build-docker-image dedicates a separate step for writing the SSH key so DOCKER_BUILD_SSH_KEY is not present in the main build step's environment, even under ACTIONS_STEP_DEBUG=true.
  • Prefix guard on deletes: cpflow-delete-control-plane-app/delete-app.sh checks that APP_NAME starts with REVIEW_APP_PREFIX- before calling cpflow delete. The old action had no such guard.
  • Fork PR gating: cpflow-deploy-review-app.yml explicitly blocks fork PRs on pull_request events (cannot access secrets anyway) and adds author_association checks for comment-triggered deploys. The old workflow triggered on any PR synchronize/reopened without fork checks.
  • pull_request_target used correctly: cpflow-delete-review-app.yml uses pull_request_target for the closed-PR cleanup path (to access secrets from fork PRs) and checks out only the base repo — the comment in the file explains this. This is a well-known footgun that has been handled correctly here.

Robustness Improvements ✅

  • cancel-in-progress: false on all deploy/delete concurrency groups. Mid-rollout cancellation of cpflow deploy-image can leave workloads in a partially-deployed state; letting in-flight runs finish is the right call.
  • Multi-workload rollback: The production promotion workflow now captures per-container image state for all workloads and restores each one on failure, rather than only rolling back the single rails workload image.
  • Randomized heredoc delimiters in cpflow-promote-staging-to-production.yml prevent a stray EOF in a variable value from truncating the multi-line GITHUB_OUTPUT write.
  • --run-release-phase is now opt-in: cpflow-deploy-review-app.yml and cpflow-deploy-staging.yml inspect cpflow config to decide whether to pass --run-release-phase, rather than always passing it. This avoids failures when the flag isn't configured.
  • Health check window is larger: 24 retries × 25s = ~10 min (old: 12 × 10s = ~2 min). More headroom for slow Rails cold boots.

Issues and Suggestions

Medium: Stale cpflow exists error-matching is fragile

delete-app.sh and cpflow-deploy-review-app.yml both parse free-form stderr output from cpflow exists to distinguish "app not found" from authentication/API errors. The comment in the file acknowledges this with a TODO. This approach is inherently brittle if cpflow changes its error output. Consider filing an issue against cpflow to request a structured signal (distinct exit code or --json output) and linking it from the TODO so it doesn't get forgotten.

Low: cpflow-deploy-review-app.ymlsteps.source.outputs.allowed can be empty on workflow_dispatch

The Validate review app deployment source step sets allowed=true for same-repo branches and allowed=false for fork PRs via pull_request events. For workflow_dispatch, steps.resolve-pr.outputs.same_repo could be "false" if the PR is from a fork, which falls into the issue_comment branch of the condition and calls gh pr comment — but workflow_dispatch on a fork PR should perhaps just fail rather than try to comment. Low risk since workflow_dispatch is manual, but the code path is slightly inconsistent.

Low: cpflow-setup-environment — redundant null checks

The Setup Control Plane profile and registry login step validates that CPLN_TOKEN and ORG are non-empty via shell if [[ -z ... ]] guards. These variables are both declared required: true in the action's inputs: block, so Actions will already refuse to call the action without them. The guards add no safety but don't hurt anything.

Low: Old staging workflow triggered on branches: ['*']

The deleted deploy-to-control-plane-staging.yml ran on every push to every branch and relied on a validate-branch job to short-circuit. The new cpflow-deploy-staging.yml correctly restricts the on.push.branches filter to ["main", "master"] with a comment explaining the GitHub limitation on vars in branch filters. This is an improvement, but it means pushes to a non-main/master STAGING_APP_BRANCH will not trigger the workflow unless the file is edited. The comment in the file explains this trade-off clearly — no action needed, just confirming the approach is understood.

Nit: cpflow-cleanup-stale-review-apps.yml — no contents: write permission for release creation

The promote workflow needs contents: write to create GitHub releases; the cleanup workflow does not. Both look correct on permissions.

Nit: cpflow-deploy-review-app.yml line 1011 — long if: conditions

Several steps have near-identical if: conditions like:

if: steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'true' && (steps.check-app.outputs.exists == 'true' || steps.setup-review-app.outcome == 'success')

This is repetitive across 10+ steps. A job-level output or an environment variable set once would be cleaner, but this is a style point and not a functional issue in generated code.


Documentation ✅

  • .controlplane/readme.md and shakacode-team.md are updated to reference the new workflow names, slash commands, and REVIEW_APP_PREFIX naming convention.
  • Required/optional repository secrets and variables are now fully documented in shakacode-team.md.
  • The pull_request_target usage in cpflow-delete-review-app.yml and cpflow-review-app-help.yml is explained with inline comments that document why the trigger was chosen, which is important for future maintainers.

Summary

The new generated flow is substantially more secure and robust than what it replaces. The main open item is the cpflow exists output-parsing fragility, which the code already acknowledges. The PR is in good shape to merge once the draft status is resolved and tests can be run.

exists_output=""
if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
case "$exists_output" in
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment here is well-placed. To make this easier to track, consider filing an issue against the cpflow gem linking to this file, so the workaround has a visible upstream ticket. The risk of silent misclassification (an unrecognized error string treated as "app not found") is the main hazard — a regression test that mocks cpflow exists returning an exit-1 with a novel error string would catch drift before it reaches production.

echo "Failed to post fork rejection comment to PR #${PR_NUMBER}" >&2
fi

echo "${message}" >&2
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the trigger is workflow_dispatch and the PR being deployed is from a fork (same_repo == "false"), neither the pull_request branch (line 143) nor the same_repo == "true" branch (line 138) matches, so execution falls through to this gh pr comment call followed by exit 1. That is technically correct — the deployment is blocked — but it reaches the gh pr comment path that was written for the issue_comment case, which may be surprising. A small guard makes the intent explicit:

Suggested change
echo "${message}" >&2
# workflow_dispatch with a fork PR: block the run without attempting a PR comment.
if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
echo "${message}" >&2
exit 1
fi
message="Review app deploys from fork pull requests are not allowed because this workflow uses repository secrets."
if ! gh pr comment "${PR_NUMBER}" --body "${message}"; then
echo "Failed to post fork rejection comment to PR #${PR_NUMBER}" >&2
fi
echo "${message}" >&2

run: |
set -euo pipefail

if [[ -z "$CPLN_TOKEN" ]]; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both token and org are declared required: true in inputs:, so Actions will already refuse to invoke this composite action without them. These null checks will never fire. They're harmless, but if left in they may mislead a future reader into thinking there's a code path where these are empty.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

PR #732 Review: cpflow Generated GitHub Actions Flow

Overview

This PR replaces hand-written Control Plane GitHub Actions with the output of cpflow generate-github-actions, streamlining CI/CD for review apps, staging, and production. It is a substantial improvement in security, correctness, and maintainability.


Security Improvements (Notable)

  • Token exposure fixed: The old setup-environment passed --token "$TOKEN" directly to cpln profile update default, exposing the token in /proc/<pid>/cmdline and ps output. The new action correctly passes it via CPLN_TOKEN env var.
  • SSH key isolation: cpflow-build-docker-image keeps the SSH private key in a dedicated step so it never appears in the build step's environment — mitigating ACTIONS_STEP_DEBUG exposure.
  • Fork PR protection hardened: The old deploy workflow allowed any comment containing /deploy-review-app (using contains()), with no author_association check. The new workflow requires an exact body match (==) and gates on OWNER, MEMBER, or COLLABORATOR.
  • Delete safety guard added: delete-app.sh validates the app name against REVIEW_APP_PREFIX before calling cpflow delete, preventing accidental deletion of non-review apps. The old script only checked for "production" or "staging" in the name.
  • Old nightly workflow bug fixed: The old nightly cleanup hardcoded the prefix qa-react-webpack-rails-tutorial instead of reading from vars.REVIEW_APP_PREFIX.

Correctness Improvements

  • Multi-workload rollback: Production promotion now captures and rolls back all app workloads (not just containers[0].image of the rails workload), making rollback safer for apps with sidecars or workers.
  • Env-var parity check before promotion: Compares GVC env var names between staging and production before promoting — prevents deploying with missing config.
  • Proper pull_request_target for review-app-help: The old help workflow used pull_request, which has no write access for fork PRs. The new version correctly uses pull_request_target without a code checkout.
  • cancel-in-progress: false on all deployment concurrency groups: Prevents partially-deployed states caused by mid-flight cancellations.
  • Staging branch filter fixed: The old staging workflow triggered on branches: ['*'] (every branch). The new workflow correctly gates on main/master with a configurable STAGING_APP_BRANCH.

Issues Found

Misleading comment: HEALTH_CHECK_RETRIES/INTERVAL are not actually overridable via repo vars

In cpflow-promote-staging-to-production.yml lines 15–20, the comment says:

"Override these by editing this file or by setting the matching repository variable."

But HEALTH_CHECK_RETRIES: 24 and HEALTH_CHECK_INTERVAL: 15 are hardcoded literals. Unlike HEALTH_CHECK_ACCEPTED_STATUSES (line 28) which reads ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}, the retries/interval values ignore any repo variable of the same name. Either change them to ${{ vars.HEALTH_CHECK_RETRIES || 24 }} / ${{ vars.HEALTH_CHECK_INTERVAL || 15 }}, or remove the repo-variable claim from the comment. As-is, anyone who sets HEALTH_CHECK_RETRIES as a repo variable will see no effect and waste time debugging.

Duplicated cpflow exists error-string detection

The fragile output-string matching for distinguishing "app not found" vs. API/auth errors is copy-pasted in:

  • cpflow-delete-control-plane-app/delete-app.sh (lines 34–44)
  • cpflow-deploy-review-app.yml (lines 204–213)

delete-app.sh already has a TODO acknowledging this. If the list of error tokens ever needs to change, both places need updating simultaneously. Consider extracting to a shared helper script (e.g., .github/actions/cpflow-delete-control-plane-app/cpflow-exists.sh) that both callers source.


Minor Notes

  • cpflow-deploy-staging.yml re-installs the Ruby toolchain and cpflow gem in both the build and deploy jobs (separate runners). Functionally correct since the Docker image is pushed to the registry between jobs, but you pay two gem-install times per staging deploy. Gem caching or a shared setup artifact could help if build times grow.
  • The hardcoded GitHub.com SSH host keys in cpflow-build-docker-image/action.yml are stable but will need updating if GitHub ever rotates them. The fallback to DOCKER_BUILD_SSH_KNOWN_HOSTS for non-GitHub hosts is a good escape hatch.
  • cpflow-setup-environment writes CPLN_TOKEN to $GITHUB_ENV, which is intentional (documented in the code), but it gives the token to every subsequent step in the job — broader scope than strictly necessary. GitHub's secret masking mitigates log exposure.
  • Version pins (@controlplane/cli@3.3.1, cpflow 4.2.0) are hardcoded rather than managed by Renovate/Dependabot. The new docs section on regeneration serves as a reminder to keep these current.

Verdict

The security and correctness improvements here are significant — particularly the token exposure fix, the fork PR hardening, and the multi-workload rollback. The one item worth addressing before merge is the misleading HEALTH_CHECK_RETRIES/INTERVAL comment, since it will mislead the next person trying to tune those values.

Comment on lines +15 to +20
# Override these by editing this file or by setting the matching repository variable.
# Worst-case wall time per attempt is HEALTH_CHECK_INTERVAL plus the curl --max-time below
# (10s), so the defaults give a ~10 minute window (24 × (15 + 10) = 600s) — enough for
# most Rails cold boots (asset precompile + db:migrate + workload readiness).
HEALTH_CHECK_RETRIES: 24
HEALTH_CHECK_INTERVAL: 15
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says these can be overridden "by setting the matching repository variable", but unlike HEALTH_CHECK_ACCEPTED_STATUSES (line 28 which reads ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}), these two are hardcoded literals — setting a repo variable with the same name has no effect.

Either wire them up to match the pattern used for HEALTH_CHECK_ACCEPTED_STATUSES:

Suggested change
# Override these by editing this file or by setting the matching repository variable.
# Worst-case wall time per attempt is HEALTH_CHECK_INTERVAL plus the curl --max-time below
# (10s), so the defaults give a ~10 minute window (24 × (15 + 10) = 600s) — enough for
# most Rails cold boots (asset precompile + db:migrate + workload readiness).
HEALTH_CHECK_RETRIES: 24
HEALTH_CHECK_INTERVAL: 15
HEALTH_CHECK_RETRIES: ${{ vars.HEALTH_CHECK_RETRIES || 24 }}
HEALTH_CHECK_INTERVAL: ${{ vars.HEALTH_CHECK_INTERVAL || 15 }}

…or remove the "setting the matching repository variable" claim from the comment above so the next person who tries to tune these doesn't spend time debugging why their repo var is ignored.

Comment on lines +30 to +35
# Until then, keep this list in sync if `cpflow exists` starts emitting new error patterns —
# any unmatched error string would otherwise be silently treated as "app not found".
exists_output=""
if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
case "$exists_output" in
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same error-token list and detection logic is duplicated verbatim in cpflow-deploy-review-app.yml (the "Check if review app exists" step at line ~204). When this list needs to grow (e.g., a new cpflow exists error pattern), both files need to change in sync — which the TODO already flags.

A practical option: extract the detection to a small shared shell script (e.g., .github/actions/cpflow-delete-control-plane-app/cpflow-app-exists.sh) that returns 0/1/2 for exists/not-found/error, then source or call it from both places. That makes the error-token list a single source of truth and keeps the TODO resolution contained to one file.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Unnecessary CI work on every new PR open
    • Removed the opened pull_request trigger so the deploy workflow only runs on synchronize and reopened events while PR-open help remains handled by the help workflow.

Create PR

Or push these changes by commenting:

@cursor push 2c303a005e
Preview (2c303a005e)
diff --git a/.github/workflows/cpflow-deploy-review-app.yml b/.github/workflows/cpflow-deploy-review-app.yml
--- a/.github/workflows/cpflow-deploy-review-app.yml
+++ b/.github/workflows/cpflow-deploy-review-app.yml
@@ -4,7 +4,7 @@
 
 on:
   pull_request:
-    types: [opened, synchronize, reopened]
+    types: [synchronize, reopened]
   issue_comment:
     types: [created]
   workflow_dispatch:
@@ -35,7 +35,7 @@
 
 jobs:
   deploy:
-    # Skip synchronize/opened events from fork PRs at the job level — they cannot access
+    # Skip pull_request events from fork PRs at the job level — they cannot access
     # repository secrets anyway, so running any steps just burns billable minutes. Users
     # can still manually deploy a fork PR via `/deploy-review-app` (gated below by
     # author_association) or workflow_dispatch.

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit 9a390c1. Configure here.


on:
pull_request:
types: [opened, synchronize, reopened]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary CI work on every new PR open

Low Severity

Adding opened to the pull_request types (the old workflow only had synchronize, reopened) causes the full workflow to run on every new PR — including Ruby setup, npm install of the CLI, and gem install — only to determine the review app doesn't exist yet and write a step summary. That summary is only visible in the Actions tab, while the separate cpflow-review-app-help.yml already posts a help comment directly on the PR. A brand-new PR can never have an existing review app, so opened adds several minutes of billable CI time per PR with no deployment benefit.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 9a390c1. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant