
[WIP] Introduce the ability to provision SSCSI roles on hubs and spokes when needed#119

Open
mhjacks wants to merge 29 commits into validatedpatterns:main from mhjacks:feature/sscsi-vp-proxy-cluster-ca-chart

Conversation

@mhjacks
Collaborator

@mhjacks mhjacks commented May 5, 2026

Add a mechanism to cluster_utils to create Kubernetes auth for SS-CSI, after the manner of ESO. CA trusts are expected to be provided separately.

Martin Jackson added 26 commits April 28, 2026 14:31
- Read ssCsiWorkloadAuth from values-<clustergroup>.yaml applications
- Hub roles auth/hub/role/hub-sscsi-*; spoke roles per cluster vault_path
- New tasks: workload auth collection, spoke role loop; defaults for TTL and paths
- Legacy vault_csi_kubernetes_auth supported via synthetic hub row
- Include from vault_secrets_init and vault_spokes_init

Made-with: Cursor
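Based on the commit message above, a values-&lt;clustergroup&gt;.yaml entry carrying ssCsiWorkloadAuth would look roughly like this; key names other than ssCsiWorkloadAuth are illustrative, not taken from the actual chart:

```yaml
# Hypothetical shape of an ssCsiWorkloadAuth entry, inferred from the
# commit message; hub roles land under auth/hub/role/hub-sscsi-*.
clusterGroup:
  applications:
    config-demo:
      ssCsiWorkloadAuth:
        - serviceAccount: config-demo-sa
          namespace: config-demo
          ttl: 15m          # falls back to the role's default TTL when omitted
```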
- Default pattern_dir from PATTERN_DIR when unset (vault.yml had no pattern_settings).
- Alias main_clustergroupname from main_clustergroup after pattern_settings.
- Run pattern_settings before vault_utils in vault.yml so hub values file can load.
- Emit a single debug line with values path, app count, ssCsiWorkloadAuth identity count,
  and hub role count so operators can confirm SSCSI Vault auth wiring.

Made-with: Cursor
Parse clusterGroup.managedClusterGroups alongside applications from the
hub values file. For each group with a mapping applications.*.ssCsiWorkloadAuth,
reuse the same collection logic with cluster defaulting to group name
(managedClusterGroup.name, else YAML key) so spoke Vault roles match ACM.

Pass explicit hub default for clusterGroup.applications; thread default
through collect_one_entry for inner_item.cluster.

Made-with: Cursor
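The cluster-defaulting rule described above (entries inherit managedClusterGroup.name, else the YAML key) can be sketched as plain Python; this is a hypothetical illustration, not the actual Ansible filter code:

```python
def collect_workload_auth(values):
    """Collect ssCsiWorkloadAuth rows from managedClusterGroups,
    defaulting each row's cluster to the group's name (else its YAML key)."""
    rows = []
    groups = values.get("clusterGroup", {}).get("managedClusterGroups", {})
    for key, group in groups.items():
        # Prefer the group's explicit name; fall back to its YAML key
        default_cluster = group.get("name", key)
        for app_name, app in group.get("applications", {}).items():
            for entry in app.get("ssCsiWorkloadAuth", []):
                rows.append({
                    "app": app_name,
                    "cluster": entry.get("cluster", default_cluster),
                    # namespace defaulting to the app name is an assumption
                    "namespace": entry.get("namespace", app_name),
                })
    return rows

values = {
    "clusterGroup": {
        "managedClusterGroups": {
            "region": {
                "name": "region-one",
                "applications": {
                    "config-demo": {
                        "ssCsiWorkloadAuth": [{"serviceAccount": "config-demo-sa"}],
                    },
                },
            },
        },
    },
}
print(collect_workload_auth(values))
```

With the explicit `name: region-one` present, the row's cluster is `region-one`; drop the `name` key and it falls back to the YAML key `region`, which is what lets the spoke Vault roles line up with ACM's cluster names.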
vault-only plays (e.g. collection vault.yml with only vault_utils) never set
pattern_dir or main_clustergroup, so ssCsiWorkloadAuth discovery saw an empty
values path. Include pattern_settings resolve_overrides and load
main.clusterGroupName from values-global when main_clustergroup is unset,
matching load_secrets / full vault play behavior.

Made-with: Cursor
Restore inline hub k8s_exec (apply_one task file was missing). When ssCsiWorkloadAuth
entry sets roleSlug, use it as the vault role suffix; otherwise keep SHA1 hash.

Spoke rows use the same rule so chart stable slugs can match Ansible.

Made-with: Cursor
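The roleSlug-versus-SHA1 rule from this commit can be sketched as follows; the exact fields fed into the hash are an assumption here, only the "roleSlug wins, else hash" shape comes from the commit message:

```python
import hashlib

def vault_role_suffix(entry, app_name, cluster):
    # If the entry pins a roleSlug, use it verbatim so chart-side stable
    # slugs can match what Ansible writes; otherwise fall back to a SHA1
    # hash (the identifying fields below are illustrative).
    if entry.get("roleSlug"):
        return entry["roleSlug"]
    ident = f"{cluster}/{app_name}/{entry.get('serviceAccount', '')}"
    return hashlib.sha1(ident.encode()).hexdigest()

print(vault_role_suffix({"roleSlug": "config-demo"}, "config-demo", "hub"))
# → config-demo
```

Spoke rows apply the same rule, so a chart that emits a stable slug produces the same role name Ansible provisions.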
…sscsi workload auth elements from managed clustergroups
@mhjacks mhjacks changed the title [WIP] Feature/sscsi vp proxy cluster ca chart [WIP] Introduce the ability to provision SSCSI roles on hubs and spokes when needed May 11, 2026
@mhjacks mhjacks changed the title [WIP] Introduce the ability to provision SSCSI roles on hubs and spokes when needed Introduce the ability to provision SSCSI roles on hubs and spokes when needed May 11, 2026
@mhjacks mhjacks requested a review from dminnear-rh May 11, 2026 20:05
Contributor

@dminnear-rh dminnear-rh left a comment


Looks good as far as I can tell; might be worth waiting and letting Michele or somebody more familiar with the secrets roles take a look as well before merging. I didn't see anything that looked like a breaking change, but I'm definitely not the most knowledgeable.

@mhjacks mhjacks requested a review from mbaldessari May 11, 2026 20:32
@mhjacks
Collaborator Author

mhjacks commented May 11, 2026

Thanks - the whole point is that the SSCSI stuff is additive, and is designed not to interfere with any of the existing secrets flows. But there's a lot and it makes sense to be careful with it. I'll ask for Michele's review as well.

@mhjacks
Collaborator Author

mhjacks commented May 11, 2026

(Side note: I'm thinking of adding a similar mechanism for creating an AAP-specific role in preference to the current aap-config mechanism, if this approach is deemed good enough. I'll document that too. I'm also planning on a further follow-up PR to clustergroup to document the use of the CSI elements, and plug some legacy holes)

Contributor

@mbaldessari mbaldessari left a comment


Just an initial quick pass really, as all of these are really big. Tomorrow I'll try to deploy your mcg branch + clustergroup changes + cluster_utils and will hopefully have proper feedback.

Comment thread roles/vault_utils/tasks/vault_spokes_init.yaml Outdated
Comment thread Makefile
Comment thread roles/clustergroup_discovery/tasks/main.yml
@mhjacks mhjacks force-pushed the feature/sscsi-vp-proxy-cluster-ca-chart branch from cf1203c to 74657c3 Compare May 12, 2026 22:52
@mhjacks
Collaborator Author

mhjacks commented May 12, 2026

Note: The force push was to undo an inadvertent push I made when I merged and pushed my other PR to the wrong branch.

@mbaldessari
Contributor

So using the mcg branch (https://github.com/mbaldessari/multicloud-gitops/tree/marty-sscsi) I get this:
[screenshot of the error omitted]

Maybe something related to my setup?

@mhjacks
Collaborator Author

mhjacks commented May 13, 2026

Try removing the deployment and retrying. Most likely you're working with the configmap from before it had the right CA injected. I'm trying to avoid sync waves (since the CM entry will always exist, it's a tricky dependency to handle).

@mhjacks
Collaborator Author

mhjacks commented May 13, 2026

Following further research, the configmap timing problem is relatively gnarly. It can't be trivially solved with init containers (the mount is missing, so the pod doesn't start, and the check in the init container never runs), nor by failing the deployment somehow (that leaves the deployment degraded, but degradation does not trigger a resync or remount of the configmap). I don't see this issue in AEG because the bootstrap config job is already gated by sync waves on other things that need the CA material. There is a more general solution to the problem, which is the Reloader operator: https://github.com/stakater/Reloader. I can remove a lot of silliness from the existing solution by using that, but we should probably talk about whether we want to provide framework-level support for that operator.
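For context, Reloader's opt-in is a single annotation on the workload; the deployment name below is illustrative:

```yaml
# With Reloader installed, this annotation rolls the pods whenever a
# mounted ConfigMap or Secret changes, which would cover the CA-injection case.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: config-demo
  annotations:
    reloader.stakater.com/auto: "true"
```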

@mhjacks
Collaborator Author

mhjacks commented May 13, 2026

@mbaldessari mhjacks/multicloud-gitops@9049239 introduces a cronjob to the config-demo app that restarts the deployment when degraded (which usually happens because of the x509 issue as discussed above). It's not especially elegant, but it should be effective. If you pull it into your fork, it should work.
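The restart-on-degraded workaround could look roughly like the CronJob below; names, image, and schedule are assumptions for illustration, not taken from the referenced commit:

```yaml
# Hypothetical sketch: periodically restart the deployment if it is not
# fully available (e.g. stuck on the x509 issue), forcing a remount of
# the ConfigMap. The service account needs RBAC to patch deployments.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: config-demo-restarter
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: config-demo-restarter
          restartPolicy: OnFailure
          containers:
            - name: restarter
              image: registry.redhat.io/openshift4/ose-cli
              command:
                - /bin/sh
                - -c
                - |
                  # Only restart when the rollout is not healthy
                  if ! oc rollout status deploy/config-demo --timeout=10s; then
                    oc rollout restart deploy/config-demo
                  fi
```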

@mhjacks mhjacks changed the title Introduce the ability to provision SSCSI roles on hubs and spokes when needed [WIP] Introduce the ability to provision SSCSI roles on hubs and spokes when needed May 13, 2026
@mhjacks
Collaborator Author

mhjacks commented May 13, 2026

While troubleshooting this, I discovered another issue with potentially deploying multiple SPCs in a single (Argo) Application. I'm going to work on that a bit in the meantime, so moving back to WIP. Thanks for the feedback so far.
