[WIP] Introduce the ability to provision SSCSI roles on hubs and spokes when needed #119
mhjacks wants to merge 29 commits into
Conversation
- Read ssCsiWorkloadAuth from the applications in values-<clustergroup>.yaml.
- Hub roles: auth/hub/role/hub-sscsi-*; spoke roles per cluster vault_path.
- New tasks: workload auth collection and a spoke role loop; defaults for TTL and paths.
- Legacy vault_csi_kubernetes_auth is supported via a synthetic hub row.
- Included from vault_secrets_init and vault_spokes_init.

Made-with: Cursor
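As a rough illustration of what is being read, an ssCsiWorkloadAuth entry in values-<clustergroup>.yaml might look like the following. This is a hypothetical sketch: only ssCsiWorkloadAuth, roleSlug, and the values file location come from this PR; the inner field names are assumptions.

```yaml
# Hypothetical sketch of a hub values-<clustergroup>.yaml fragment.
# Field names inside the list entry (namespace, serviceAccount) are assumed.
clusterGroup:
  applications:
    config-demo:
      ssCsiWorkloadAuth:
        - namespace: config-demo      # assumed key: workload namespace
          serviceAccount: default     # assumed key: bound service account
          roleSlug: config-demo       # optional stable suffix for the Vault role
```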
- Default pattern_dir from PATTERN_DIR when unset (vault.yml had no pattern_settings).
- Alias main_clustergroupname from main_clustergroup after pattern_settings.
- Run pattern_settings before vault_utils in vault.yml so the hub values file can load.
- Emit a single debug line with the values path, app count, ssCsiWorkloadAuth identity count, and hub role count so operators can confirm the SSCSI Vault auth wiring.

Made-with: Cursor
Parse clusterGroup.managedClusterGroups alongside applications from the hub values file. For each group with a mapping under applications.*.ssCsiWorkloadAuth, reuse the same collection logic, with cluster defaulting to the group name (managedClusterGroup.name, else the YAML key) so spoke Vault roles match ACM. Pass an explicit hub default for clusterGroup.applications; thread the default through collect_one_entry for inner_item.cluster. Made-with: Cursor
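A sketch of how a managed cluster group might carry the same key. Again, only managedClusterGroups, name, applications, and ssCsiWorkloadAuth are taken from this PR; the inner fields are illustrative assumptions.

```yaml
# Hypothetical sketch: a spoke group whose name becomes the cluster
# default for its spoke Vault roles.
clusterGroup:
  managedClusterGroups:
    region-one:
      name: region-one            # cluster default for spoke roles (else the YAML key)
      applications:
        config-demo:
          ssCsiWorkloadAuth:
            - namespace: config-demo     # assumed key
              serviceAccount: default    # assumed key
```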
Vault-only plays (e.g. the collection's vault.yml with only vault_utils) never set pattern_dir or main_clustergroup, so ssCsiWorkloadAuth discovery saw an empty values path. Include pattern_settings' resolve_overrides and load main.clusterGroupName from values-global when main_clustergroup is unset, matching the load_secrets / full vault play behavior. Made-with: Cursor
Restore the inline hub k8s_exec (the apply_one task file was missing). When an ssCsiWorkloadAuth entry sets roleSlug, use it as the Vault role suffix; otherwise keep the SHA1 hash. Spoke rows use the same rule so the chart's stable slugs can match Ansible's. Made-with: Cursor
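The suffix rule described above could be expressed with Ansible's built-in hash filter, roughly as follows. The variable and field names (auth_entry, namespace, serviceAccount) are assumptions; only roleSlug and the SHA1 fallback come from the PR.

```yaml
# Hypothetical sketch of the role-suffix rule, not the PR's actual task.
- name: Derive the Vault role suffix for one ssCsiWorkloadAuth entry
  ansible.builtin.set_fact:
    # Prefer the stable roleSlug when the entry provides one;
    # otherwise fall back to a SHA1 hash of the entry's identity.
    sscsi_role_suffix: >-
      {{ auth_entry.roleSlug
         | default((auth_entry.namespace ~ '/' ~ auth_entry.serviceAccount)
                   | hash('sha1')) }}
```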
…sscsi workload auth elements from managed clustergroups
dminnear-rh left a comment
Looks good as far as I can tell; it might be worth waiting and letting Michele or somebody more familiar with the secrets roles take a look as well before merging. I didn't see anything that looked like a breaking change, but I'm definitely not the most knowledgeable here.
Thanks - the whole point is that the SSCSI stuff is additive and designed not to interfere with any of the existing secrets flows. But there's a lot of it, and it makes sense to be careful. I'll ask for Michele's review as well.
(Side note: I'm thinking of adding a similar mechanism for creating an AAP-specific role in preference to the current aap-config mechanism, if this approach is deemed good enough. I'll document that too. I'm also planning a further follow-up PR to clustergroup to document the use of the CSI elements and plug some legacy holes.)
mbaldessari left a comment
Just an initial quick pass really, as all of these are quite big. Tomorrow I'll try to deploy your mcg branch + clustergroup changes + cluster_utils, and I hope to have proper feedback then.
mhjacks force-pushed from cf1203c to 74657c3
Note: The force push was to undo an inadvertent push made when I merged and pushed my other PR to the wrong branch.
So using the mcg branch (https://github.com/mbaldessari/multicloud-gitops/tree/marty-sscsi) I get this: Maybe something related to my setup?
Try removing the deployment and retrying. Most likely you're working with the configmap from before it had the right CA injected. I'm trying to avoid sync waves (since the CM entry will always exist, it's a tricky dependency to handle).
Following further research, the configmap timing problem is relatively gnarly. It can't be trivially solved with init containers (the mount is missing, so the pod doesn't start and the check in the init container never runs), nor by failing the deployment somehow (that leaves the deployment degraded, which does not trigger a resync or remount of the configmap). I don't see this issue in AEG because the bootstrap config job is already gated by sync waves on other things that need the CA material. There is a more general solution to the problem: the Reloader operator (https://github.com/stakater/Reloader). I could remove a lot of silliness from the existing solution by using it, but we should probably talk about whether we want to provide framework-level support for that operator.
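For context, Reloader's usage is annotation-driven: a workload opts in and Reloader rolls it when a mounted ConfigMap or Secret changes. A minimal sketch (the deployment name is assumed for illustration):

```yaml
# Hypothetical sketch: opting a deployment into Reloader-driven restarts.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: config-demo                       # name assumed for illustration
  annotations:
    # Restart this deployment when any of its mounted
    # ConfigMaps or Secrets change.
    reloader.stakater.com/auto: "true"
```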
@mbaldessari mhjacks/multicloud-gitops@9049239 introduces a cronjob in the config-demo app that restarts the deployment when it is degraded (which usually happens because of the x509 issue discussed above). It's not especially elegant, but it should be effective. If you pull it into your fork, it should work.
While troubleshooting this, I discovered another issue with potentially deploying multiple SPCs in a single (Argo) Application. I'm going to work on that a bit in the meantime, so I'm moving this back to WIP. Thanks for the feedback so far.

Add a mechanism to cluster_utils to create Kubernetes auth for SS-CSI, after the manner of ESO. CA trusts are expected to be provided separately.