This repository provisions a fixed Cisco SD-WAN controller lab on STACKIT:
- 3 vManage
- 2 vBond
- 2 vSmart
The current published flow covers Terraform deployment, vManage /dev/vdb first-boot formatting, 3-node vManage cluster formation, and controller certificate enrollment through vManage APIs.
- management, transport, and vManage cluster networks
- separate management, transport, and cluster security groups
- controller public-IP peer allowlisting on management and transport
- explicit transport ingress for SD-WAN control port 12346
- fixed private NIC IPs on STACKIT while keeping controller interfaces on DHCP in the guest
- public IPs on controller management and transport NICs by default
- day-0 cloud-init for vManage, vBond, and vSmart
- one extra data disk on each vManage
By default, controller certificates use `cisco_pki`. That means the built-in Cisco trust bundle in the image is left untouched during Terraform deployment, and the certificate workflow is completed later through `./scripts/stackit_cluster_certificate.py`.
- Upload the controller images to STACKIT first and capture the image IDs.
- Download the controller qcow2 images from software.cisco.com under SDWAN: vManage Software, vSmart Software, and vEdge Cloud; use the vBond Software image from that section.
- Make sure the Cisco Smart Account organization and Plug and Play controller profile already exist on software.cisco.com.
- Set `organization_name` to the exact organization name used on the Cisco portal.
- Set `vbond_hostname` to the vBond FQDN used in the Cisco controller profile. The default is `vbond.vbond`.
If you want a repo helper for the image-upload stage, use:
```
python3 ./scripts/stackit_upload_image.py \
  --vmanage-path /absolute/path/to/vmanage.qcow2 \
  --vsmart-path /absolute/path/to/vsmart.qcow2 \
  --vbond-path /absolute/path/to/vbond.qcow2
```

The helper wraps `stackit image create` and prints the resulting `image_ids = { ... }` block in the format expected by `terraform.tfvars`.
Use the qcow2 files downloaded from software.cisco.com > SDWAN > vManage Software / vSmart Software / vEdge Cloud > vBond Software.
After upload, wait until each imported image is fully available in STACKIT before running `terraform apply`. If Terraform starts while an image is still processing, boot-volume creation can fail with errors such as `Image <id> is not active`.
vbond_hostname is not just a label. It must:
- be a real vBond FQDN
- be DNS resolvable from the controller VMs
- match what you configure in software.cisco.com > Network Plug and Play > Controller Profiles
- match the vBond FQDN you expect the controllers to use at runtime
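As a quick preflight, you can confirm the FQDN actually resolves before deploying. A minimal sketch; `resolve_vbond` is a hypothetical helper, not part of the repo scripts, and it should run on a host that shares DNS with the controller VMs:

```python
import socket

def resolve_vbond(hostname: str) -> list[str]:
    """Return the sorted IPv4 addresses a hostname resolves to, or [] if it
    does not resolve at all."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    addrs = resolve_vbond("vbond.vbond")
    print(addrs if addrs else "vbond hostname does not resolve")
```

An empty result here means the controllers will also fail to reach vBond by name at runtime.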
Use the official STACKIT Terraform provider authentication flow. A local service account key file is the simplest setup:
```
export STACKIT_SERVICE_ACCOUNT_KEY_PATH=/absolute/path/to/service-account-key.json
```

If your service account was created with your own RSA key pair, also export:

```
export STACKIT_PRIVATE_KEY_PATH=/absolute/path/to/private-key.pem
```

The Python helpers in `scripts/` are intended to run from a local virtual environment.
Use Python 3.11 or newer, then create and activate a virtual environment:
```
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
```

The checked-in `requirements.txt` currently installs the Python package used by the script set:

```
requests
```
That virtual environment covers Python packages only. Install these tools separately on the host:
`terraform`, `openssl`, `bash`, `ssh`, `curl`, `nc`
If you keep the repo in a different folder, activate the virtual environment from that copied checkout before running the Python scripts there.
- Copy the example file:

  ```
  cp terraform.tfvars.example terraform.tfvars
  ```

- Fill in at least: `project_id`, `organization_name`, `vbond_hostname`, `image_ids`, `machine_types`, `admin_password`, `admin_password_hash`, `admin_access_cidrs`
Important input notes:
- `organization_name` must match the organization name used on software.cisco.com.
- `vbond_hostname` must be the DNS-resolvable vBond FQDN and must match the value configured in the Cisco Plug and Play controller profile.
- `admin_access_cidrs` should contain the external operator/admin source ranges that need access to the controller public IPs.
- Access between the controller instances themselves is added automatically by Terraform. Do not put internal controller IPs into `admin_access_cidrs`.
- `run_vmanage_firstboot_init` defaults to `false` so you can first confirm `terraform apply` completed successfully and then run the disk-formatting helper independently.
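To illustrate the shape of a filled-in `terraform.tfvars`, here is a hedged sketch. Every value below is a placeholder, and the map key names are assumptions; follow `terraform.tfvars.example` for the exact structure and your STACKIT project for valid machine types:

```
project_id          = "00000000-0000-0000-0000-000000000000"
organization_name   = "example-sdwan-org"
vbond_hostname      = "vbond.example.net"
image_ids = {
  vmanage = "11111111-1111-1111-1111-111111111111"
  vbond   = "22222222-2222-2222-2222-222222222222"
  vsmart  = "33333333-3333-3333-3333-333333333333"
}
machine_types = {
  vmanage = "example-large-flavor"
  vbond   = "example-small-flavor"
  vsmart  = "example-small-flavor"
}
admin_password      = "use-a-strong-password"
admin_password_hash = "$6$..."
admin_access_cidrs  = ["198.51.100.0/24"]
```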
Appliance VM labels include the mandatory SentinelOne exemption block on every stackit_server resource:
```
image_origin = "vendor"
product      = "cisco-sdwan"
s1risk       = "RK0027865"
```
`controller_certificate_method` defaults to `cisco_pki`.

Use `cisco_pki` when:
- you want Cisco PKI to sign controller certificates
- your Smart Account and controller profile are already prepared on software.cisco.com
- `organization_name` and `vbond_hostname` match the Cisco portal values
- you want Terraform/cloud-init to leave the default Cisco trust bundle in the image untouched
Use `enterprise_local` only if you explicitly want to keep a local shared controller CA and sign controller certificates yourself.
The current Terraform flow uses these active templates:
- `cloud-init/vmanage-rootca.yaml.tftpl`
- `cloud-init/vbond-rootca.yaml.tftpl`
- `cloud-init/vsmart-rootca.yaml.tftpl`
- `cloud-init/vmanage.xml.tftpl`
- `cloud-init/vbond.xml.tftpl`
- `cloud-init/vsmart.xml.tftpl`
Those are the current working templates. The other files under cloud-init/ are retained as named variations from earlier experiments, compatibility tests, or legacy flows.
The Python scripts derive controller addresses and metadata from Terraform outputs, especially:
- `controller_inventory`
- `vmanage_urls`
- `primary_vbond_transport_ip`
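For ad-hoc inspection or your own tooling, the same outputs can be read with `terraform output -json`. A sketch, assuming Terraform 0.14+ for `-chdir`; the `role` key used in `vmanage_nodes` is an assumption about the `controller_inventory` shape, not a documented contract of this repo:

```python
import json
import subprocess

def terraform_output(name: str, module_dir: str = ".") -> object:
    """Read one Terraform output as parsed JSON via `terraform output -json`."""
    proc = subprocess.run(
        ["terraform", f"-chdir={module_dir}", "output", "-json", name],
        check=True, capture_output=True, text=True,
    )
    return json.loads(proc.stdout)

def vmanage_nodes(inventory: dict) -> list[dict]:
    """Filter controller_inventory entries down to the vManage nodes
    (assumes each entry carries a 'role' field)."""
    return [node for node in inventory.values() if node.get("role") == "vmanage"]
```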
By default, the Python scripts assume the Terraform module directory is the repository root. These scripts support --module-dir so you can point them at a different checkout or copied folder:
- `scripts/stackit_disk_format.py`
- `scripts/stackit_cluster_certificate.py`
- `scripts/stackit_upload_image.py`
- `scripts/add_controllers_to_vmanage.py`
- `scripts/post_deploy_controllers.py`
Examples:
```
python3 ./scripts/stackit_disk_format.py --module-dir /absolute/path/to/sdwan-terraform-stackit
python3 ./scripts/stackit_cluster_certificate.py --module-dir /absolute/path/to/sdwan-terraform-stackit
```

The shell helpers use paths relative to the repo they are run from, so if you copied the repo elsewhere, run those helpers from that copied checkout.
If you keep multiple checkouts, make sure the active virtual environment and the --module-dir value refer to the same repo copy. The scripts read Terraform outputs from --module-dir, not from the shell's current working directory alone.
Run the Terraform stage first:
```
terraform init
terraform plan
terraform apply
```

Keep this setting disabled for the standard manual workflow:

```
run_vmanage_firstboot_init = false
```

That is the intended default. It lets you verify the infrastructure deployment before triggering the interactive first-boot disk handling.
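One way to verify the deployment is to probe the `vmanage_urls` outputs over HTTPS. A minimal sketch; `https_up` is a hypothetical helper, and certificate verification is disabled because the controllers still present self-signed certificates at this stage:

```python
import ssl
import urllib.error
import urllib.request

def https_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers over HTTPS at all, False otherwise."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # lab controllers use self-signed certs here
    try:
        urllib.request.urlopen(url, timeout=timeout, context=ctx)
        return True
    except urllib.error.HTTPError:
        return True  # the server answered, just with an HTTP error status
    except OSError:
        return False
```

A True result only shows the web service answers; it does not prove first-boot disk handling is complete, which is exactly why the disk-format helper runs as a separate step.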
```
python3 ./scripts/stackit_disk_format.py
```

This script:

- reads the vManage nodes from the Terraform `controller_inventory` output
- runs the `/dev/vdb` first-boot flow in parallel
- waits for each node to confirm `/opt/data` is mounted as a separate filesystem
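The mount confirmation boils down to checking that `/opt/data` appears as its own entry in `/proc/mounts` on each node (fetched over SSH). A minimal sketch of just that parsing step; `is_separate_mount` is a hypothetical name, not the script's actual function:

```python
def is_separate_mount(proc_mounts: str, mountpoint: str = "/opt/data") -> bool:
    """Return True if mountpoint has its own entry in /proc/mounts content."""
    for line in proc_mounts.splitlines():
        fields = line.split()
        # /proc/mounts format: device mountpoint fstype options dump pass
        if len(fields) >= 2 and fields[1] == mountpoint:
            return True
    return False
```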
```
python3 ./scripts/stackit_cluster_certificate.py
```

This wrapper:
- runs 3-node vManage cluster formation first
- then runs controller certificate enrollment
- is safe to rerun because the underlying cluster and certificate stages are rerunnable
If `controller_certificate_method = "cisco_pki"`, the certificate stage:

- pauses for manual Cisco Services Registration when needed
- expects the Smart Account settings on vManage to match `organization_name`
- expects the vBond FQDN on the Cisco controller profile to match `vbond_hostname`
- adds vSmart and vBond through vManage APIs
- generates controller CSRs through vManage APIs
- waits for Cisco PKI to install the certificates
- syncs vSmart certs to vBond and verifies vSmart/vBond reachability
If `controller_certificate_method = "enterprise_local"`, the same wrapper falls back to the enterprise-local certificate stage, which:
- uploads the local controller root CA to vManage
- reads CSRs from vManage APIs
- signs them locally
- installs the signed certificates back through vManage APIs
Legacy flows are kept under scripts/legacy/ as a safety net:
- `scripts/legacy/post_deploy_controllers.py`
- `scripts/legacy/add_controllers_to_vmanage.py`
Prefer the teardown helper over a raw terraform destroy:
```
bash ./scripts/teardown_stackit_lab.sh
```

It retries the normal destroy flow and, if needed, stops vManage nodes and detaches their data volumes before retrying.
Common issues seen so far:
- STACKIT API resets during `terraform apply`
  - Symptom: server create or poll fails late with transport resets or transient API errors.
  - Action: rerun `terraform apply`. The current graph is safe to continue from partial state.
- vManage HTTPS is up but `/dev/vdb` formatting is not actually complete
  - Symptom: login still shows storage formatting prompts on one or more managers.
  - Action: rerun `python3 ./scripts/stackit_disk_format.py`. The current script validates `/opt/data` instead of trusting early HTTPS.
- Cisco Services Registration looks complete in the portal but some older APIs return empty objects
  - Symptom: `smartaccountcredentials` or `pnpConnectSync` returns `{}` while the portal clearly shows Plug and Play registered.
  - Action: rely on the current certificate stage inside `stackit_cluster_certificate.py`, which now treats the `ciscoServices` Plug and Play registration row as the authoritative signal on this build.
- vSmart or vBond add fails through vManage
  - Symptom: `Unable to connect to admin@...:830`.
  - Action: on this build, controller onboarding works through the management public IP path. The current script already prefers that path.
- Secondary vManage CSR generation behaves differently than the primary
  - Symptom: CSR generation from the primary API says it cannot find the device.
  - Action: the current script uses each secondary vManage node's own API endpoint for CSR generation.
- Teardown gets stuck on vManage volume detach
  - Symptom: raw `terraform destroy` hangs or fails repeatedly near the data disks.
  - Action: use `bash ./scripts/teardown_stackit_lab.sh`.
- STACKIT provider warning about `No network interfaces configured`
  - Symptom: `terraform validate` or other provider operations emit that warning for `stackit_server.controller`.
  - Action: this warning is a known false positive from the STACKIT provider for this repo. Terraform still creates and attaches the controller management, transport, and cluster network interfaces as defined in the plan, so validate the actual plan or apply outcome instead of treating the warning alone as fatal.
- vManage and vSmart use `vbond.vbond` by default and inject both vBond transport IPs into `vpn 0 host`.
- Access between controller instances is created automatically during Terraform deployment on both the public and private paths needed by this lab.
- `network_ipv4_nameservers` defaults to `["1.1.1.1", "8.8.8.8"]`; set it to `null` if you want the STACKIT network defaults instead.
- Controller site IDs are configured per node, not as one shared value.
- The checked-in `terraform.tfvars.example` is only a template. Keep your local `terraform.tfvars`, `certs/`, `.terraform/`, Terraform state, and `artifacts/` out of version control.
- If you want a single stable front door for the 3-node vManage cluster, you can also place a load balancer in front of the managers. That is optional for this repo flow, but it can be useful for operator access and external integrations.
See CONTRIBUTIONS.md for contribution contact details.