Switch pod-scheduler configs from JSON to YAML; add JSON Schema#2170
Open
LoopedBard3 wants to merge 8 commits into
Open
Switch pod-scheduler configs from JSON to YAML; add JSON Schema#2170LoopedBard3 wants to merge 8 commits into
LoopedBard3 wants to merge 8 commits into
Conversation
The three pod configs in build/ were JSON, which made them hard to read and didn't allow inline comments. This converts them to YAML, ships a JSON Schema describing the shape, and changes scenario.type from a magic 1/2/3 integer to a self-documenting single/dual/triple string. * build/benchmarks_ci_pods.yml, benchmarks_ci_azure_pods.yml, and benchmarks_ci_cobalt_pods.yml replace their .json predecessors. Each opens with a yaml-language-server # yaml-language-server: schema=... directive so VS Code / Cursor / the YAML LSP provide autocomplete, hover docs, and inline validation while editing. * scripts/pod-scheduler/pod-config.schema.json (new) is the schema. The schema descriptions also serve as reliable grounding for LLM-driven edits. * config_loader.py now dispatches on file extension and accepts both YAML and JSON. scenario.type accepts single|dual|triple (case-insensitive) plus the legacy integer 1|2|3 for back-compat; bools (which YAML parsers happily produce from yes/no) are rejected explicitly. Unknown extensions and unknown type strings raise ConfigError so typos can't silently drop scenarios. * Generated benchmarks-ci-*.yml files only change the embedded regen command in the file header (.json -> .yml). Schedule data is byte-identical, verified by the existing snapshot test. * Tests: 5 new cases covering YAML loading, type-string aliases, type back-compat, bool/invalid rejection, and unknown-extension rejection. Default fixture writer now produces YAML so the YAML path is exercised on every test. * Docs: README.md, build/README.md, and scripts/pod-scheduler/README.md are updated with YAML examples and the schema-directive convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR migrates the pod-scheduler’s source-of-truth configs in build/ from JSON to YAML to improve readability (comments) and editor support, and introduces a JSON Schema to validate/configure those YAML files while keeping JSON loading for backward compatibility.
Changes:
- Converted the three
build/benchmarks_ci_*_pods.jsonconfigs to.ymlequivalents and updated references/docs accordingly. - Added
scripts/pod-scheduler/pod-config.schema.jsonand adoptedyaml-language-server$schemadirectives in shipped configs. - Updated the loader to dispatch by extension (YAML/JSON) and to accept
scenario.typeassingle|dual|triple(case-insensitive) while still allowing legacy1|2|3.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/pod-scheduler/config_loader.py | Load YAML/JSON by extension; add scenario type aliasing and validation. |
| scripts/pod-scheduler/pod-config.schema.json | New JSON Schema describing the YAML/JSON config shape for LSP validation. |
| scripts/pod-scheduler/models.py | Update docs/comments to reflect YAML/JSON config sources. |
| scripts/pod-scheduler/generator.py | Update generated header regen command to reference .yml. |
| scripts/pod-scheduler/scheduler.py | Docstring wording updated from JSON-specific to config-generic. |
| scripts/pod-scheduler/main.py | CLI help/usage updated to YAML/JSON. |
| scripts/pod-scheduler/README.md | Docs updated with YAML examples and schema-driven editing guidance. |
| scripts/pod-scheduler/tests/test_config_loader.py | Add YAML-path tests + scenario.type alias/back-compat coverage. |
| scripts/pod-scheduler/tests/test_generator.py | Update expected filenames/paths from .json to .yml. |
| scripts/pod-scheduler/tests/test_snapshots.py | Snapshot inputs updated to new .yml configs. |
| build/benchmarks_ci_pods.yml | New YAML config replacing JSON for on-prem CI pods. |
| build/benchmarks_ci_azure_pods.yml | New YAML config replacing JSON for Azure CI pods. |
| build/benchmarks_ci_cobalt_pods.yml | New YAML config replacing JSON for Cobalt CI pods. |
| build/benchmarks_ci_pods.json | Removed legacy JSON config (replaced by YAML). |
| build/benchmarks_ci_azure_pods.json | Removed legacy JSON config (replaced by YAML). |
| build/benchmarks_ci_cobalt_pods.json | Removed legacy JSON config (replaced by YAML). |
| build/benchmarks-ci-01.yml | Regenerated header updated to .yml config path. |
| build/benchmarks-ci-02.yml | Regenerated header updated to .yml config path. |
| build/benchmarks-ci-azure.yml | Regenerated header updated to .yml config path. |
| build/benchmarks-ci-cobalt.yml | Regenerated header updated to .yml config path. |
| build/README.md | Docs updated to reference YAML configs and schema directive. |
| README.md | Top-level doc updated to say pipelines are generated from YAML configs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
sebastienros
reviewed
May 7, 2026
Three of the review comments on aspnet#2170 flagged the same class of issue around YAML/Python edge cases. Addressing them all here: * scripts/pod-scheduler/requirements.txt (new) declares the runtime PyYAML dependency that landed with the YAML migration. Without this, a clean checkout fails with ModuleNotFoundError. * scripts/pod-scheduler/README.md gains a Prerequisites subsection pointing at requirements.txt so contributors know what to install. * config_loader._reject_bool / _coerce_int / _coerce_float (new helpers) explicitly reject Python bools in numeric fields. Previously `timeout: yes` would slip through `int(timeout)` and resolve to 1 because bool is a subclass of int. The new helpers also enforce range minimums (target_yaml_count >= 1, schedule_offset_hours >= 0, timeout >= 1, estimated_runtime >= 0) so out-of-band values fail loudly at load time. * test_config_loader.py adds 6 cases covering bool rejection on each numeric field plus negative-runtime and below-minimum target-count. All 55 tests pass. The schema already used "type": "integer" for these fields (which in JSON Schema 2020-12 explicitly excludes booleans), so editor-side validation was already correct; this commit closes the runtime gap. Generated pipeline YAML still regenerates byte-identical to HEAD. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per @sebastienros's review feedback on aspnet#2170, the named-key form for `machines` / `profiles` was visually noisy when almost every pod follows the same SUT-load-DB convention. Both blocks now accept either: * a positional list of 1-3 entries (`[sut]` / `[sut, load]` / `[sut, load, db]`), preferred for new pods, or * the existing named-dict form (`{sut: ..., load: ..., db: ...}`), still supported. The two blocks for a single pod must use the same form and declare the same set of roles; mixing list and dict, or differing lengths, raises `ConfigError` at load time so the role mapping can't drift silently. Schema (pod-config.schema.json) gets a `oneOf` for both blocks; the YAML LSP catches shape errors at edit time. * `config_loader._normalize_roles` resolves either form to a canonical ordered dict and rejects unknown roles, bools, non-strings, and empty values. The pod loader then enforces same-shape and same-role-set between machines and profiles before constructing the Pod. * All three committed configs (`benchmarks_ci_pods.yml`, `benchmarks_ci_azure_pods.yml`, `benchmarks_ci_cobalt_pods.yml`) switch to the shorthand form. Each gets a header comment above its pods block explaining the convention. Pods that re-use a machine across roles (e.g. `cobalt-cloud-lin-azl3-dual` puts the db box in the `load` slot) keep the inline comment that flags the reuse. * `scripts/pod-scheduler/README.md` gains a Pod Definition section showing both forms and recommending the list form for new pods. * test_config_loader.py adds 10 cases covering shorthand happy paths (1/2/3 entries), mixed-shape rejection, length mismatch, empty/too-long lists, bool entries, and unknown-role keys. All 65 tests pass. Generated pipeline YAMLs are byte-identical to HEAD: this is purely an input-format change, the schedule data is unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sebastienros
reviewed
May 7, 2026
sebastienros
reviewed
May 7, 2026
Two simplifications from @sebastienros's round 3 review on aspnet#2170: * Drop the per-extension dispatch in config_loader._load_raw. YAML is a strict superset of JSON, so yaml.safe_load handles both. Old .json configs still load via the YAML parser, and the loader no longer cares about the file extension. _load_raw is a one-liner now. * Reshape pod definitions into parallel arrays. Both `machines` and `profiles` are flat 1-3 entry lists; machines[i] runs profiles[i]. The role of slot i is implicit by length (1=SUT, 2=SUT+load, 3=SUT+load+db) so neither list carries explicit `sut`/`load`/`db` keys. The named-dict form is no longer accepted. The loader now enforces, per pod: * len(machines) == len(profiles), so each machine pairs with exactly one profile. * machines and profiles are both lists of 1-3 unique non-empty strings. * bools (yes/no/true/false from YAML) are rejected explicitly. Per-pod uniqueness is enforced; global `machine -> single profile` isn't, because real configs legitimately reuse a physical box across roles in different pods (e.g. azure2-db is db for one pod and load for another). Test `test_global_machine_can_have_different_profiles_across_pods` documents that as intended. Configs: * All three .yml configs reformatted from inline flow style `[a, b, c]` to block style and converted to the parallel-array form. Each retains the existing inline comment for pods that re-use machines across roles. The header comment over each file's pods block now describes the parallel-array semantics. * Schema (`pod-config.schema.json`) tightens machines/profiles to `array of unique strings, minItems 1, maxItems 3`; the old oneOf with the dict variant is gone. * README's Pod Definition section rewritten around the new shape. Tests: * Drop tests for removed behavior (extension rejection, dict form, named-role rejection, mixed-shape rejection). * Add `TestPodRoles` covering single/dual/triple happy paths, length-mismatch, machines uniqueness, profiles uniqueness, empty/too- long lists, dict form rejection, bool entries, and the global same-machine/different-profile case. * 66 tests pass (was 65). Generated pipeline YAMLs are byte-identical to HEAD; this is purely an input-format simplification. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Future-proofing tweak following the round-3 review on aspnet#2170. Removes three of the four `maxItems: 3` hardcoding sites so adding a fourth role later (e.g. a second load machine) only requires a schema bump. * models.ScenarioType IntEnum is gone. `Scenario.type` is now a plain `int` (1/2/3 == single/dual/triple). `SCENARIO_TYPE_SINGLE` / `DUAL` / `TRIPLE` module constants stay for readable references. `DEFAULT_RUNTIMES` is keyed by int. * `Pod` no longer has `sut` / `load` / `db` / `*_profile` fields. The dataclass stores `machines: List[str]` and `profiles: List[str]` (parallel arrays). Backward-compat properties (`pod.sut`, `pod.load`, etc.) read from the lists, so call sites that need a named slot (the conflicts diagnostic) keep working unchanged. * `pod.machines_for_type(t)` and `profiles_for_type(t)` are slices (`self.machines[:t]`); `validate(t)` is just `t > len(machines)`. No more 2/3 magic numbers branching on which fields are populated. * `__post_init__` enforces `len(machines) == len(profiles)` and non-empty machines, so an invalid Pod can't be constructed. * `main.print_pod_conflicts` iterates `pod.machines` against `ROLE_NAMES` (a new module-level constant) for the diagnostic output, so adding a 4th role just needs a name appended to `ROLE_NAMES` -- everything else falls out of `len(pod.machines)`. * All call sites and tests updated. `test_models` gains 4 cases exercising the new constructor invariants and the backward-compat property accessors. 70 tests pass (was 66). The schema's `maxItems: 3` is now the single remaining `3` hardcoding -- that's the user-facing constraint, which is the right place for it. Generated pipeline YAMLs are byte-identical to HEAD; this is purely an internal model refactor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t#2169) Upstream PR aspnet#2169 removed the four pods whose SUT machines were end-of-lifed (intel-lin, intel-win, intel-perflin, amd-lin2), leaving only gold-lin and gold-win on the on-prem CI fleet. This merge applies the same prune to our YAML-format pod config and regenerates the pipeline YAMLs. Conflict resolution: * build/benchmarks_ci_pods.json -- upstream modified the (now-deleted) JSON config; we kept our deletion. The equivalent edits are applied directly to build/benchmarks_ci_pods.yml: pods is trimmed to [gold-lin, gold-win] and every scenario's pods: list is pruned to match upstream's post-aspnet#2169 contents. * build/benchmarks-ci-01.yml / -02.yml -- regenerated from the updated YAML config. Output matches upstream's regenerated pipelines exactly, modulo the embedded regen-command header which (correctly) now points at the .yml config instead of the deleted .json. The Azure and Cobalt pod configs are unaffected by aspnet#2169 -- they don't reference any of the EOL'd machines -- and their pipeline YAMLs are unchanged. All 70 pod-scheduler unit tests still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These were added in 5b0bb62 as `backward-compat single-role accessors` for callers that `need a named slot without caring how many roles a pod has`. In the same commit I refactored the one real caller (`main.print_pod_conflicts`) to iterate `pod.machines` against `ROLE_NAMES` instead, so the safety net never had anything to catch. Grep confirmed the only remaining callers were tests asserting that the properties returned what we'd expect -- circular coverage. Removing the properties and updating the tests to read `pod.machines` / `pod.profiles` directly. The new `test_role_count_reflects_machines_length` covers the only computed property worth keeping (`role_count`). 69 tests pass (was 70 -- replaced two named-accessor tests with one role_count test). Generated pipeline YAMLs unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Audit follow-up to baa02f7. Four small items found: * models.Pod.role_count -- another property that only existed to be tested. Production code uses len(pod.machines) directly. Removed along with its test (test_role_count_reflects_machines_length). * config_loader._coerce_role_list docstring referenced _ROLE_ORDER, a constant that was removed in 9828898 when the named-dict form was dropped. Updated to point at models.ROLE_NAMES instead. * config_loader._reject_bool docstring said it covered `numeric/string fields`, but it's only called from _coerce_int and _coerce_float. The string-list bool check in _coerce_role_list is inline. Trimmed the docstring to `numeric fields`. * generator._render_yaml / generate_yamls -- `source_config` was Optional[str] = None with an else-branch that emitted a placeholder `<config>` regen command. Every caller (main.py and test_snapshots.py) always passes a concrete value, so the placeholder branch was unreachable. Tightened to `str` with no fallback rendering. 68 tests pass (was 69 with role_count removal). Generated pipeline YAMLs byte-identical to HEAD. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Follow-up to #2167. The pod-scheduler configs introduced there were JSON, which makes them awkward to read and edit — no inline comments, lots of quote/comma noise, and
type: 3is a magic integer you have to grepmodels.pyto understand. The named-keymachines: {sut: ..., load: ..., db: ...}shape also turned out to be more friction than information.This PR switches the source-of-truth configs to YAML, ships a JSON Schema, and simplifies the pod definition along the way.
What
Format
build/benchmarks_ci{,_azure,_cobalt}_pods.json→.yml. Each opens with:# yaml-language-server: $schema=../scripts/pod-scheduler/pod-config.schema.jsongold-linandgold-winsharegold-db).Pod shape
machinesandprofilesare now parallel arrays of 1–3 unique non-empty strings.machines[i]runsprofiles[i], and the role of slotiis implicit by length (1 = SUT, 2 = SUT + load, 3 = SUT + load + db). Nosut/load/dbkeys anywhere.The loader enforces, per pod:
len(machines) == len(profiles)— each machine pairs with exactly one profile,yes/notrap) and other non-strings are rejected.Global
machine → one profileis intentionally not enforced —azure2-dblegitimately serves thedbrole inazure2-amd64and theloadrole inidna-amd-winwith different profiles in each. A regression test pins that down.Self-documenting scenario type
type: 1 | 2 | 3→type: single | dual | triple(case-insensitive). Integer form is still accepted for back-compat; bools and unknown strings raiseConfigError.Loader
yaml.safe_loadfor any path. YAML is a strict superset of JSON, so existing.jsonconfigs continue to parse transparently — no extension dispatch.timeout,target_yaml_count,schedule_offset_hours,estimated_runtime) go through helpers that reject bools and enforce range minimums so atimeout: yescan't quietly resolve to 1.Schema
scripts/pod-scheduler/pod-config.schema.json(Draft 2020-12). Schema descriptions double as grounding for LLM-driven edits.jsonschemapackage; all three configs validate clean.additionalProperties: falseon pods and scenarios catches misspelled keys at edit time (e.g.mahcines:instead ofmachines:).Model
Scenario.typeis a plainint(1/2/3 withSCENARIO_TYPE_*named constants). The previousScenarioTypeIntEnum is gone.Podstoresmachines: List[str]andprofiles: List[str]directly — no per-role fields.machines_for_type(t)andprofiles_for_type(t)are just slices.maxItems, appending toROLE_NAMES, and adding one alias entry. The rest of the code derives everything fromlen(pod.machines).Generated pipelines
benchmarks-ci-01.yml,-02.yml,-azure.yml,-cobalt.ymlregenerated. The only delta vs HEAD is the embedded regen-command line in each file's header (.json→.yml); scheduling output is byte-identical, enforced by the snapshot test.Dependencies and tooling
scripts/pod-scheduler/requirements.txtdeclares the runtime PyYAML dependency. The README gets a Prerequisites subsection pointing at it so a clean checkout has a documentedpip install -r ...step.Upstream sync
intel-lin,intel-win,intel-perflin, andamd-lin2pods. The YAML config and pipeline YAMLs reflect that prune; Azure and Cobalt configs were unaffected.Tests
type(case variations) + integer back-compat.machines/profiles..jsonback-compat via the YAML parser.Docs
README.md,build/README.md, andscripts/pod-scheduler/README.mdrewritten with YAML examples, the schema-directive convention, and the parallel-array pod shape.Verification
The snapshot test enforces that regenerating from the new YAML configs produces byte-identical pipeline YAML, so this is a config/format refactor with no behavioral change.
Notes for reviewers
_format_source_pathkeeps the repo-relative URL behavior; it now expects.ymlpaths to match howmain.pyis invoked from CI._parse_scenario_typeexplicitly rejectsboolbecauseyaml.safe_loadturnsyes/no/true/falseinto Python bools, which would otherwise quietly resolve to1/0via the integer alias. The same bool-guard runs on every numeric field.maxItems: 3is the single remaining hardcoded role-count cap; everything else in the code path derives fromlen(pod.machines).