Switch pod-scheduler configs from JSON to YAML; add JSON Schema by LoopedBard3 · Pull Request #2170 · aspnet/Benchmarks

LoopedBard3 · 2026-05-07T20:26:59Z

Why

Follow-up to #2167. The pod-scheduler configs introduced there were JSON, which makes them awkward to read and edit — no inline comments, lots of quote/comma noise, and type: 3 is a magic integer you have to grep models.py to understand. The named-key machines: {sut: ..., load: ..., db: ...} shape also turned out to be more friction than information.

This PR switches the source-of-truth configs to YAML, ships a JSON Schema, and simplifies the pod definition along the way.

What

Format

Move build/benchmarks_ci{,_azure,_cobalt}_pods.json → .yml. Each opens with:
```
# yaml-language-server: $schema=../scripts/pod-scheduler/pod-config.schema.json
```
so VS Code / Cursor / the Red Hat YAML extension provide autocomplete, hover docs, and inline validation while editing.
Block style throughout; inline comments document the things that used to live only in PR review threads (e.g., that gold-lin and gold-win share gold-db).

Pod shape

machines and profiles are now parallel arrays of 1–3 unique non-empty strings. machines[i] runs profiles[i], and the role of slot i is implicit by length (1 = SUT, 2 = SUT + load, 3 = SUT + load + db). No sut/load/db keys anywhere.

- name: gold-lin
  machines:
  - gold-lin
  - gold-load
  - gold-db
  profiles:
  - gold-lin-app
  - gold-load-load
  - gold-db-db

The loader enforces, per pod:

len(machines) == len(profiles) — each machine pairs with exactly one profile,
both lists are unique within the pod,
bools (the YAML yes/no trap) and other non-strings are rejected.

Global machine → one profile is intentionally not enforced — azure2-db legitimately serves the db role in azure2-amd64 and the load role in idna-amd-win with different profiles in each. A regression test pins that down.

Self-documenting scenario type

type: 1 | 2 | 3 → type: single | dual | triple (case-insensitive). Integer form is still accepted for back-compat; bools and unknown strings raise ConfigError.

Loader

yaml.safe_load for any path. YAML is a strict superset of JSON, so existing .json configs continue to parse transparently — no extension dispatch.
Numeric fields (timeout, target_yaml_count, schedule_offset_hours, estimated_runtime) go through helpers that reject bools and enforce range minimums so a timeout: yes can't quietly resolve to 1.

Schema

New scripts/pod-scheduler/pod-config.schema.json (Draft 2020-12). Schema descriptions double as grounding for LLM-driven edits.
Validated locally with the jsonschema package; all three configs validate clean.
additionalProperties: false on pods and scenarios catches misspelled keys at edit time (e.g. mahcines: instead of machines:).

Model

Scenario.type is a plain int (1/2/3 with SCENARIO_TYPE_* named constants). The previous ScenarioType IntEnum is gone.
Pod stores machines: List[str] and profiles: List[str] directly — no per-role fields. machines_for_type(t) and profiles_for_type(t) are just slices.
Adding a 4th role later only requires bumping the schema's maxItems, appending to ROLE_NAMES, and adding one alias entry. The rest of the code derives everything from len(pod.machines).

Generated pipelines

benchmarks-ci-01.yml, -02.yml, -azure.yml, -cobalt.yml regenerated. The only delta vs HEAD is the embedded regen-command line in each file's header (.json → .yml); scheduling output is byte-identical, enforced by the snapshot test.

Dependencies and tooling

New scripts/pod-scheduler/requirements.txt declares the runtime PyYAML dependency. The README gets a Prerequisites subsection pointing at it so a clean checkout has a documented pip install -r ... step.

Upstream sync

The branch was rebased through upstream Remove EOL'd citrine/perf relays + VMs from pod scheduler and profiles #2169, which retired the EOL'd intel-lin, intel-win, intel-perflin, and amd-lin2 pods. The YAML config and pipeline YAMLs reflect that prune; Azure and Cobalt configs were unaffected.

Tests

68 unit tests pass. New coverage includes:
- All three string aliases for type (case variations) + integer back-compat.
- Bool rejection on every numeric field (type, timeout, estimated_runtime, target_yaml_count, schedule_offset_hours) and on entries inside machines/profiles.
- Pod construction invariants: length mismatch, empty list, too-long list, duplicate entries.
- Same physical machine playing different roles across pods (global non-1:1 invariant).
- .json back-compat via the YAML parser.

Docs

Top-level README.md, build/README.md, and scripts/pod-scheduler/README.md rewritten with YAML examples, the schema-directive convention, and the parallel-array pod shape.

Verification

python -m pip install -r scripts/pod-scheduler/requirements.txt
cd scripts/pod-scheduler
python -m unittest discover tests
# Ran 68 tests in ~0.3s — OK

The snapshot test enforces that regenerating from the new YAML configs produces byte-identical pipeline YAML, so this is a config/format refactor with no behavioral change.

Notes for reviewers

_format_source_path keeps the repo-relative URL behavior; it now expects .yml paths to match how main.py is invoked from CI.
_parse_scenario_type explicitly rejects bool because yaml.safe_load turns yes/no/true/false into Python bools, which would otherwise quietly resolve to 1/0 via the integer alias. The same bool-guard runs on every numeric field.
The schema's maxItems: 3 is the single remaining hardcoded role-count cap; everything else in the code path derives from len(pod.machines).

The three pod configs in build/ were JSON, which made them hard to read and didn't allow inline comments. This converts them to YAML, ships a JSON Schema describing the shape, and changes scenario.type from a magic 1/2/3 integer to a self-documenting single/dual/triple string. * build/benchmarks_ci_pods.yml, benchmarks_ci_azure_pods.yml, and benchmarks_ci_cobalt_pods.yml replace their .json predecessors. Each opens with a yaml-language-server # yaml-language-server: schema=... directive so VS Code / Cursor / the YAML LSP provide autocomplete, hover docs, and inline validation while editing. * scripts/pod-scheduler/pod-config.schema.json (new) is the schema. The schema descriptions also serve as reliable grounding for LLM-driven edits. * config_loader.py now dispatches on file extension and accepts both YAML and JSON. scenario.type accepts single|dual|triple (case-insensitive) plus the legacy integer 1|2|3 for back-compat; bools (which YAML parsers happily produce from yes/no) are rejected explicitly. Unknown extensions and unknown type strings raise ConfigError so typos can't silently drop scenarios. * Generated benchmarks-ci-*.yml files only change the embedded regen command in the file header (.json -> .yml). Schedule data is byte-identical, verified by the existing snapshot test. * Tests: 5 new cases covering YAML loading, type-string aliases, type back-compat, bool/invalid rejection, and unknown-extension rejection. Default fixture writer now produces YAML so the YAML path is exercised on every test. * Docs: README.md, build/README.md, and scripts/pod-scheduler/README.md are updated with YAML examples and the schema-directive convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR migrates the pod-scheduler’s source-of-truth configs in build/ from JSON to YAML to improve readability (comments) and editor support, and introduces a JSON Schema to validate/configure those YAML files while keeping JSON loading for backward compatibility.

Changes:

Converted the three build/benchmarks_ci_*_pods.json configs to .yml equivalents and updated references/docs accordingly.
Added scripts/pod-scheduler/pod-config.schema.json and adopted yaml-language-server $schema directives in shipped configs.
Updated the loader to dispatch by extension (YAML/JSON) and to accept scenario.type as single|dual|triple (case-insensitive) while still allowing legacy 1|2|3.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
scripts/pod-scheduler/config_loader.py	Load YAML/JSON by extension; add scenario type aliasing and validation.
scripts/pod-scheduler/pod-config.schema.json	New JSON Schema describing the YAML/JSON config shape for LSP validation.
scripts/pod-scheduler/models.py	Update docs/comments to reflect YAML/JSON config sources.
scripts/pod-scheduler/generator.py	Update generated header regen command to reference `.yml`.
scripts/pod-scheduler/scheduler.py	Docstring wording updated from JSON-specific to config-generic.
scripts/pod-scheduler/main.py	CLI help/usage updated to YAML/JSON.
scripts/pod-scheduler/README.md	Docs updated with YAML examples and schema-driven editing guidance.
scripts/pod-scheduler/tests/test_config_loader.py	Add YAML-path tests + scenario.type alias/back-compat coverage.
scripts/pod-scheduler/tests/test_generator.py	Update expected filenames/paths from `.json` to `.yml`.
scripts/pod-scheduler/tests/test_snapshots.py	Snapshot inputs updated to new `.yml` configs.
build/benchmarks_ci_pods.yml	New YAML config replacing JSON for on-prem CI pods.
build/benchmarks_ci_azure_pods.yml	New YAML config replacing JSON for Azure CI pods.
build/benchmarks_ci_cobalt_pods.yml	New YAML config replacing JSON for Cobalt CI pods.
build/benchmarks_ci_pods.json	Removed legacy JSON config (replaced by YAML).
build/benchmarks_ci_azure_pods.json	Removed legacy JSON config (replaced by YAML).
build/benchmarks_ci_cobalt_pods.json	Removed legacy JSON config (replaced by YAML).
build/benchmarks-ci-01.yml	Regenerated header updated to `.yml` config path.
build/benchmarks-ci-02.yml	Regenerated header updated to `.yml` config path.
build/benchmarks-ci-azure.yml	Regenerated header updated to `.yml` config path.
build/benchmarks-ci-cobalt.yml	Regenerated header updated to `.yml` config path.
build/README.md	Docs updated to reference YAML configs and schema directive.
README.md	Top-level doc updated to say pipelines are generated from YAML configs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Three of the review comments on aspnet#2170 flagged the same class of issue around YAML/Python edge cases. Addressing them all here: * scripts/pod-scheduler/requirements.txt (new) declares the runtime PyYAML dependency that landed with the YAML migration. Without this, a clean checkout fails with ModuleNotFoundError. * scripts/pod-scheduler/README.md gains a Prerequisites subsection pointing at requirements.txt so contributors know what to install. * config_loader._reject_bool / _coerce_int / _coerce_float (new helpers) explicitly reject Python bools in numeric fields. Previously `timeout: yes` would slip through `int(timeout)` and resolve to 1 because bool is a subclass of int. The new helpers also enforce range minimums (target_yaml_count >= 1, schedule_offset_hours >= 0, timeout >= 1, estimated_runtime >= 0) so out-of-band values fail loudly at load time. * test_config_loader.py adds 6 cases covering bool rejection on each numeric field plus negative-runtime and below-minimum target-count. All 55 tests pass. The schema already used "type": "integer" for these fields (which in JSON Schema 2020-12 explicitly excludes booleans), so editor-side validation was already correct; this commit closes the runtime gap. Generated pipeline YAML still regenerates byte-identical to HEAD. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@sebastienros

Per @sebastienros's review feedback on aspnet#2170, the named-key form for `machines` / `profiles` was visually noisy when almost every pod follows the same SUT-load-DB convention. Both blocks now accept either: * a positional list of 1-3 entries (`[sut]` / `[sut, load]` / `[sut, load, db]`), preferred for new pods, or * the existing named-dict form (`{sut: ..., load: ..., db: ...}`), still supported. The two blocks for a single pod must use the same form and declare the same set of roles; mixing list and dict, or differing lengths, raises `ConfigError` at load time so the role mapping can't drift silently. Schema (pod-config.schema.json) gets a `oneOf` for both blocks; the YAML LSP catches shape errors at edit time. * `config_loader._normalize_roles` resolves either form to a canonical ordered dict and rejects unknown roles, bools, non-strings, and empty values. The pod loader then enforces same-shape and same-role-set between machines and profiles before constructing the Pod. * All three committed configs (`benchmarks_ci_pods.yml`, `benchmarks_ci_azure_pods.yml`, `benchmarks_ci_cobalt_pods.yml`) switch to the shorthand form. Each gets a header comment above its pods block explaining the convention. Pods that re-use a machine across roles (e.g. `cobalt-cloud-lin-azl3-dual` puts the db box in the `load` slot) keep the inline comment that flags the reuse. * `scripts/pod-scheduler/README.md` gains a Pod Definition section showing both forms and recommending the list form for new pods. * test_config_loader.py adds 10 cases covering shorthand happy paths (1/2/3 entries), mixed-shape rejection, length mismatch, empty/too-long lists, bool entries, and unknown-role keys. All 65 tests pass. Generated pipeline YAMLs are byte-identical to HEAD: this is purely an input-format change, the schedule data is unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@sebastienros

Two simplifications from @sebastienros's round 3 review on aspnet#2170: * Drop the per-extension dispatch in config_loader._load_raw. YAML is a strict superset of JSON, so yaml.safe_load handles both. Old .json configs still load via the YAML parser, and the loader no longer cares about the file extension. _load_raw is a one-liner now. * Reshape pod definitions into parallel arrays. Both `machines` and `profiles` are flat 1-3 entry lists; machines[i] runs profiles[i]. The role of slot i is implicit by length (1=SUT, 2=SUT+load, 3=SUT+load+db) so neither list carries explicit `sut`/`load`/`db` keys. The named-dict form is no longer accepted. The loader now enforces, per pod: * len(machines) == len(profiles), so each machine pairs with exactly one profile. * machines and profiles are both lists of 1-3 unique non-empty strings. * bools (yes/no/true/false from YAML) are rejected explicitly. Per-pod uniqueness is enforced; global `machine -> single profile` isn't, because real configs legitimately reuse a physical box across roles in different pods (e.g. azure2-db is db for one pod and load for another). Test `test_global_machine_can_have_different_profiles_across_pods` documents that as intended. Configs: * All three .yml configs reformatted from inline flow style `[a, b, c]` to block style and converted to the parallel-array form. Each retains the existing inline comment for pods that re-use machines across roles. The header comment over each file's pods block now describes the parallel-array semantics. * Schema (`pod-config.schema.json`) tightens machines/profiles to `array of unique strings, minItems 1, maxItems 3`; the old oneOf with the dict variant is gone. * README's Pod Definition section rewritten around the new shape. Tests: * Drop tests for removed behavior (extension rejection, dict form, named-role rejection, mixed-shape rejection). * Add `TestPodRoles` covering single/dual/triple happy paths, length-mismatch, machines uniqueness, profiles uniqueness, empty/too- long lists, dict form rejection, bool entries, and the global same-machine/different-profile case. * 66 tests pass (was 65). Generated pipeline YAMLs are byte-identical to HEAD; this is purely an input-format simplification. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Future-proofing tweak following the round-3 review on aspnet#2170. Removes three of the four `maxItems: 3` hardcoding sites so adding a fourth role later (e.g. a second load machine) only requires a schema bump. * models.ScenarioType IntEnum is gone. `Scenario.type` is now a plain `int` (1/2/3 == single/dual/triple). `SCENARIO_TYPE_SINGLE` / `DUAL` / `TRIPLE` module constants stay for readable references. `DEFAULT_RUNTIMES` is keyed by int. * `Pod` no longer has `sut` / `load` / `db` / `*_profile` fields. The dataclass stores `machines: List[str]` and `profiles: List[str]` (parallel arrays). Backward-compat properties (`pod.sut`, `pod.load`, etc.) read from the lists, so call sites that need a named slot (the conflicts diagnostic) keep working unchanged. * `pod.machines_for_type(t)` and `profiles_for_type(t)` are slices (`self.machines[:t]`); `validate(t)` is just `t > len(machines)`. No more 2/3 magic numbers branching on which fields are populated. * `__post_init__` enforces `len(machines) == len(profiles)` and non-empty machines, so an invalid Pod can't be constructed. * `main.print_pod_conflicts` iterates `pod.machines` against `ROLE_NAMES` (a new module-level constant) for the diagnostic output, so adding a 4th role just needs a name appended to `ROLE_NAMES` -- everything else falls out of `len(pod.machines)`. * All call sites and tests updated. `test_models` gains 4 cases exercising the new constructor invariants and the backward-compat property accessors. 70 tests pass (was 66). The schema's `maxItems: 3` is now the single remaining `3` hardcoding -- that's the user-facing constraint, which is the right place for it. Generated pipeline YAMLs are byte-identical to HEAD; this is purely an internal model refactor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…t#2169) Upstream PR aspnet#2169 removed the four pods whose SUT machines were end-of-lifed (intel-lin, intel-win, intel-perflin, amd-lin2), leaving only gold-lin and gold-win on the on-prem CI fleet. This merge applies the same prune to our YAML-format pod config and regenerates the pipeline YAMLs. Conflict resolution: * build/benchmarks_ci_pods.json -- upstream modified the (now-deleted) JSON config; we kept our deletion. The equivalent edits are applied directly to build/benchmarks_ci_pods.yml: pods is trimmed to [gold-lin, gold-win] and every scenario's pods: list is pruned to match upstream's post-aspnet#2169 contents. * build/benchmarks-ci-01.yml / -02.yml -- regenerated from the updated YAML config. Output matches upstream's regenerated pipelines exactly, modulo the embedded regen-command header which (correctly) now points at the .yml config instead of the deleted .json. The Azure and Cobalt pod configs are unaffected by aspnet#2169 -- they don't reference any of the EOL'd machines -- and their pipeline YAMLs are unchanged. All 70 pod-scheduler unit tests still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

These were added in 5b0bb62 as `backward-compat single-role accessors` for callers that `need a named slot without caring how many roles a pod has`. In the same commit I refactored the one real caller (`main.print_pod_conflicts`) to iterate `pod.machines` against `ROLE_NAMES` instead, so the safety net never had anything to catch. Grep confirmed the only remaining callers were tests asserting that the properties returned what we'd expect -- circular coverage. Removing the properties and updating the tests to read `pod.machines` / `pod.profiles` directly. The new `test_role_count_reflects_machines_length` covers the only computed property worth keeping (`role_count`). 69 tests pass (was 70 -- replaced two named-accessor tests with one role_count test). Generated pipeline YAMLs unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Audit follow-up to baa02f7. Four small items found: * models.Pod.role_count -- another property that only existed to be tested. Production code uses len(pod.machines) directly. Removed along with its test (test_role_count_reflects_machines_length). * config_loader._coerce_role_list docstring referenced _ROLE_ORDER, a constant that was removed in 9828898 when the named-dict form was dropped. Updated to point at models.ROLE_NAMES instead. * config_loader._reject_bool docstring said it covered `numeric/string fields`, but it's only called from _coerce_int and _coerce_float. The string-list bool check in _coerce_role_list is inline. Trimmed the docstring to `numeric fields`. * generator._render_yaml / generate_yamls -- `source_config` was Optional[str] = None with an else-branch that emitted a placeholder `<config>` regen command. Every caller (main.py and test_snapshots.py) always passes a concrete value, so the placeholder branch was unreachable. Tightened to `str` with no fallback rendering. 68 tests pass (was 69 with role_count removal). Generated pipeline YAMLs byte-identical to HEAD. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

LoopedBard3 self-assigned this May 7, 2026

LoopedBard3 requested a review from Copilot May 7, 2026 20:27

Copilot started reviewing on behalf of LoopedBard3 May 7, 2026 20:27 View session

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread scripts/pod-scheduler/config_loader.py

Comment thread scripts/pod-scheduler/config_loader.py Outdated

Comment thread scripts/pod-scheduler/README.md

sebastienros reviewed May 7, 2026

View reviewed changes

Comment thread build/benchmarks_ci_azure_pods.yml Outdated

LoopedBard3 and others added 2 commits May 7, 2026 14:45

sebastienros reviewed May 7, 2026

View reviewed changes

Comment thread build/benchmarks_ci_pods.yml Outdated

sebastienros reviewed May 7, 2026

View reviewed changes

Comment thread build/benchmarks_ci_pods.yml Outdated

LoopedBard3 and others added 5 commits May 7, 2026 15:48

LoopedBard3 requested a review from DrewScoggins May 14, 2026 17:52

LoopedBard3 marked this pull request as ready for review May 14, 2026 17:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch pod-scheduler configs from JSON to YAML; add JSON Schema#2170

Switch pod-scheduler configs from JSON to YAML; add JSON Schema#2170
LoopedBard3 wants to merge 8 commits into
aspnet:mainfrom
LoopedBard3:loopedbard3/pod-config-format-options

LoopedBard3 commented May 7, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

LoopedBard3 commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What

Format

Pod shape

Self-documenting scenario type

Loader

Schema

Model

Generated pipelines

Dependencies and tooling

Upstream sync

Tests

Docs

Verification

Notes for reviewers

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

LoopedBard3 commented May 7, 2026 •

edited

Loading