Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 28 additions & 20 deletions helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -109,11 +109,13 @@ spec:
If a matching CommittedResource has unused capacity, the request is accepted.
Otherwise, PAYG headroom is checked for ram, cores, and instances.
Rejects all hosts if neither tier has headroom.
When dryRun is true the filter runs in shadow mode: it logs and emits
the cortex_nova_filter_quota_enforcement_decisions_total metric for
would-be rejects but never actually removes hosts.
params:
- {key: dryRun, boolValue: true}

Runs in shadow mode by default: the filter logs and emits the
cortex_nova_filter_quota_enforcement_decisions_total metric for
would-be rejects but never actually removes hosts. To globally reject
requests without headroom, add `params: [{key: enforce, boolValue: true}]`
to this step. The shadow default is intentional so that newly-rolled-out
releases never silently start rejecting requests.
weighers:
- name: kvm_prefer_smaller_hosts
params:
Expand Down Expand Up @@ -266,11 +268,13 @@ spec:
If a matching CommittedResource has unused capacity, the request is accepted.
Otherwise, PAYG headroom is checked for ram, cores, and instances.
Rejects all hosts if neither tier has headroom.
When dryRun is true the filter runs in shadow mode: it logs and emits
the cortex_nova_filter_quota_enforcement_decisions_total metric for
would-be rejects but never actually removes hosts.
params:
- {key: dryRun, boolValue: true}

Runs in shadow mode by default: the filter logs and emits the
cortex_nova_filter_quota_enforcement_decisions_total metric for
would-be rejects but never actually removes hosts. To globally reject
requests without headroom, add `params: [{key: enforce, boolValue: true}]`
to this step. The shadow default is intentional so that newly-rolled-out
releases never silently start rejecting requests.
weighers:
- name: kvm_prefer_smaller_hosts
params:
Expand Down Expand Up @@ -735,11 +739,13 @@ spec:
If a matching CommittedResource has unused capacity, the request is accepted.
Otherwise, PAYG headroom is checked for ram, cores, and instances.
Rejects all hosts if neither tier has headroom.
When dryRun is true the filter runs in shadow mode: it logs and emits
the cortex_nova_filter_quota_enforcement_decisions_total metric for
would-be rejects but never actually removes hosts.
params:
- {key: dryRun, boolValue: true}

Runs in shadow mode by default: the filter logs and emits the
cortex_nova_filter_quota_enforcement_decisions_total metric for
would-be rejects but never actually removes hosts. To globally reject
requests without headroom, add `params: [{key: enforce, boolValue: true}]`
to this step. The shadow default is intentional so that newly-rolled-out
releases never silently start rejecting requests.
weighers:
- name: kvm_prefer_smaller_hosts
params:
Expand Down Expand Up @@ -885,11 +891,13 @@ spec:
If a matching CommittedResource has unused capacity, the request is accepted.
Otherwise, PAYG headroom is checked for ram, cores, and instances.
Rejects all hosts if neither tier has headroom.
When dryRun is true the filter runs in shadow mode: it logs and emits
the cortex_nova_filter_quota_enforcement_decisions_total metric for
would-be rejects but never actually removes hosts.
params:
- {key: dryRun, boolValue: true}

Runs in shadow mode by default: the filter logs and emits the
cortex_nova_filter_quota_enforcement_decisions_total metric for
would-be rejects but never actually removes hosts. To globally reject
requests without headroom, add `params: [{key: enforce, boolValue: true}]`
to this step. The shadow default is intentional so that newly-rolled-out
releases never silently start rejecting requests.
weighers:
- name: kvm_prefer_smaller_hosts
params:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,12 +19,15 @@ import (
)

type FilterQuotaEnforcementOpts struct {
// DryRun, when true, makes the filter run in shadow mode: it performs the
// full headroom analysis and logs/emits metrics for would-be rejects, but
// never actually removes hosts from the result. Use this for safe rollouts
// and to observe cortex_nova_filter_quota_enforcement_decisions_total in
// shadow mode before flipping to enforce.
DryRun bool `json:"dryRun,omitempty"`
// Enforce, when false (the default zero value), makes the filter run in
// shadow mode: it performs the full headroom analysis and logs/emits
// metrics for would-be rejects, but never actually removes hosts from the
// result. Set to true to globally reject requests without headroom. The
// shadow default is intentional so that operators can observe
// cortex_nova_filter_quota_enforcement_decisions_total before opting in to
// enforcement, and so that newly-rolled-out pipelines never silently start
// rejecting requests.
Enforce bool `json:"enforce,omitempty"`
}

func (FilterQuotaEnforcementOpts) Validate() error { return nil }
Expand Down Expand Up @@ -52,11 +55,13 @@ func (FilterQuotaEnforcementOpts) Validate() error { return nil }
// which causes the pipeline to skip it (fail-open).
//
// Two modes:
// - DryRun=false (enforce, default zero value): on a no-headroom decision the filter
// clears all host activations to globally reject the request.
// - DryRun=true (shadow): the filter performs the same analysis, logs the would-be
// reject, and emits the same decision metric with mode="shadow" — but does NOT
// remove hosts. Use this to observe rejection volumes before enabling enforcement.
// - Enforce=false (shadow, default zero value): the filter performs the full
// headroom analysis, logs the would-be reject, and emits the same decision
// metric with mode="shadow" — but does NOT remove hosts. This is the
// default so that newly-rolled-out pipelines never silently start
// rejecting requests.
// - Enforce=true: on a no-headroom decision the filter clears all host
// activations to globally reject the request.
Comment on lines 57 to +64
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align the top-level behavior comment with the new default-shadow semantics.

Line 51 still says no-headroom always removes all hosts, which conflicts with the updated Enforce=false shadow behavior. Please make that sentence conditional by mode.

Suggested doc fix
-// If neither tier has headroom, ALL hosts are removed from the result (global reject).
+// If neither tier has headroom:
+//   - Enforce=true: ALL hosts are removed from the result (global reject).
+//   - Enforce=false: runs in shadow mode (records would-be reject, keeps hosts).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/scheduling/nova/plugins/filters/filter_quota_enforcement.go` around
lines 57 - 64, Update the top-level behavior comment in
filter_quota_enforcement.go to reflect the new default shadow semantics: make
the sentence that currently states "no-headroom always removes all hosts"
conditional on the Enforce mode (i.e., when Enforce=true the filter clears all
host activations on a no-headroom decision, whereas when Enforce=false it only
logs the would-be reject and emits metrics without removing hosts). Reference
the Enforce flag/field in the comment and phrase the behavior explicitly for
both modes so the comment aligns with the implemented shadow/default behavior.

//
// Every accept/reject/skip outcome is recorded as a Prometheus counter:
// cortex_nova_filter_quota_enforcement_decisions_total{mode,decision,resource,
Expand All @@ -70,9 +75,9 @@ type FilterQuotaEnforcement struct {
func (s *FilterQuotaEnforcement) Run(traceLog *slog.Logger, request api.ExternalSchedulerRequest) (*lib.FilterWeigherPipelineStepResult, error) {
result := s.IncludeAllHostsFromRequest(request)

mode := "enforce"
if s.Options.DryRun {
mode = "shadow"
mode := "shadow"
if s.Options.Enforce {
mode = "enforce"
}

// Step 1: Skip intents that don't represent new resource consumption.
Expand Down Expand Up @@ -280,8 +285,8 @@ func (s *FilterQuotaEnforcement) Run(traceLog *slog.Logger, request api.External

if headroom < check.request {
QuotaEnforcementMetricsSingleton.RecordDecision(mode, "reject", check.label, az, hwVersion)
if s.Options.DryRun {
traceLog.Info("quota enforcement SHADOW: would reject but dryRun=true",
if !s.Options.Enforce {
traceLog.Info("quota enforcement SHADOW: would reject but enforce=false",
"projectID", projectID,
"az", az,
"flavorGroup", hwVersion,
Expand Down
Loading