Infer DSE param encoding (categorical/ordinal/log) at parse time in EnvParamSpec #956

Description

@rutayan-nv

Summary

DSE parameter encoding should be inferred from the candidate values at parse time in the framework (EnvParamSpec / EnvParams), instead of being decided ad hoc by each optimizer agent. Today the encoding.type field must be spelled out in TOML, and the only real inference that exists (log-scale detection) lives inside a single BO agent rather than in the framework.

Motivation / problem

  • src/cloudai/configurator/env_params.py defines the encoding stack (Encoding protocol, CategoricalEncoding, LogEncoding, AnyEncoding discriminated union). Selecting a non-default encoding currently requires an explicit encoding = { type = "log" } in the config. Config authors have no way to know which type to fill in, and the field is easy to forget or to set to a value that does not match the candidates.
  • The one place that does infer parameter kind from values is an optimizer agent (BO's _detect_log_scale, which flips Ax's log_scale flag). That is framework logic that leaked into an agent:
    • It is agent-specific and not reused by GA / MAB / RL agents, which each re-parse the raw config in their own configure().
    • The taxonomy is generic ("given candidate values, pick an encoding") with nothing optimizer-specific about it, so every agent re-deriving it is duplication and a source of drift.

Encoding is a property of the parameter, not of the optimizer. It should be decided once, at the layer that already owns the parameter model.

Proposal

Infer the Encoding from candidate values when constructing EnvParamSpec / EnvParams, and make encoding.type an optional override.

Inference rules (mirroring the existing BO heuristic, generalized):

  • all candidates are strings -> CategoricalEncoding
  • numeric, length >= 3, all strictly positive, constant ratio within tolerance (geometric series) -> LogEncoding
  • otherwise numeric -> ordinal/linear (categorical-by-index today; a dedicated ordinal/scalar encoding can follow)

Precedence: an explicit encoding in TOML always wins; inference only fills the unspecified case.
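
A minimal sketch of the rules and the precedence above, for discussion only. The function name infer_encoding_kind, the string labels it returns, the rel_tol default, and the handling of bools, mixed lists, and all-equal values are assumptions made for illustration; none of them come from env_params.py or the existing BO heuristic.

```python
import math
from typing import Optional, Sequence, Union

Value = Union[str, int, float]


def infer_encoding_kind(
    values: Sequence[Value],
    explicit: Optional[str] = None,
    rel_tol: float = 1e-6,
) -> str:
    """Return "categorical", "log", or "ordinal" for a list of candidates."""
    # Precedence: an explicit encoding from the config always wins.
    if explicit is not None:
        return explicit

    def is_number(v: object) -> bool:
        # bool is a subclass of int; treating it as non-numeric is an assumption.
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    # Strings (and, by assumption, any mixed or non-numeric list) are categorical.
    if not all(is_number(v) for v in values):
        return "categorical"

    # Log needs at least 3 strictly positive points with a constant ratio.
    if len(values) >= 3 and all(v > 0 for v in values):
        ratios = [b / a for a, b in zip(values, values[1:])]
        constant = all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)
        # Assumption: a ratio of 1 (all values equal) is not treated as log.
        if constant and not math.isclose(ratios[0], 1.0, rel_tol=rel_tol):
            return "log"

    # Every other numeric list falls back to ordinal/linear.
    return "ordinal"
```

In the real change this logic would presumably run inside the model that parses the parameter, with the result mapped onto the existing Encoding classes rather than onto strings.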

Edge cases

  • drop_rate = [0.0, 0.001] (real PRT case) cannot be inferred as log: it has only 2 points and contains 0.0. It will infer to ordinal/linear. Authors who want log for such a series must either use a genuine >=3-point positive geometric series or set the explicit override. This is expected and should be documented.
  • Zero / negative values disqualify log inference.
  • Arithmetic series (constant difference between neighbours rather than constant ratio) -> not log.
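
To make these edge cases easier to document and test, the log check could report why a series was rejected. The sketch below is hypothetical: log_disqualifier, its reason strings, and the rel_tol default are illustrative names and values, not existing CloudAI code.

```python
import math
from typing import Optional, Sequence


def log_disqualifier(values: Sequence[float], rel_tol: float = 1e-6) -> Optional[str]:
    """Return why `values` cannot be inferred as log, or None if it qualifies."""
    if len(values) < 3:
        return "fewer than 3 points"
    if any(v <= 0 for v in values):
        return "contains a zero or negative value"
    # Ratios between consecutive candidates; safe because all values are > 0.
    ratios = [b / a for a, b in zip(values, values[1:])]
    if not all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios):
        return "ratio between neighbours is not constant"
    # Assumption: an all-equal series (ratio 1) is not a meaningful log axis.
    if math.isclose(ratios[0], 1.0, rel_tol=rel_tol):
        return "all values are equal"
    return None
```

With this helper, drop_rate = [0.0, 0.001] is rejected on the point count before the zero is even inspected, and an arithmetic series such as [2, 4, 6, 8] is rejected because its ratios (2, 1.5, ...) are not constant.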

Acceptance criteria

  • Inference happens in env_params.py at parse time (EnvParamSpec / EnvParams.from_test), not in any agent.
  • encoding.type is optional; an explicit value overrides inference.
  • Unit tests cover each branch (strings, geometric->log, arithmetic->linear) plus the ambiguous/zero/2-point cases and the explicit-override precedence.
  • Follow-up (separate, downstream): optimizer agents drop their ad-hoc inference and consume the framework-decided Encoding.
