Add the unified benchmark submodule (python -m bench) #47

Description

@Pavlo3P

Per the 0.4.0 version description: "Unified benchmark framework at python -m bench (subcommands run, compare, plot, summary, list) with generator-driven probes in bench/_operations.py, peak-memory recording in bench/harness.py:measure_peak_memory, fixed seeds (0, 1, 2, 3), and a self-contained interactive Plotly dashboard at bench/_dashboard.py."

Files to touch

bench/__main__.py
bench/_operations.py
bench/harness.py
bench/_probes.py
bench/_seeds.py
bench/_run.py
bench/_io.py
bench/_dashboard.py

Examples to follow

tests/generators/        # generator-driven, per-seed/per-backend problem construction
tests/backend/_conformance.py   # per-op harness + backend matrix layout

Task

Add a top-level bench package runnable as python -m bench exposing the
subcommands run, compare, plot, summary, and list.
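
A minimal sketch of how the five subcommands could be wired with argparse. Only the subcommand names and the --family/--match flags come from this issue; build_parser, the choice of which subcommands accept the filters, and the "linop" family name are assumptions for illustration.

```python
import argparse

# Subcommand names as listed in this issue.
SUBCOMMANDS = ("run", "compare", "plot", "summary", "list")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one sub-parser per subcommand (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="python -m bench")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in SUBCOMMANDS:
        p = sub.add_parser(name)
        if name in ("run", "list"):
            # Assumption: the probe filters apply to `run` and `list` only.
            p.add_argument("--family", default=None)
            p.add_argument("--match", default=None)
    return parser


# "linop" is a made-up family name used only to exercise the parser.
args = build_parser().parse_args(["run", "--family", "linop"])
print(args.command, args.family, args.match)
```

Keeping the parser in a function rather than at module scope makes it testable without invoking python -m bench.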

  • Define the probe abstraction in bench/_probes.py (Probe, ProbeCase, a
    registry) so each probe builds a fresh problem on a per-seed, per-backend
    basis and declares its supported backends.
  • Populate bench/_operations.py with generator-driven probes covering the same
    surface as the test suite: space ops and LinOp apply/rapply/vapply across
    dense, diagonal, sparse, identity, composed, and summed operators. Importing
    the module registers the probes.
  • Pin the seed quartet (0, 1, 2, 3) in bench/_seeds.py; every run iterates
    exactly these seeds.
  • Implement measure_peak_memory(fn) in bench/harness.py returning peak
    allocation for a zero-arg callable.
  • Wire bench/__main__.py to filter probes by --family/--match, run them
    over the requested backends/devices, and emit a JSON artifact (bench/_io.py)
    plus the verdict text; plot renders the self-contained interactive Plotly
    dashboard in bench/_dashboard.py.

Done condition

The bench smoke suite passes, confirming the registry is populated, names are
unique, the seed quartet is the documented one, and each probe factory builds a
runnable NumPy case.

pytest tests/bench/test_bench_smoke.py -q

Mathematical note

Benchmarks measure performance, not correctness, so there is no numerical
invariant to hold. The only reproducibility invariant is determinism: a probe at
a fixed (seed, backend, size) must construct an identical problem instance on
every run — the seed quartet (0, 1, 2, 3) is the sole entropy source, and no
probe may read wall-clock time or unseeded RNG into the problem it builds.
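
One way to check the determinism invariant, sketched under assumptions: build_problem and is_deterministic are hypothetical helpers, and the dense matrix-vector problem stands in for whatever a real probe constructs.

```python
import numpy as np


def build_problem(seed: int, size: int):
    """Construct a problem whose only entropy source is the seed."""
    rng = np.random.default_rng(seed)
    matrix = rng.standard_normal((size, size))
    vector = rng.standard_normal(size)
    return matrix, vector


def is_deterministic(seed: int, size: int) -> bool:
    """Build the same (seed, size) problem twice and compare bit-for-bit."""
    a1, x1 = build_problem(seed, size)
    a2, x2 = build_problem(seed, size)
    return bool(np.array_equal(a1, a2) and np.array_equal(x1, x2))


for seed in (0, 1, 2, 3):
    print(seed, is_deterministic(seed, 8))
```

Creating a new Generator from the seed inside the builder, rather than sharing a module-level RNG, keeps each case independent of the order in which probes run.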

Prerequisite level

  • Basic NumPy

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)
