Add the unified benchmark submodule (`python -m bench`) #47

> Per the 0.4.0 version description: *"Unified benchmark framework at `python -m bench` (subcommands `run`, `compare`, `plot`, `summary`, `list`) with generator-driven probes in `bench/_operations.py`, peak-memory recording in `bench/harness.py:measure_peak_memory`, fixed seeds `(0, 1, 2, 3)`, and a self-contained interactive Plotly dashboard at `bench/_dashboard.py`."*

## Files to touch

```text
bench/__main__.py
bench/_operations.py
bench/harness.py
bench/_probes.py
bench/_seeds.py
bench/_run.py
bench/_io.py
bench/_dashboard.py
```

## Example to follow

```text
tests/generators/        # generator-driven, per-seed/per-backend problem construction
tests/backend/_conformance.py   # per-op harness + backend matrix layout
```

## Task

Add a top-level `bench` package runnable as `python -m bench` exposing the
subcommands `run`, `compare`, `plot`, `summary`, and `list`.
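As a rough sketch of the entry point, the five subcommands could be wired with `argparse` as below. The function names `build_parser` and `main` are illustrative, and attaching `--family`/`--match` to `run` and `list` only is an assumption; the issue does not say which subcommands accept them.

```python
import argparse

# Subcommand names come from the issue text.
SUBCOMMANDS = ("run", "compare", "plot", "summary", "list")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one sub-parser per subcommand (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="python -m bench")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in SUBCOMMANDS:
        cmd = sub.add_parser(name)
        if name in ("run", "list"):
            # Assumption: the probe filters apply to `run` and `list` only.
            cmd.add_argument("--family", default=None)
            cmd.add_argument("--match", default=None)
    return parser


def main(argv=None) -> int:
    """Parse arguments and report the chosen subcommand (dispatch is stubbed)."""
    args = build_parser().parse_args(argv)
    print(f"would dispatch to: {args.command}")
    return 0
```

In `bench/__main__.py` this would typically end with `raise SystemExit(main())`, with the real dispatch replacing the `print`.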

- Define the probe abstraction in `bench/_probes.py` (`Probe`, `ProbeCase`, a
  `registry`) so each probe builds a *fresh* problem on a per-seed, per-backend
  basis and declares its supported backends.
- Populate `bench/_operations.py` with generator-driven probes spanning the same
  surface as the test suite: space ops and LinOp `apply`/`rapply`/`vapply` over
  dense, diagonal, sparse, identity, composed, and summed operators. Importing
  the module must register the probes as a side effect.
- Pin the seed quartet `(0, 1, 2, 3)` in `bench/_seeds.py`; every run iterates
  exactly these seeds.
- Implement `measure_peak_memory(fn)` in `bench/harness.py` returning peak
  allocation for a zero-arg callable.
- Wire `bench/__main__.py` to filter probes by `--family`/`--match`, run them
  over the requested backends/devices, and emit a JSON artifact (`bench/_io.py`)
  plus the verdict text; `plot` renders the self-contained interactive Plotly
  dashboard in `bench/_dashboard.py`.
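The pieces listed above could fit together roughly as follows. This is a minimal sketch, not the intended implementation: the field names on `Probe` and `ProbeCase`, the `register` helper, and the `space.dot` probe are all hypothetical, and the standard-library `random` module stands in for a backend RNG. `measure_peak_memory` is sketched with `tracemalloc`, which is one possible approach and only sees allocations routed through Python's allocator hooks, so native or device memory may not be captured.

```python
import random
import tracemalloc
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

SEEDS: Tuple[int, ...] = (0, 1, 2, 3)  # the pinned seed quartet from the task


@dataclass(frozen=True)
class ProbeCase:
    """One runnable problem instance (hypothetical shape)."""
    run: Callable[[], object]


@dataclass(frozen=True)
class Probe:
    """A named factory that builds a fresh ProbeCase per (seed, backend)."""
    name: str
    family: str
    backends: Tuple[str, ...]
    build: Callable[[int, str], ProbeCase]


registry: Dict[str, Probe] = {}


def register(probe: Probe) -> Probe:
    """Add a probe to the registry, rejecting duplicate names."""
    if probe.name in registry:
        raise ValueError(f"duplicate probe name: {probe.name}")
    registry[probe.name] = probe
    return probe


def measure_peak_memory(fn: Callable[[], object]) -> int:
    """Return the peak number of bytes traced by tracemalloc while fn runs."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def _build_dot(seed: int, backend: str) -> ProbeCase:
    # The seed is the only entropy source; `backend` is unused in this stand-in.
    rng = random.Random(seed)
    x = [rng.random() for _ in range(1000)]
    y = [rng.random() for _ in range(1000)]
    return ProbeCase(run=lambda: sum(a * b for a, b in zip(x, y)))


# Hypothetical probe; registration happens at import time.
register(Probe("space.dot", "space", ("numpy",), _build_dot))

for seed in SEEDS:
    case = registry["space.dot"].build(seed, "numpy")
    print(seed, measure_peak_memory(case.run))
```

Building the case outside the measured callable keeps problem construction out of the peak-memory figure, so only the operation itself is measured.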

## Done condition

The bench smoke suite passes, confirming the registry is populated, names are
unique, the seed quartet is the documented one, and each probe factory builds a
runnable NumPy case.

```bash
pytest tests/bench/test_bench_smoke.py -q
```

## Mathematical note

Benchmarks measure performance, not correctness, so there is no numerical
invariant to hold. The only reproducibility invariant is determinism: a probe at
a fixed `(seed, backend, size)` must construct an identical problem instance on
every run. The seed quartet `(0, 1, 2, 3)` is the sole entropy source, and no
probe may feed wall-clock time or unseeded RNG output into the problem it builds.
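One way to check the determinism invariant is to build the same problem twice and compare fingerprints of the raw values. The sketch below is illustrative only: `build_problem` and `fingerprint` are hypothetical helpers, and the standard-library `random.Random` stands in for whatever seeded generator a real probe would use.

```python
import hashlib
import random
import struct


def build_problem(seed: int, size: int) -> list:
    """Build a problem whose only entropy source is the seed (stand-in)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]


def fingerprint(values) -> str:
    """Hash the exact float64 bit patterns so any drift is detected."""
    payload = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(payload).hexdigest()


for seed in (0, 1, 2, 3):
    first = fingerprint(build_problem(seed, 64))
    second = fingerprint(build_problem(seed, 64))
    # Identical (seed, size) must yield a bit-identical problem instance.
    assert first == second, f"seed {seed} is not deterministic"
```

Hashing the packed bytes rather than comparing with a tolerance matches the invariant as stated: the instance must be identical, not merely close.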

## Prerequisite level

* Basic NumPy
