auto_model_optim

An agent-driven loop that optimizes a model's inference latency on a specific GPU without degrading output quality, and proves it. You point Claude Code at it, set a /goal, and walk away.

Worked example: the Wan 2.2 VAE decoder on an H100 went from 14.47 s (fp32 baseline) to 4.79 s, a 3.02x speedup, found blind in 18 experiments with quality held inside a frozen gate.

Writeup: https://www.heygen.com/research/auto-optim-ai-model-inference-speedup

How it works

The agent edits exactly one file, optimize.py (build_decoder()).
A frozen evaluator it cannot touch defines success: harness/verify.py (output must match a frozen fp32 reference) plus harness/bench.py (latency).
harness/run_experiment.py verifies, benchmarks, then commits improvements and reverts regressions. The git history is the research record, and every kept result auto-runs a profiler so the next idea comes from data, not a guess.
Durable memory: results.tsv, leaderboard.md, PROGRESS.md (journal + dead-ends).
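
The keep-or-revert rule described above can be sketched in a few lines. This is a minimal illustration, not the code in harness/run_experiment.py: the `Experiment` record, `decide`, and `run_loop` are hypothetical names, and the real driver also handles git commits, reverts, and profiling, which are left out here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Experiment:
    """Hypothetical record of one run; field names are assumptions."""
    desc: str
    passed_gate: bool   # would come from a verifier such as harness/verify.py
    latency_s: float    # would come from a benchmark such as harness/bench.py


def decide(exp: Experiment, best_s: Optional[float]) -> str:
    """Return 'keep' only for a gate-passing run that beats the best time."""
    if not exp.passed_gate:
        return "revert"  # a quality failure is never kept, however fast
    if best_s is None or exp.latency_s < best_s:
        return "keep"    # first valid run, or a strict improvement
    return "revert"      # valid but not faster


def run_loop(experiments: List[Experiment]) -> List[Tuple[str, str, Optional[float]]]:
    """Replay experiments in order, logging (desc, decision, best_so_far)."""
    best_s: Optional[float] = None
    log = []
    for exp in experiments:
        decision = decide(exp, best_s)
        if decision == "keep":
            best_s = exp.latency_s
        log.append((exp.desc, decision, best_s))
    return log
```

The point of the ordering is that the quality gate is checked before latency is even considered, so a fast but wrong result can never become the new best.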

Setup

huggingface-cli download Wan-AI/Wan2.2-TI2V-5B Wan2.2_VAE.pth --local-dir ./assets
export CUDA_VISIBLE_DEVICES=<free_gpu>
python harness/make_latent.py       # downloads a CC-BY clip, encodes the fixed input latent
python harness/make_reference.py    # fp32 decode of it = the frozen quality target
python harness/run_experiment.py --desc "baseline fp32 eager"
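
The reference produced by make_reference.py is what every later run is compared against. As a rough sketch of what such a comparison can look like, the snippet below scores an output against a reference with PSNR over flat lists of floats. The metric, the 40 dB threshold, and the function names are assumptions for illustration; the actual comparison lives in the harness and may differ.

```python
import math
from typing import Sequence


def psnr_db(output: Sequence[float], reference: Sequence[float],
            peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    if len(output) != len(reference) or len(reference) == 0:
        raise ValueError("output and reference must be non-empty and equal length")
    mse = sum((o - r) ** 2 for o, r in zip(output, reference)) / len(reference)
    if mse == 0.0:
        return math.inf  # bit-identical output
    return 10.0 * math.log10(peak ** 2 / mse)


def passes_gate(output: Sequence[float], reference: Sequence[float],
                min_psnr_db: float = 40.0) -> bool:
    """Hypothetical gate: pass when PSNR meets the (assumed) threshold."""
    return psnr_db(output, reference) >= min_psnr_db
```

Because the reference is computed once and frozen, the threshold means the same thing for every experiment in the run.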

Run it with Claude Code

This repo is wired for Claude Code. CLAUDE.md (a symlink to AGENTS.md) is the always-loaded contract; program.md is the full runbook, loaded on demand; a scoped model-optimizer subagent does the edits; a Stop hook refuses to end a turn until the experiment is logged and the gate is green.

Open the repo in Claude Code and hand it the wheel with a /goal:

/goal Drive Wan 2.2 VAE decode latency as low as it goes while harness/verify.py keeps
      passing. Follow program.md: edit only optimize.py, run harness/run_experiment.py
      after every change, and make each experiment attack what the latest profile shows.
      Stop only when the best leaderboard.md time has not improved for 8 consecutive
      profile-guided attempts AND the profile shows no bottleneck left worth attacking.

/goal re-checks that completion condition every turn, so the loop runs unattended until it hits a real plateau rather than an arbitrary experiment count. Watch it climb in leaderboard.md and PROGRESS.md. If it starts nibbling tiny gains, do not hand it the answer: sharpen the rule in program.md (we added an anti-nibbling rule mid-run and the plateau broke). Program the org, not the Python.
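
The stop condition in the /goal above combines two tests: a stale streak and an empty profile. A minimal sketch of that logic, assuming the kept and rejected latencies are available as a plain list and that "no bottleneck left" is supplied as a boolean judged from the profile; both function names are hypothetical.

```python
from typing import List


def attempts_since_improvement(latencies_s: List[float]) -> int:
    """Count trailing attempts that failed to beat the running best time."""
    best = None
    stale = 0
    for t in latencies_s:
        if best is None or t < best:
            best = t
            stale = 0      # a new best resets the streak
        else:
            stale += 1     # no improvement on this attempt
    return stale


def should_stop(latencies_s: List[float], bottleneck_left: bool,
                patience: int = 8) -> bool:
    """Stop only when the streak reaches `patience` AND nothing is left to attack."""
    return attempts_since_improvement(latencies_s) >= patience and not bottleneck_left
```

Requiring both conditions is what keeps the loop from ending on an arbitrary experiment count: a long stale streak alone does not stop it while the profile still shows a target.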

Retarget to your own model

The fastest way is to let the agent do the wiring: open the repo in Claude Code and ask it to retarget to your model. Nothing here is hardcoded to a VAE; the harness is generic and the model-optimizer subagent is model-agnostic (the target lives in goals.json). The agent will place your model and a fixed input in model/ and assets/, wire optimize.py to return your forward pass, and write goals.json for you. Then you set the /goal and it runs the same loop.

By hand it is three swaps; everything else stays as is:

model/ + assets/: your model, checkpoint, and one fixed input.
optimize.py::build_decoder: return your (forward_fn, info).
goals.json: your metric, thresholds, and stop condition, plus the comparison in harness/_common.py if your quality metric differs.
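
To make the second swap concrete, here is a toy stand-in for the `(forward_fn, info)` pair. It is only a sketch of the shape of the contract: the zero-argument signature, the `info` keys, and the list-based "model" are assumptions, and a real build_decoder would load your checkpoint and return your actual forward pass.

```python
from typing import Callable, Dict, List, Tuple


def build_decoder() -> Tuple[Callable[[List[float]], List[float]], Dict[str, str]]:
    """Hypothetical contract: return a callable forward pass plus metadata."""
    scale = 2.0  # stands in for weights loaded from a checkpoint

    def forward_fn(latent: List[float]) -> List[float]:
        # Toy forward pass: a real one would run the model on the fixed input.
        return [scale * x for x in latent]

    # Metadata about the configuration; these keys are illustrative only.
    info = {"name": "toy-scale-model", "dtype": "fp32"}
    return forward_fn, info
```

Keeping the forward pass behind a single returned callable is what lets the evaluator stay frozen while the file that builds it changes.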

Pin a random seed first, or run-to-run noise will read as a speedup.
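
A minimal sketch of seed pinning, assuming a PyTorch model. The helper name `pin_seeds` is hypothetical; it seeds Python's `random` and, when PyTorch is installed, the PyTorch CPU and CUDA generators as well.

```python
import random


def pin_seeds(seed: int = 0) -> None:
    """Seed the stdlib RNG and, if available, PyTorch's generators."""
    random.seed(seed)
    try:
        import torch  # optional: only seeded when PyTorch is installed
    except ImportError:
        return
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```

Call it once before generating the fixed input and the reference, and again at the start of each run, so that any sampling in the model sees the same random stream every time.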

Notes

The shipped optimize.py is the winning config, tuned for a single H100. On other hardware treat it as a starting point and rerun the loop.
Benchmark input is Tears of Steel (c) Blender Foundation, mango.blender.org, CC-BY 3.0. No proprietary data is used.
Heavy assets (*.pth, *.pt, *.ckpt, downloaded video) are gitignored and generated locally by the setup steps above.

Layout

optimize.py                          the only file the agent edits
goals.json                           frozen target + gate
AGENTS.md                            agent contract
program.md                           loop runbook
harness/                             frozen evaluator + driver + profiler
.claude/agents/model-optimizer.md    scoped model-optimizer subagent
model/                               frozen model definition
assets/                              weights, latent, reference
results.tsv, leaderboard.md, PROGRESS.md    the durable record
