
heygen-com/auto-model-optim


auto_model_optim

An agent-driven loop that optimizes a model's inference latency on a specific GPU without degrading output quality, and proves it. You point Claude Code at it, set a /goal, and walk away.

Worked example: the Wan 2.2 VAE decoder on an H100 went from 14.47s (fp32 baseline) to 4.79s, a 3.02x speedup. The agent found it blind in 18 experiments, and output quality stayed inside a frozen gate throughout.

Writeup: https://www.heygen.com/research/auto-optim-ai-model-inference-speedup

How it works

  • The agent edits exactly one file, optimize.py (build_decoder()).
  • A frozen evaluator it cannot touch defines success: harness/verify.py (output must match a frozen fp32 reference) plus harness/bench.py (latency).
  • harness/run_experiment.py verifies, benchmarks, then commits improvements and reverts regressions. The git history is the research record, and every kept result auto-runs a profiler so the next idea comes from data, not a guess.
  • Durable memory: results.tsv, leaderboard.md, PROGRESS.md (journal + dead-ends).
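The verify, benchmark, then keep-or-revert step can be sketched in plain Python. This is a minimal sketch, not harness/run_experiment.py itself. The helper names `passes_gate`, `median_latency` and `decide`, the elementwise `atol` tolerance, and the median-of-N timing are all hypothetical. Outputs are flat lists of floats here rather than GPU tensors.

```python
import statistics
import time
from typing import Callable, Optional, Sequence


def passes_gate(output: Sequence[float], reference: Sequence[float], atol: float = 1e-3) -> bool:
    """Hypothetical quality gate: every element within atol of the frozen fp32 reference."""
    if len(output) != len(reference):
        return False
    return all(abs(o - r) <= atol for o, r in zip(output, reference))


def median_latency(fn: Callable[[], object], warmup: int = 2, repeats: int = 5,
                   sync: Callable[[], None] = lambda: None) -> float:
    """Median wall-clock seconds of fn(); `sync` stands in for e.g. torch.cuda.synchronize."""
    for _ in range(warmup):
        fn()  # warm caches, autotuners, compilation
    sync()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        sync()  # wait for queued GPU work before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def decide(passed: bool, latency: float, best: Optional[float]) -> str:
    """Keep (commit) only a verified candidate that beats the best kept latency; else revert."""
    if not passed:
        return "revert: quality gate failed"
    if best is None or latency < best:
        return "keep"
    return "revert: no speedup"
```

For example, `decide(passes_gate([0.1, 0.2005], [0.1, 0.2]), latency=4.79, best=14.47)` keeps the candidate. A faster run that fails the gate is always reverted, because speed never buys back quality.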

Setup

huggingface-cli download Wan-AI/Wan2.2-TI2V-5B Wan2.2_VAE.pth --local-dir ./assets
export CUDA_VISIBLE_DEVICES=<free_gpu>
python harness/make_latent.py       # downloads a CC-BY clip, encodes the fixed input latent
python harness/make_reference.py    # fp32 decode of it = the frozen quality target
python harness/run_experiment.py --desc "baseline fp32 eager"

Run it with Claude Code

This repo is wired for Claude Code. CLAUDE.md (a symlink to AGENTS.md) is the always-loaded contract; program.md is the full runbook, loaded on demand; a scoped model-optimizer subagent does the edits; a Stop hook refuses to end a turn until the experiment is logged and the gate is green.
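A Stop hook of the kind described above might look like the following. This is a hypothetical sketch, not the repo's hook. It assumes Claude Code's hook convention, in which a Stop hook command can print `{"decision": "block", "reason": ...}` to stdout to keep the agent working. It also assumes that results.tsv is tab-separated with a `verify` column reading `pass` or `fail`. That column name, and the file-modification-time check, are illustrative only.

```python
import csv
import json
import os
from typing import Optional


def block_reason(results_path: str = "results.tsv", edited_path: str = "optimize.py") -> Optional[str]:
    """Return why the turn must not end yet, or None if it may stop.

    Hypothetical rules: the last edit to optimize.py must be followed by a logged
    experiment, and the last logged row's `verify` column must read `pass`.
    """
    if not os.path.exists(results_path):
        return "no results.tsv yet: run harness/run_experiment.py"
    if os.path.exists(edited_path) and os.path.getmtime(edited_path) > os.path.getmtime(results_path):
        return "optimize.py changed after the last logged experiment"
    with open(results_path, newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    if not rows:
        return "results.tsv has no experiments logged"
    if rows[-1].get("verify") != "pass":
        return "last experiment did not pass harness/verify.py"
    return None


def main() -> None:
    reason = block_reason()
    if reason:
        # Ask Claude Code to keep going instead of ending the turn.
        print(json.dumps({"decision": "block", "reason": reason}))


if __name__ == "__main__":
    main()
```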

Open the repo in Claude Code and hand it the wheel with a /goal:

/goal Drive Wan 2.2 VAE decode latency as low as it goes while harness/verify.py keeps
      passing. Follow program.md: edit only optimize.py, run harness/run_experiment.py
      after every change, and make each experiment attack what the latest profile shows.
      Stop only when the best leaderboard.md time has not improved for 8 consecutive
      profile-guided attempts AND the profile shows no bottleneck left worth attacking.

/goal re-checks that completion condition every turn, so the loop runs unattended until it hits a real plateau rather than an arbitrary experiment count. Watch it climb in leaderboard.md and PROGRESS.md. If it starts nibbling tiny gains, do not hand it the answer: sharpen the rule in program.md (we added an anti-nibbling rule mid-run and the plateau broke). Program the org, not the Python.
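The mechanical half of the stop condition ("best time has not improved for 8 consecutive attempts") can be sketched as a function over the attempt history. This is a hypothetical sketch. The real check is the agent reading leaderboard.md. The `min_rel_gain` knob is only one possible reading of an anti-nibbling rule: gains smaller than that fraction do not reset the counter. The "no bottleneck left in the profile" half remains a judgment call and is not modeled here.

```python
from typing import Sequence


def plateaued(latencies: Sequence[float], patience: int = 8, min_rel_gain: float = 0.0) -> bool:
    """True when the last `patience` attempts never beat the running best.

    latencies: one entry per attempt, in order (a failed gate can be logged as float("inf")).
    min_rel_gain: hypothetical anti-nibbling knob; a new best must be at least this
    fraction faster than the old best to count as an improvement.
    """
    best = float("inf")
    since_improvement = 0
    for t in latencies:
        if t < best * (1.0 - min_rel_gain):
            best = t
            since_improvement = 0
        else:
            since_improvement += 1
    return since_improvement >= patience
```

With `min_rel_gain=0.01`, a run of 0.1% gains counts as a plateau rather than progress. That is the behavior the anti-nibbling rule is after.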

Retarget to your own model

The fastest way is to let the agent do the wiring: open the repo in Claude Code and ask it to retarget to your model. Nothing here is hardcoded to a VAE; the harness is generic and the model-optimizer subagent is model-agnostic (the target lives in goals.json). It will drop your model and a fixed input into model/ and assets/, return your forward pass from optimize.py, and write goals.json for you. Then you set the /goal and it runs the same loop.
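To give a sense of what "the target lives in goals.json" could mean, here is a hypothetical goals file built and sanity-checked in Python. Every key name below (`target`, `metric`, `quality`, `stop`, `patience`, `require_profile_clear`) is an assumption for illustration. The repo's actual schema is whatever its goals.json contains.

```python
import json

# Hypothetical goals.json contents; the real schema is defined by the repo's goals.json.
goals = {
    "target": "my_model.forward",           # what optimize.py::build_decoder returns
    "metric": "latency_s",                   # lower is better
    "quality": {"compare": "max_abs_err", "threshold": 1e-3},
    "stop": {"patience": 8, "require_profile_clear": True},
}

REQUIRED = ("target", "metric", "quality", "stop")


def validate(cfg: dict) -> list[str]:
    """Return a list of problems with a goals dict (empty means it looks usable)."""
    problems = [f"missing key: {k}" for k in REQUIRED if k not in cfg]
    q = cfg.get("quality", {})
    if not isinstance(q.get("threshold"), (int, float)) or q["threshold"] <= 0:
        problems.append("quality.threshold must be a positive number")
    if cfg.get("stop", {}).get("patience", 0) < 1:
        problems.append("stop.patience must be >= 1")
    return problems


# json.dumps(goals, indent=2) is the text that would be written to goals.json.
```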

By hand it is three swaps; everything else stays as is:

  1. model/ + assets/: your model, checkpoint, and one fixed input.
  2. optimize.py::build_decoder: return your (forward_fn, info).
  3. goals.json (metric, thresholds, stop condition), plus the comparison in harness/_common.py if your quality metric differs.
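Swap 2 can be sketched as follows. This is a torch-free toy so that it runs anywhere. `ToyModel` is a hypothetical stand-in for your real model, and the keys of the `info` dict are illustrative, not the harness's required fields. The only contract taken from the text is that `build_decoder` returns `(forward_fn, info)`.

```python
from typing import Callable, Dict, List, Tuple


class ToyModel:
    """Hypothetical stand-in for your real model (e.g. a torch.nn.Module from model/)."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def __call__(self, x: List[float]) -> List[float]:
        return [self.scale * v for v in x]


def build_decoder() -> Tuple[Callable[[List[float]], List[float]], Dict[str, str]]:
    """Sketch of optimize.py::build_decoder for a retargeted model.

    Returns the forward function the harness verifies and times, plus an `info`
    dict describing the configuration (keys here are illustrative only).
    """
    model = ToyModel(scale=2.0)  # real code: load the checkpoint from assets/, move to GPU, eval()

    def forward_fn(x: List[float]) -> List[float]:
        # Optimizations the agent tries (dtype casts, compilation, fused kernels) would
        # wrap or replace this call; the result must still match the frozen reference.
        return model(x)

    info = {"dtype": "fp32", "mode": "eager"}
    return forward_fn, info
```

Keeping all construction inside `build_decoder` is what lets the agent change precision or compilation in one place while the harness calls `forward_fn` the same way every run.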

Pin a random seed first, or run-to-run noise will read as a speedup.
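A minimal seed-pinning helper might look like this, assuming your model's randomness comes from Python's `random`, NumPy, and/or PyTorch. Both NumPy and PyTorch are treated as optional, and `pin_seed` is a hypothetical name. Seeding makes the input and any stochastic ops repeatable. It does not remove timing jitter, so latency still needs repeated measurements.

```python
import random

try:
    import numpy as np
except ImportError:  # numpy is optional for this sketch
    np = None
try:
    import torch
except ImportError:  # so is torch
    torch = None


def pin_seed(seed: int = 0) -> None:
    """Seed every RNG the forward pass might touch so runs are comparable."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)           # seeds the CPU and CUDA generators
        torch.cuda.manual_seed_all(seed)  # explicit; silently ignored without CUDA
```

Call it once at the top of make_latent.py, make_reference.py and the benchmarked forward pass, so that the reference and every candidate see identical randomness.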

Notes

  • The shipped optimize.py is the winning config, tuned for a single H100. On other hardware treat it as a starting point and rerun the loop.
  • Benchmark input is Tears of Steel (c) Blender Foundation, mango.blender.org, CC-BY 3.0. No proprietary data is used.
  • Heavy assets (*.pth, *.pt, *.ckpt, downloaded video) are gitignored and generated locally by the setup steps above.

Layout

optimize.py    the only file the agent edits        goals.json   frozen target + gate
AGENTS.md      agent contract                        program.md   loop runbook
harness/       frozen evaluator + driver + profiler  .claude/agents/model-optimizer.md
model/         frozen model definition               assets/      weights, latent, reference
results.tsv    leaderboard.md   PROGRESS.md          the durable record
