Docs: brute-force vs DC-EGM comparison, example DC-EGM variant, CHANGES #382

Open: hmgaudecker wants to merge 31 commits into feat/dcegm-simulate from docs/dcegm-example
@hmgaudecker

Member

Fifth DC-EGM stack PR (stacked on #380, sibling of #381): the docs and example finalization.

  • lcm_examples.iskhakov_et_al_2017 gains the DC-EGM variant of the retirement model — get_dcegm_model(n_periods) with the declared resources / savings / inverse_marginal_utility functions, the savings-based wealth transition, no borrowing constraint (enforced intrinsically), and a cubically clustered savings grid. get_params works unchanged; the two builders are mathematically equivalent specs.
  • The explanation notebook's "Brute force vs DC-EGM" placeholder becomes a real comparison: period-0 value-function error of the default-grid brute-force solve vs DC-EGM, both measured against a fine-grid (500 × 2500) brute-force reference on a log scale, plus per-solve timings. The text states honestly that the reference's own grid error bounds the comparison, and that DC-EGM's edge is sharpest where the borrowing constraint binds (closed-form constrained segment vs grid quantization). The v1 simulate caveat (DC-EGM-solved models raise NotImplementedError on simulate()) is documented where readers will look for policy figures.
  • CHANGES.md documents the DC-EGM solver, regime-level EV1 taste shocks, and the example under an Unreleased heading (version number left to the maintainers).
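
For readers unfamiliar with the ingredients named in the first bullet, here is a minimal sketch of a cubically clustered savings grid and a single endogenous-grid step. It is not the code in `lcm_examples.iskhakov_et_al_2017`: the helper names `cubic_savings_grid` and `egm_point`, the log-utility assumption, the deterministic next-period consumption, and all parameter values are illustrative assumptions.

```python
def cubic_savings_grid(lo: float, hi: float, n_points: int) -> list[float]:
    """Grid on [lo, hi] whose points cluster near lo (hypothetical helper)."""
    # Map a uniform grid t in [0, 1] through t**3: spacing shrinks near lo.
    return [lo + (hi - lo) * (i / (n_points - 1)) ** 3 for i in range(n_points)]


def marginal_utility(consumption: float) -> float:
    """u'(c) = 1 / c for log utility (assumed here for illustration)."""
    return 1.0 / consumption


def inverse_marginal_utility(marginal_value: float) -> float:
    """(u')^{-1}(m) = 1 / m, the inverse of the function above."""
    return 1.0 / marginal_value


def egm_point(savings, next_consumption, discount_factor, gross_return):
    """One EGM step for a single savings node with no uncertainty.

    Inverts the Euler equation u'(c) = beta * R * u'(c') to get consumption
    today, then recovers the endogenous wealth level c + savings.
    """
    expected_marginal = (
        discount_factor * gross_return * marginal_utility(next_consumption)
    )
    consumption = inverse_marginal_utility(expected_marginal)
    return consumption + savings, consumption


grid = cubic_savings_grid(lo=0.0, hi=100.0, n_points=5)
wealth, consumption = egm_point(
    savings=grid[1], next_consumption=2.0, discount_factor=0.95, gross_return=1.04
)
print(grid)
print(wealth, consumption)
```

The cubic map puts most nodes at low savings, where the consumption function bends most, and the inversion replaces the root-finding or grid search that a brute-force solve would need at each state.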

Verified: full suite 1172 passed / 46 skipped, ty clean, prek clean (notebook cells stored as line arrays, outputs stripped). The notebook's new cells were smoke-run as scripts at reduced horizon; the full notebook executes on the RTD preview build of this PR.
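
The accuracy and timing comparison described in the second bullet can be sketched roughly as follows. This is not the notebook's code: `max_abs_error` and `timed` are hypothetical helpers, the "solves" are toy stand-ins with a made-up error added, and the sketch assumes all value functions are evaluated on the same wealth points.

```python
import math
import time


def max_abs_error(values, reference):
    """Largest pointwise gap between a solution and the reference."""
    return max(abs(v - r) for v, r in zip(values, reference))


def timed(solve):
    """Run a zero-argument solve callable; return (result, seconds)."""
    start = time.perf_counter()
    result = solve()
    return result, time.perf_counter() - start


# Toy stand-ins: a reference value function and two approximations of it.
wealth_points = [1.0 + 0.5 * i for i in range(20)]
reference = [math.log(w) for w in wealth_points]
coarse, coarse_seconds = timed(lambda: [math.log(w) + 1e-2 for w in wealth_points])
precise, precise_seconds = timed(lambda: [math.log(w) + 1e-5 for w in wealth_points])

for label, values, seconds in [
    ("coarse", coarse, coarse_seconds),
    ("precise", precise, precise_seconds),
]:
    error = max_abs_error(values, reference)
    print(f"{label}: log10 max|V - V_ref| = {math.log10(error):.1f}, {seconds:.2e} s")
```

Because the reference is itself a grid solution, errors smaller than the reference's own discretization error cannot be resolved by a comparison of this kind.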

🤖 Generated with Claude Code

…notebook.

The example module gains the DC-EGM variant of the model (get_dcegm_model:
declared resources/savings/inverse_marginal_utility, savings-based wealth
transition, no borrowing constraint, cubically clustered savings grid); the
explanation notebook replaces the placeholder with an accuracy comparison
against a fine-grid brute-force reference plus per-solve timings; CHANGES.md
documents the DC-EGM solver, regime-level taste shocks, and the example.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@review-notebook-app


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@read-the-docs-community

read-the-docs-community Bot commented Jun 10, 2026


hmgaudecker changed the base branch from feat/dcegm-discrete-choices to feat/dcegm-simulate on June 11, 2026 06:32
@github-actions

github-actions Bot commented Jun 11, 2026


Benchmark comparison (main → HEAD)

Comparing 64cf042c (main) → 0b5c299b (HEAD)

| Benchmark | Statistic | before | after | Ratio | Alert |
|---|---|---|---|---|---|
| aca-baseline | execution time | 15.286 s | 14.117 s | 0.92 | |
| | peak GPU mem | 581 MB | 581 MB | 1.00 | |
| | compilation time | 311.45 s | 338.81 s | 1.09 | |
| | peak CPU mem | 7.83 GB | 6.85 GB | 0.87 | |
| aca-baseline-debug | execution time | 72.232 s | 79.775 s | 1.10 | |
| | peak GPU mem | 581 MB | 581 MB | 1.00 | |
| | compilation time | 404.14 s | 448.30 s | 1.11 | |
| | peak CPU mem | 7.63 GB | 7.90 GB | 1.04 | |
| Mahler-Yum | execution time | 4.868 s | 4.522 s | 0.93 | |
| | peak GPU mem | 520 MB | 520 MB | 1.00 | |
| | compilation time | 11.92 s | 11.26 s | 0.94 | |
| | peak CPU mem | 1.60 GB | 1.58 GB | 0.98 | |
| Precautionary Savings - Solve | execution time | 27.6 ms | 23.4 ms | 0.85 | |
| | peak GPU mem | 8 MB | 8 MB | 1.00 | |
| | compilation time | 1.52 s | 1.65 s | 1.09 | |
| | peak CPU mem | 1.15 GB | 1.16 GB | 1.00 | |
| Precautionary Savings - Simulate | execution time | 98.4 ms | 64.2 ms | 0.65 | |
| | peak GPU mem | 162 MB | 157 MB | 0.97 | |
| | compilation time | 3.83 s | 3.57 s | 0.93 | |
| | peak CPU mem | 1.35 GB | 1.33 GB | 0.99 | |
| Precautionary Savings - Solve & Simulate | execution time | 124.7 ms | 87.5 ms | 0.70 | |
| | peak GPU mem | 566 MB | 566 MB | 1.00 | |
| | compilation time | 5.07 s | 4.64 s | 0.92 | |
| | peak CPU mem | 1.32 GB | 1.31 GB | 0.99 | |
| Precautionary Savings - Solve & Simulate (irreg) | execution time | 229.2 ms | 200.5 ms | 0.87 | |
| | peak GPU mem | 2.18 GB | 2.18 GB | 1.00 | |
| | compilation time | 5.24 s | 5.05 s | 0.96 | |
| | peak CPU mem | 1.37 GB | 1.37 GB | 1.00 | |

Benchmarks with a single measurement (no before/after pair):

| Benchmark | Statistic | Value |
|---|---|---|
| IskhakovEtAl2017DCEGMSimulate | execution time | 200.5 ms |
| | compilation time | 4.36 s |
| | peak CPU mem | 1.59 GB |
| IskhakovEtAl2017DCEGMSolve | execution time | 139.5 ms |
| | compilation time | 4.36 s |
| | peak CPU mem | 1.48 GB |
| IskhakovEtAl2017Simulate | execution time | 196.3 ms |
| | compilation time | 4.20 s |
| | peak CPU mem | 1.29 GB |
| IskhakovEtAl2017Solve | execution time | 48.7 ms |
| | compilation time | 0.66 s |
| | peak CPU mem | 1.15 GB |
| IskhakovEtAl2017DCEGMSimulateGpuPeakMem | peak GPU mem | 282 MB |
| IskhakovEtAl2017DCEGMSolveGpuPeakMem | peak GPU mem | 0 MB |
| IskhakovEtAl2017SimulateGpuPeakMem | peak GPU mem | 281 MB |
| IskhakovEtAl2017SolveGpuPeakMem | peak GPU mem | 67 MB |
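
As a reading aid for the Ratio column: the figures are consistent with `after / before`, so values below 1 mean HEAD is faster or uses less memory than main. This definition is inferred from the numbers rather than stated by the bot, and the `ratio` helper below is hypothetical.

```python
def ratio(before: float, after: float) -> float:
    """after / before, rounded to two decimals as in the table."""
    return round(after / before, 2)


# aca-baseline rows from the comparison above.
aca_execution = ratio(before=15.286, after=14.117)
aca_compilation = ratio(before=311.45, after=338.81)
print(aca_execution, aca_compilation)
```

A few rows differ from this calculation in the last digit, presumably because the bot divides unrounded measurements before rounding.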

hmgaudecker and others added 19 commits June 12, 2026 05:35
The solver-comparison fixtures import resources, savings,
next_wealth_from_savings, and inverse_marginal_utility from
lcm_examples.iskhakov_et_al_2017 instead of redefining them. The
explanation notebook's intro and the explanations index describe the
brute-force vs DC-EGM comparison the notebook actually contains.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
hmgaudecker and others added 6 commits June 13, 2026 20:23
The dominant accelerator resident in a DC-EGM solve is the rolling
`next_regime_to_egm_carry` — a dense-endogenous-grid, per-discrete-action
carry held for *every* carry-producing regime at once. Confirmed
n_assets-independent (~63 GB on the ACA benchmark at n_assets=3/4/8; the value
function, which scales with the Euler/n_assets axis, is negligible by
comparison). Two complementary changes shrink it:

- Reachable-target carry filtering (unconditional): each `egm_step` only ever
  indexes its reachable targets, so it is now handed just that subset
  (`_reachable_carry_subset`) instead of the full all-regimes mapping.
  `reachable_targets` is threaded through `build_egm_step_functions` -> `SolverKernels`
  -> `SolutionPhase`. Correctness-preserving (smaller kernel signature).

- `offload_carries_to_host` on `Model.solve` / `Model.simulate` (default off):
  evict the rolled carries to host memory between periods (inside
  `_roll_continuation_inputs`), so the device holds only the reachable subset
  each kernel re-uploads at dispatch rather than every regime's carry at once.
  Trades per-period host round-trips for a large drop in peak device memory;
  a no-op on a CPU-only host.
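
The reachable-target filtering in the first item amounts to restricting a mapping to a known set of keys. A minimal sketch, assuming carries are stored in a dict keyed by regime name: `reachable_subset`, the regime names, and the string placeholders are hypothetical, not the actual `_reachable_carry_subset`.

```python
from collections.abc import Iterable, Mapping


def reachable_subset(
    carries: Mapping[str, object], reachable_targets: Iterable[str]
) -> dict[str, object]:
    """Keep only the carries for regimes that a given step can transition to."""
    return {name: carries[name] for name in reachable_targets}


# Placeholder strings stand in for the per-regime carry arrays.
all_carries = {
    "working": "carry_working",
    "retired": "carry_retired",
    "dead": "carry_dead",
}

# Hypothetical transition structure: a retired agent cannot return to work.
working_inputs = reachable_subset(all_carries, ["working", "retired", "dead"])
retired_inputs = reachable_subset(all_carries, ["retired", "dead"])
print(sorted(retired_inputs))
```

Passing the smaller mapping means each compiled step takes fewer inputs, so arrays it never indexes do not need to be resident for that call.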

Value functions are bit-identical with and without the flag (new parity tests
in `tests/solution/test_egm_carry_offload.py`, including the ACA-shaped
passive-AIME + asset-row config). Full suite green at float64.

Also renames `EgmCarry` -> `EGMCarry` (and `EgmCarryProducer`,
`EgmStepFunction`) for acronym casing; mechanical, spans the same files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Under a distributed (sharded) solve the rolling carries are device-sharded.
`offload_carries_to_host` moved them with `jax.device_put(..., cpu_device)`,
which produces a CPU-*committed* array. An AOT-compiled `egm_step` rejects such
an array on re-upload ("input shardings disagree with the shardings of
arguments") because a committed array must match the compiled input sharding.

Use `jax.device_get` (host NumPy) instead. Host NumPy is *uncommitted*, so JAX
reshards it to the kernel's compiled `in_shardings` at dispatch rather than
erroring. Frees the same device memory; a no-op on a CPU-only host. Parity
tests unchanged (single-device, so device_get round-trips identically).
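
The difference between the two transfer calls can be seen on a single device. This is only a sketch: it does not reproduce the sharding error, which needs a multi-device AOT-compiled kernel, and the `carry` array and `double` function are stand-ins rather than project code.

```python
import jax
import jax.numpy as jnp
import numpy as np

carry = jnp.arange(4.0)  # stand-in for one rolled carry

# Explicit placement: the result is a jax.Array committed to the CPU device.
cpu_device = jax.devices("cpu")[0]
committed = jax.device_put(carry, cpu_device)

# Host transfer: the result is a plain NumPy array with no device or sharding.
host_copy = jax.device_get(carry)


@jax.jit
def double(x):
    return 2.0 * x


# JAX decides where to place a NumPy input when the call is dispatched.
result = double(host_copy)
print(type(committed).__name__, type(host_copy).__name__, result)
```

Because the NumPy copy carries no placement of its own, it cannot conflict with whatever input sharding the compiled kernel expects.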

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The host-offload (`offload_carries_to_host`) copied the rolled carries to host
with `jax.device_get` but never freed the device originals, so the previous
period's carry set lingered (GC lag / async dispatch) alongside the next
period's — two periods co-resident, and the peak never dropped (measured
identical to no-offload at ~63 GB).

After `device_get`, delete this period's freshly produced kernel-output carries
(`period_egm_carries`) — and only those. The carried-forward entries are either
already host (a prior offload) or the regimes' shared `egm_carry_template`,
which must not be deleted: it is reused next period for inactive regimes and
across solves, and a blanket free corrupts the cached model (observed as
"deleted array" errors when a `@cache`d model is solved twice).

With only one period's carries resident, the peak should roughly halve. Parity
unchanged (the host copies carry the data forward); full EGM suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>