Skip to content

refactor[next]: Split compilation into a actual compilation and loading of the compilation artifact#2587

Open
havogt wants to merge 40 commits into
GridTools:mainfrom
havogt:worktree-otf-build-finalize-split
Open

refactor[next]: Split compilation into a actual compilation and loading of the compilation artifact#2587
havogt wants to merge 40 commits into
GridTools:mainfrom
havogt:worktree-otf-build-finalize-split

Conversation

@havogt

@havogt havogt commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

This is a preparation to allow compilation in separate processes for faster jit compilation.

The new workflow: CompilationStep produces a CompilationArtifact which is picklable and has a load() method which returns the original ExecutableProgram.

@havogt havogt requested a review from Copilot April 27, 2026 20:14

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Prepares the “next” compilation pipeline for out-of-process / multi-process JIT by splitting “compile” into producing a (mostly) picklable compilation artifact and a separate load() step that materializes the in-process executable callable.

Changes:

  • Introduce stages.CompilationArtifact protocol and update OTF compilation contracts to return artifacts with load().
  • Refactor CPP (nanobind/GTFN) and DaCe compilation steps to return artifact dataclasses instead of live callables; move “decoration” into artifact.load().
  • Update Backend.compile() and tests to materialize executables via artifact.load() and add pickle round-trip contract tests.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/next_tests/unit_tests/program_processor_tests/runners_tests/test_gtfn.py Updates assertions to reflect compilation.device_type living on the compilation step.
tests/next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/test_dace_compilation.py Adds pickle round-trip contract test for DaCe artifact (ensuring live handles don’t serialize).
tests/next_tests/unit_tests/otf_tests/test_compiled_program.py Adapts a hijacking test to return a dummy artifact with load() instead of a raw callable.
tests/next_tests/unit_tests/otf_tests/compilation_tests/test_compiler.py Adds pickle round-trip contract test for CPP artifact.
tests/next_tests/integration_tests/feature_tests/otf_tests/test_nanobind_build.py Updates integration flow to call .load() on the returned compilation artifact.
src/gt4py/next/program_processors/runners/roundtrip.py Wraps roundtrip’s in-memory callable in a RoundtripArtifact implementing load().
src/gt4py/next/program_processors/runners/gtfn.py Introduces GTFNCompilationArtifact/GTFNCompiler to stamp device type and apply argument conversion in load().
src/gt4py/next/program_processors/runners/dace/workflow/factory.py Removes separate decoration stage wiring (now handled by artifact load()).
src/gt4py/next/program_processors/runners/dace/workflow/decoration.py Adjusts imports/types to avoid cycles after moving decoration into artifact load().
src/gt4py/next/program_processors/runners/dace/workflow/compilation.py Adds DaCeCompilationArtifact and makes DaCe compilation return an on-disk artifact with load().
src/gt4py/next/otf/stages.py Adds CompilationArtifact protocol.
src/gt4py/next/otf/recipes.py Updates OTFCompileWorkflow typing to end at CompilationArtifact (removes decoration step).
src/gt4py/next/otf/definitions.py Updates CompilationStep protocol to return CompilationArtifact.
src/gt4py/next/otf/compilation/compiler.py Refactors CPP compilation into CPPCompiler + CPPCompilationArtifact(load).
src/gt4py/next/backend.py Changes backend executor to return artifacts; compile() now calls artifact.load().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/gt4py/next/otf/stages.py Outdated
Comment thread src/gt4py/next/program_processors/runners/dace/workflow/compilation.py Outdated
Comment thread tests/next_tests/unit_tests/otf_tests/compilation_tests/test_compiler.py Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the “next” OTF compilation workflow so that compilation steps produce a picklable CompilationArtifact with a load() method, enabling future compilation in separate processes while keeping load/materialization process-local.

Changes:

  • Introduces stages.CompilationArtifact and updates CompilationStep, OTFCompileWorkflow, and Backend.compile() to compile → return artifact → load() into an ExecutableProgram.
  • Adds/updates concrete artifacts and compilers (CPP/GTFN/DaCe/Roundtrip) so backend-specific loading/wrapping happens in artifact.load() rather than a separate decoration stage.
  • Updates tests and adds minimal “pickle round-trip” contract tests for the new artifact types.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/next_tests/unit_tests/program_processor_tests/runners_tests/test_gtfn.py Updates assertions to reflect device type living on the compilation step/artifact.
tests/next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/test_dace_compilation.py Adds pickle round-trip contract test for DaCeCompilationArtifact (ensures live handle is dropped).
tests/next_tests/unit_tests/otf_tests/test_compiled_program.py Adjusts test to use a dummy artifact (load() returns a no-op callable).
tests/next_tests/unit_tests/otf_tests/compilation_tests/test_compiler.py Adds pickle round-trip contract test for CPPCompilationArtifact.
tests/next_tests/integration_tests/feature_tests/otf_tests/test_nanobind_build.py Updates integration tests to build → artifact → .load() and passes device_type.
src/gt4py/next/program_processors/runners/roundtrip.py Splits roundtrip into source generation + module load, introduces RoundtripArtifact.load().
src/gt4py/next/program_processors/runners/gtfn.py Introduces GTFNCompilationArtifact/GTFNCompiler so wrapping happens on load().
src/gt4py/next/program_processors/runners/dace/workflow/factory.py Removes decoration stage wiring; compilation now returns an artifact.
src/gt4py/next/program_processors/runners/dace/workflow/decoration.py Adjusts imports/types to avoid cycles after moving decoration to artifact load.
src/gt4py/next/program_processors/runners/dace/workflow/compilation.py Introduces DaCeCompilationArtifact and updates compiler to return it (with optional process-local live-program cache).
src/gt4py/next/otf/stages.py Adds CompilationArtifact protocol.
src/gt4py/next/otf/recipes.py Updates OTFCompileWorkflow typing to end at CompilationArtifact (removes decoration stage).
src/gt4py/next/otf/definitions.py Updates CompilationStep protocol to return CompilationArtifact.
src/gt4py/next/otf/compilation/compiler.py Refactors CPP compiler to return CPPCompilationArtifact and adds load().
src/gt4py/next/backend.py Updates Backend.compile() to call artifact.load() before returning an executable.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/gt4py/next/otf/stages.py Outdated
Comment thread src/gt4py/next/program_processors/runners/dace/workflow/compilation.py Outdated
Comment thread src/gt4py/next/program_processors/runners/roundtrip.py
havogt and others added 4 commits June 19, 2026 12:15
Brings PR 2587 (compilation/load artifact split) up to date with the
GridTools#2648 fingerprinting refactor.

Conflict resolutions:
- dace/workflow/compilation.py: keep both — upstream's GPU TX-marker pass and
  this branch's str(build_folder) (build_folder is now a pathlib.Path here).
- dace/workflow/factory.py: keep the `cache` import (used by the cached_translation
  trait); drop the now-unused `decoration` step/import (decoration moved into
  the artifact's load()).
- roundtrip.py: keep RoundtripArtifact return; upstream's inline decorated_fencil
  is the pre-split pattern this PR replaces.
- test_dace_compilation.py: combine both test sets (TX-marker + artifact pickle).
- next/utils.py: restore MetadataBasedPickling (silently dropped by the textual
  merge — GridTools#2648 removed it as dead code, but this PR's artifacts depend on it).
  Orthogonal to GridTools#2648's fingerprint opt-out; both share the gt4py_metadata namespace.
Move the in-process CompiledDaceProgram cache from a `pickle=False`
field on `DaCeCompilationArtifact` to a module-level dict keyed by
build folder. The artifact becomes a plain frozen dataclass with no
live process-bound state, restoring the strong `CompilationArtifact`
Protocol contract; receivers in another process see an empty cache and
fall through to the disk-based `get_program_handle` load path as
before.

`_live_program` was the only `pickle=False` field in the tree, so the
`MetadataBasedPickling` mixin + `_get_metadata_based_state_getstate`
helper become dead code and are deleted (re-doing what GridTools#2648 had
removed; the merge re-introduced them only to serve this one user).
`CPPCompilationArtifact` drops the now-no-op `MetadataBasedPickling`
parent. `gt4py_metadata` itself stays, still used for the orthogonal
`fingerprint=False` axis introduced in GridTools#2648.

The Protocol docstring is restored to the strong form and explicitly
flags `RoundtripArtifact.dispatch_backend` as the one remaining
deviation, to be lifted into the runner / load-time seam alongside
the staged-compilation API end-state from egparedes' layered-
architecture proposal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…TS.md

Replace `:class:` / `:meth:` Sphinx cross-reference roles introduced in
the previous commit with plain backtick literals, per
CODING_GUIDELINES.md §3.8 (Google style + Sphinx-napoleon; reST roles
forbidden). Add an explicit Do entry in AGENTS.md so future agents do
not re-introduce them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sweep the docstrings this PR added or modified relative to upstream/main
and replace `:class:` / `:meth:` / `:func:` Sphinx roles with plain
backtick literals, per CODING_GUIDELINES.md §3.8.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread AGENTS.md

@egparedes egparedes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First quick round of reviews. In general looks good to me. In the second run I would need to pay more attention to the changes in the roundtrip backend and tests.

Comment thread src/gt4py/next/program_processors/runners/dace/workflow/compilation.py Outdated
Comment thread src/gt4py/next/program_processors/runners/dace/workflow/compilation.py Outdated
Comment thread src/gt4py/next/program_processors/runners/dace/workflow/compilation.py Outdated
Comment thread src/gt4py/next/program_processors/runners/dace/workflow/compilation.py Outdated
Comment thread src/gt4py/next/program_processors/runners/roundtrip.py
havogt and others added 5 commits June 25, 2026 13:31
Keep `sdfg_build_folder` as the str returned by `get_cache_folder` and
convert to `pathlib.Path` only where consumers require it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`load()` previously recomputed the .so path by calling two
dace-internal helpers (`get_folder_version`, `get_binary_name`) against
`build_folder` and `sdfg.name`. That coupled the load path to dace
internals — `get_folder_version` was removed in dace 2.0.0a3, silently
breaking artifact loading on that release.

Capture `sdfg_program.filename` after `sdfg.compile()` and store it on
the artifact. `_load_compiled_program` now goes straight to
`get_program_handle(self.library_path, sdfg)`; the two helper calls
disappear.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Without the in-process hand-off, the back-to-back
`sdfg.compile()` → `artifact.load()` sequence in thread mode hits
dace's "library already loaded, renaming file" path: dace renames the
.so on disk and dlopens the renamed copy, silently invalidating
`library_path` for any later load.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@egparedes egparedes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tehrengruber tehrengruber left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only did a birds eye review. If compilation and loading are cleanly split, it should eventually be possible to only load when compilation already occured in a previous run, which is almost the case in gtfn, but not in dace yet.

Comment thread src/gt4py/next/otf/stages.py
Comment thread src/gt4py/next/otf/compilation/compiler.py Outdated
Comment on lines 101 to 112
with locking.lock(src_dir):
data = build_data.read_data(src_dir)

if not data or not is_compiled(data) or self.force_recompile:
self.builder_factory(inp, self.cache_lifetime).build()

new_data = build_data.read_data(src_dir)

if not new_data or not is_compiled(new_data) or not module_exists(new_data, src_dir):
raise CompilationError(
f"On-the-fly compilation unsuccessful for '{inp.program_source.entry_point.name}'."
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the builder_factory this part also feels like loading, the only thing necessary to construct a CPPCompilationArtifact appears to be the src_dir.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am experimenting to isolate the load to just be an import of a python module. I think will propose as a separate PR after this one.

bind_func_name=self.bind_func_name,
device_type=self.device_type,
)
_live_program_cache[artifact.build_folder] = CompiledDaceProgram(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels a little odd that DaCeCompilationArtifact and DaCeCompiler manage the _live_program_cache, but managing everything from DaCeCompilationArtifact also has its downsides. Ideally we could compile the sdfg without loading it then this part here wouldn't be needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in a previous iteration that stuff was on the compilation artifact...

havogt and others added 5 commits June 26, 2026 11:20
Allows ``isinstance(x, CompilationArtifact)`` in tests / asserts.
``runtime_checkable`` on a single-method Protocol only checks for the
presence of ``load``, not its signature.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Zero call sites anywhere in src/ or tests/. ``GTFNCompiler`` is built via
its own ``GTFNCompilerFactory`` and ``DaCeCompiler`` via
``DaCeCompilationStepFactory``; the base ``CompilerFactory`` was never
referenced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
``sdfg.compile(return_program_handle=False)`` builds the .so without loading
it; the subsequent ``DaCeCompilationArtifact.load()`` is then a clean first
dlopen via ``get_program_handle``. The "library already loaded, renaming
file" warning the compile→load sequence used to trigger is gone, and the
artifact no longer needs to pre-populate ``_live_program_cache`` to dodge
that rename.

The cache stays, but for a narrower reason: hand off the live
``CompiledDaceProgram`` (carrying mutable ``csdfg_argv`` state) across
``Backend.compile()`` invocations for the same build folder, so ``fast_call``
doesn't re-run ``construct_arguments`` every time. Compile-time pre-population
removed; the cache is now load-time only.

``library_path`` is resolved inside ``dace_context`` (which sets
``build_folder_mode``), since ``get_binary_name`` falls back to the global
config when ``folder_mode`` is unset.
After ``return_program_handle=False``, the cache no longer serves the
rename-workaround purpose. Its only remaining role was to share
``CompiledDaceProgram.csdfg_argv`` state across ``Backend.compile()``
invocations for the same build folder — a scenario only triggered by
calling ``with_grid_type(...).with_backend(...)`` per program invocation,
which is an antipattern that creates a fresh ``CompiledProgramsPool``
every call.

The two fastcall tests were going through ``cases.verify`` → ``cases.run``
which rebinds on every call, masking the antipattern with the cache.
Rewrite them to bind once at the top of the test and call the bound
program directly. They now exercise fast_call reuse via the recommended
pattern (single bind, many calls) and would catch a real regression.
…finalize-split

# Conflicts:
#	tests/next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/test_dace.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants