refactor[next]: Split compilation into a actual compilation and loading of the compilation artifact by havogt · Pull Request #2587 · GridTools/gt4py

havogt · 2026-04-27T20:12:45Z

This is a preparation to allow compilation in separate processes for faster jit compilation.

The new workflow: CompilationStep produces a CompilationArtifact which is picklable and has a load() method which returns the original ExecutableProgram.

…finalize-split

Copilot

Pull request overview

Prepares the “next” compilation pipeline for out-of-process / multi-process JIT by splitting “compile” into producing a (mostly) picklable compilation artifact and a separate load() step that materializes the in-process executable callable.

Changes:

Introduce stages.CompilationArtifact protocol and update OTF compilation contracts to return artifacts with load().
Refactor CPP (nanobind/GTFN) and DaCe compilation steps to return artifact dataclasses instead of live callables; move “decoration” into artifact.load().
Update Backend.compile() and tests to materialize executables via artifact.load() and add pickle round-trip contract tests.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/next_tests/unit_tests/program_processor_tests/runners_tests/test_gtfn.py	Updates assertions to reflect `compilation.device_type` living on the compilation step.
tests/next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/test_dace_compilation.py	Adds pickle round-trip contract test for DaCe artifact (ensuring live handles don’t serialize).
tests/next_tests/unit_tests/otf_tests/test_compiled_program.py	Adapts a hijacking test to return a dummy artifact with `load()` instead of a raw callable.
tests/next_tests/unit_tests/otf_tests/compilation_tests/test_compiler.py	Adds pickle round-trip contract test for CPP artifact.
tests/next_tests/integration_tests/feature_tests/otf_tests/test_nanobind_build.py	Updates integration flow to call `.load()` on the returned compilation artifact.
src/gt4py/next/program_processors/runners/roundtrip.py	Wraps roundtrip’s in-memory callable in a `RoundtripArtifact` implementing `load()`.
src/gt4py/next/program_processors/runners/gtfn.py	Introduces `GTFNCompilationArtifact`/`GTFNCompiler` to stamp device type and apply argument conversion in `load()`.
src/gt4py/next/program_processors/runners/dace/workflow/factory.py	Removes separate decoration stage wiring (now handled by artifact `load()`).
src/gt4py/next/program_processors/runners/dace/workflow/decoration.py	Adjusts imports/types to avoid cycles after moving decoration into artifact `load()`.
src/gt4py/next/program_processors/runners/dace/workflow/compilation.py	Adds `DaCeCompilationArtifact` and makes DaCe compilation return an on-disk artifact with `load()`.
src/gt4py/next/otf/stages.py	Adds `CompilationArtifact` protocol.
src/gt4py/next/otf/recipes.py	Updates `OTFCompileWorkflow` typing to end at `CompilationArtifact` (removes decoration step).
src/gt4py/next/otf/definitions.py	Updates `CompilationStep` protocol to return `CompilationArtifact`.
src/gt4py/next/otf/compilation/compiler.py	Refactors CPP compilation into `CPPCompiler` + `CPPCompilationArtifact(load)`.
src/gt4py/next/backend.py	Changes backend executor to return artifacts; `compile()` now calls `artifact.load()`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

This PR refactors the “next” OTF compilation workflow so that compilation steps produce a picklable CompilationArtifact with a load() method, enabling future compilation in separate processes while keeping load/materialization process-local.

Changes:

Introduces stages.CompilationArtifact and updates CompilationStep, OTFCompileWorkflow, and Backend.compile() to compile → return artifact → load() into an ExecutableProgram.
Adds/updates concrete artifacts and compilers (CPP/GTFN/DaCe/Roundtrip) so backend-specific loading/wrapping happens in artifact.load() rather than a separate decoration stage.
Updates tests and adds minimal “pickle round-trip” contract tests for the new artifact types.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tests/next_tests/unit_tests/program_processor_tests/runners_tests/test_gtfn.py	Updates assertions to reflect device type living on the compilation step/artifact.
tests/next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/test_dace_compilation.py	Adds pickle round-trip contract test for `DaCeCompilationArtifact` (ensures live handle is dropped).
tests/next_tests/unit_tests/otf_tests/test_compiled_program.py	Adjusts test to use a dummy artifact (`load()` returns a no-op callable).
tests/next_tests/unit_tests/otf_tests/compilation_tests/test_compiler.py	Adds pickle round-trip contract test for `CPPCompilationArtifact`.
tests/next_tests/integration_tests/feature_tests/otf_tests/test_nanobind_build.py	Updates integration tests to build → artifact → `.load()` and passes `device_type`.
src/gt4py/next/program_processors/runners/roundtrip.py	Splits roundtrip into source generation + module load, introduces `RoundtripArtifact.load()`.
src/gt4py/next/program_processors/runners/gtfn.py	Introduces `GTFNCompilationArtifact`/`GTFNCompiler` so wrapping happens on `load()`.
src/gt4py/next/program_processors/runners/dace/workflow/factory.py	Removes decoration stage wiring; compilation now returns an artifact.
src/gt4py/next/program_processors/runners/dace/workflow/decoration.py	Adjusts imports/types to avoid cycles after moving decoration to artifact load.
src/gt4py/next/program_processors/runners/dace/workflow/compilation.py	Introduces `DaCeCompilationArtifact` and updates compiler to return it (with optional process-local live-program cache).
src/gt4py/next/otf/stages.py	Adds `CompilationArtifact` protocol.
src/gt4py/next/otf/recipes.py	Updates `OTFCompileWorkflow` typing to end at `CompilationArtifact` (removes decoration stage).
src/gt4py/next/otf/definitions.py	Updates `CompilationStep` protocol to return `CompilationArtifact`.
src/gt4py/next/otf/compilation/compiler.py	Refactors CPP compiler to return `CPPCompilationArtifact` and adds `load()`.
src/gt4py/next/backend.py	Updates `Backend.compile()` to call `artifact.load()` before returning an executable.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Brings PR 2587 (compilation/load artifact split) up to date with the GridTools#2648 fingerprinting refactor. Conflict resolutions: - dace/workflow/compilation.py: keep both — upstream's GPU TX-marker pass and this branch's str(build_folder) (build_folder is now a pathlib.Path here). - dace/workflow/factory.py: keep the `cache` import (used by the cached_translation trait); drop the now-unused `decoration` step/import (decoration moved into the artifact's load()). - roundtrip.py: keep RoundtripArtifact return; upstream's inline decorated_fencil is the pre-split pattern this PR replaces. - test_dace_compilation.py: combine both test sets (TX-marker + artifact pickle). - next/utils.py: restore MetadataBasedPickling (silently dropped by the textual merge — GridTools#2648 removed it as dead code, but this PR's artifacts depend on it). Orthogonal to GridTools#2648's fingerprint opt-out; both share the gt4py_metadata namespace.

Move the in-process CompiledDaceProgram cache from a `pickle=False` field on `DaCeCompilationArtifact` to a module-level dict keyed by build folder. The artifact becomes a plain frozen dataclass with no live process-bound state, restoring the strong `CompilationArtifact` Protocol contract; receivers in another process see an empty cache and fall through to the disk-based `get_program_handle` load path as before. `_live_program` was the only `pickle=False` field in the tree, so the `MetadataBasedPickling` mixin + `_get_metadata_based_state_getstate` helper become dead code and are deleted (re-doing what GridTools#2648 had removed; the merge re-introduced them only to serve this one user). `CPPCompilationArtifact` drops the now-no-op `MetadataBasedPickling` parent. `gt4py_metadata` itself stays, still used for the orthogonal `fingerprint=False` axis introduced in GridTools#2648. The Protocol docstring is restored to the strong form and explicitly flags `RoundtripArtifact.dispatch_backend` as the one remaining deviation, to be lifted into the runner / load-time seam alongside the staged-compilation API end-state from egparedes' layered- architecture proposal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…TS.md Replace `:class:` / `:meth:` Sphinx cross-reference roles introduced in the previous commit with plain backtick literals, per CODING_GUIDELINES.md §3.8 (Google style + Sphinx-napoleon; reST roles forbidden). Add an explicit Do entry in AGENTS.md so future agents do not re-introduce them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Sweep the docstrings this PR added or modified relative to upstream/main and replace `:class:` / `:meth:` / `:func:` Sphinx roles with plain backtick literals, per CODING_GUIDELINES.md §3.8. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

egparedes

First quick round of reviews. In general looks good to me. In the second run I would need to pay more attention to the changes in the roundtrip backend and tests.

Keep `sdfg_build_folder` as the str returned by `get_cache_folder` and convert to `pathlib.Path` only where consumers require it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`load()` previously recomputed the .so path by calling two dace-internal helpers (`get_folder_version`, `get_binary_name`) against `build_folder` and `sdfg.name`. That coupled the load path to dace internals — `get_folder_version` was removed in dace 2.0.0a3, silently breaking artifact loading on that release. Capture `sdfg_program.filename` after `sdfg.compile()` and store it on the artifact. `_load_compiled_program` now goes straight to `get_program_handle(self.library_path, sdfg)`; the two helper calls disappear. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Without the in-process hand-off, the back-to-back `sdfg.compile()` → `artifact.load()` sequence in thread mode hits dace's "library already loaded, renaming file" path: dace renames the .so on disk and dlopens the renamed copy, silently invalidating `library_path` for any later load. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…finalize-split

egparedes

LGTM

tehrengruber

Only did a birds eye review. If compilation and loading are cleanly split, it should eventually be possible to only load when compilation already occured in a previous run, which is almost the case in gtfn, but not in dace yet.

tehrengruber · 2026-06-25T16:37:26Z

        with locking.lock(src_dir):
            data = build_data.read_data(src_dir)

            if not data or not is_compiled(data) or self.force_recompile:
                self.builder_factory(inp, self.cache_lifetime).build()

            new_data = build_data.read_data(src_dir)

            if not new_data or not is_compiled(new_data) or not module_exists(new_data, src_dir):
                raise CompilationError(
                    f"On-the-fly compilation unsuccessful for '{inp.program_source.entry_point.name}'."
                )


Except for the builder_factory this part also feels like loading, the only thing necessary to construct a CPPCompilationArtifact appears to be the src_dir.

I am experimenting to isolate the load to just be an import of a python module. I think will propose as a separate PR after this one.

tehrengruber · 2026-06-25T16:45:25Z

+            bind_func_name=self.bind_func_name,
+            device_type=self.device_type,
+        )
+        _live_program_cache[artifact.build_folder] = CompiledDaceProgram(


Feels a little odd that DaCeCompilationArtifact and DaCeCompiler manage the _live_program_cache, but managing everything from DaCeCompilationArtifact also has its downsides. Ideally we could compile the sdfg without loading it then this part here wouldn't be needed.

Yes, in a previous iteration that stuff was on the compilation artifact...

Allows ``isinstance(x, CompilationArtifact)`` in tests / asserts. ``runtime_checkable`` on a single-method Protocol only checks for the presence of ``load``, not its signature. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Zero call sites anywhere in src/ or tests/. ``GTFNCompiler`` is built via its own ``GTFNCompilerFactory`` and ``DaCeCompiler`` via ``DaCeCompilationStepFactory``; the base ``CompilerFactory`` was never referenced. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

``sdfg.compile(return_program_handle=False)`` builds the .so without loading it; the subsequent ``DaCeCompilationArtifact.load()`` is then a clean first dlopen via ``get_program_handle``. The "library already loaded, renaming file" warning the compile→load sequence used to trigger is gone, and the artifact no longer needs to pre-populate ``_live_program_cache`` to dodge that rename. The cache stays, but for a narrower reason: hand off the live ``CompiledDaceProgram`` (carrying mutable ``csdfg_argv`` state) across ``Backend.compile()`` invocations for the same build folder, so ``fast_call`` doesn't re-run ``construct_arguments`` every time. Compile-time pre-population removed; the cache is now load-time only. ``library_path`` is resolved inside ``dace_context`` (which sets ``build_folder_mode``), since ``get_binary_name`` falls back to the global config when ``folder_mode`` is unset.

After ``return_program_handle=False``, the cache no longer serves the rename-workaround purpose. Its only remaining role was to share ``CompiledDaceProgram.csdfg_argv`` state across ``Backend.compile()`` invocations for the same build folder — a scenario only triggered by calling ``with_grid_type(...).with_backend(...)`` per program invocation, which is an antipattern that creates a fresh ``CompiledProgramsPool`` every call. The two fastcall tests were going through ``cases.verify`` → ``cases.run`` which rebinds on every call, masking the antipattern with the cache. Rewrite them to bind once at the top of the test and call the bound program directly. They now exercise fast_call reuse via the recommended pattern (single bind, many calls) and would catch a real regression.

…finalize-split # Conflicts: # tests/next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/test_dace.py

havogt added 22 commits April 27, 2026 12:59

refactor[next]: otf split build and load

6a35549

refactorings to combine load+decoration

d251057

BuildArtifact can materialize() itself

b8efa44

remove OTFCompileWorkflow

915ce27

cleanup docstrings

c51ac55

refactor: remove lazy import

9b842e2

apply pre-commit

c022bde

update roundtrip

a107dd6

Merge remote-tracking branch 'upstream/main' into worktree-otf-build-…

fa18067

…finalize-split

separate gtfn from generic compiler

efecdf5

don't serialize/deserialize dace in the same process

68ead69

don't search for sdfg in materialize

4702660

restore generic compiler

77b6235

cleanup

3869d67

more cleanup

9017b33

add tests

5bffec0

cleanup

7f219d7

avoid decoration needs picklable

69c5ae8

cleanup

e05a336

cleanup

7e9fa08

Build->Compile, materialize->load

8f35cb7

cleanup

29508bf

havogt requested a review from Copilot April 27, 2026 20:14

Copilot AI reviewed Apr 27, 2026

View reviewed changes

havogt and others added 3 commits April 28, 2026 08:23

use tmp_path fixture

9f4e776

refactor roundtrip to resepect picklability

9c9234d

Merge branch 'main' into worktree-otf-build-finalize-split

528dcd5

havogt requested a review from Copilot April 28, 2026 07:33

Copilot started reviewing on behalf of havogt April 28, 2026 07:34 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

Comment thread src/gt4py/next/otf/stages.py Outdated

Comment thread src/gt4py/next/program_processors/runners/dace/workflow/compilation.py Outdated

Comment thread src/gt4py/next/program_processors/runners/roundtrip.py

sdfg as part of artifact (because of upcoming change)

edd9e93

msimberg mentioned this pull request May 4, 2026

Add persistent gt4py cache for CSCS CI C2SM/icon4py#1218

Merged

havogt and others added 4 commits June 19, 2026 12:15

havogt commented Jun 19, 2026

View reviewed changes

Comment thread AGENTS.md

egparedes reviewed Jun 24, 2026

View reviewed changes

havogt commented Jun 25, 2026

View reviewed changes

Comment thread tests/next_tests/unit_tests/otf_tests/compilation_tests/test_compiler.py

havogt and others added 5 commits June 25, 2026 13:31

refactor[next]: drop Path/str round-trip in DaCeCompiler

c125a2a

Keep `sdfg_build_folder` as the str returned by `get_cache_folder` and convert to `pathlib.Path` only where consumers require it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

docs[next]: TODO to drop _live_program_cache if dace stops renaming

c9dc7be

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Merge remote-tracking branch 'upstream/main' into worktree-otf-build-…

13ca4f8

…finalize-split

egparedes approved these changes Jun 25, 2026

View reviewed changes

havogt mentioned this pull request Jun 25, 2026

feat[next]: process-pool compilation behind GT4PY_BUILD_JOBS_MODE=process havogt/gt4py#67

Open

4 tasks

tehrengruber approved these changes Jun 26, 2026

View reviewed changes

havogt and others added 5 commits June 26, 2026 11:20

Merge remote-tracking branch 'upstream/main' into worktree-otf-build-…

0451bff

…finalize-split # Conflicts: # tests/next_tests/unit_tests/program_processor_tests/runners_tests/dace_tests/test_dace.py

havogt mentioned this pull request Jun 26, 2026

prototype[next]: collapse CompilationArtifact to one dataclass + per-build loader file havogt/gt4py#68

Open

Uh oh!

Conversation

havogt commented Apr 27, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

egparedes left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

egparedes left a comment

Choose a reason for hiding this comment

Uh oh!

tehrengruber left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tehrengruber Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

havogt Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

tehrengruber Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

havogt Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants