Explicit copy memset nodes by ThrudPrimrose · Pull Request #2380 · spcl/dace

ThrudPrimrose · 2026-05-22T11:17:08Z

No description provided.

Introduce explicit copy / memset library nodes and a pass that lifts every implicit AccessNode -> AccessNode (or scope-staging) edge into a CopyLibraryNode, so the dataflow shape post-simplification is self-describing and the legacy "copy edge gets lowered by ad-hoc codegen" path can be deprecated.

  dace/libraries/standard/nodes/copy_node.py (CopyLibraryNode + 8 expansions)
  dace/libraries/standard/nodes/memset_node.py (MemsetLibraryNode + 3 expansions)
  dace/libraries/standard/helper.py (shared expansion helpers)
  dace/libraries/standard/environments/cpu.py (CPU environment used by ExpandMemcpyCPU)
  dace/sdfg/construction_utils.py (small utilities used by copy_node)
  dace/transformation/passes/insert_explicit_copies.py (the pass)

Pass wiring: appended `InsertExplicitCopies` to `SIMPLIFY_PASSES` so ``SDFG.simplify()`` lifts implicit copies as part of standard cleanup (idempotent: once lifted, no further implicit edges remain to match).

Tests:
  tests/library/copy_node_test.py
  tests/library/memset_node_test.py
  tests/passes/insert_explicit_copies_test.py

…ryNode

Three small fixes that unblock the Copy/Memset libnodes' schedule inference:

* ``infer_out_connector_type`` / ``infer_connector_types``: wrap the ``e.data.subset and e.data.subset.num_elements() == 1`` expression in ``bool()``. A single-element ``Range`` returns False from ``__bool__`` and the bare ``and`` chain leaks the Range object instead of a bool, which later trips the ``scalar |= …`` operator with TypeError when the libnode's collapsed subset is empty.
* ``_determine_schedule_from_storage``: when the node under inspection is a ``CopyLibraryNode`` / ``MemsetLibraryNode`` AND any neighbouring memlet imposes a ``GPU_Device`` constraint, return ``GPU_Device`` directly. Without this, an H2D copy has both ``CPU_Multicore`` (from the CPU source) and ``GPU_Device`` (from the GPU sink) in its constraint set and the existing ``len(constraints) > 1`` branch raises ``InvalidSDFGNodeError: Cannot determine default schedule for node copy_A_to_gpu_A``. The libnode is exactly the node class designed to bridge storages; routing it to GPU_Device when GPU is involved is the intended resolution.
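The first fix can be reproduced without DaCe. The sketch below is only an illustration: `ToyRange` and `accumulate` are hypothetical names, not the real `Range` class or the real inference code. It assumes, as the commit text states, an object whose `__bool__` returns False, and shows how a bare `and` chain leaks that object and how `bool()` avoids the later `|=` failure.

```python
class ToyRange:
    """Hypothetical stand-in for a subset type; NOT dace.subsets.Range."""

    def __init__(self, n):
        self._n = n

    def num_elements(self):
        return self._n

    def __bool__(self):
        # Assumption taken from the commit text: the object is falsy.
        return False


def accumulate(flag):
    """Mimics the `scalar |= ...` accumulation mentioned in the commit."""
    scalar = False
    scalar |= flag
    return scalar


subset = ToyRange(1)

# `a and b` evaluates to `a` itself when `a` is falsy, so the object leaks.
leaked = subset and subset.num_elements() == 1

# Wrapping the chain in bool() always yields a real bool.
fixed = bool(subset and subset.num_elements() == 1)

try:
    accumulate(leaked)
    raised = False
except TypeError:
    # bool | ToyRange is not defined, so the in-place OR fails.
    raised = True

print(type(leaked).__name__, type(fixed).__name__, raised)
```

The design point is that `and` returns one of its operands rather than a bool, so any code that later combines the result with bitwise operators needs an explicit conversion.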

…simplify

InsertExplicitCopies materialises implicit AccessNode->AccessNode edges into CopyLibraryNodes, which is a shape-changing lowering step, not a shape-preserving simplification. Including it in SIMPLIFY_PASSES broke 22 tests (numpy reshape/flatten/view, redundant_copy count assertions, range_indirection / reinterpret validators) that rely on simplify preserving the implicit-edge form.

Pass remains available as a standalone Pass for consumers that want it (e.g. the GPU codegen lowering pipeline, which calls it explicitly via InsertExplicitGPUGlobalMemoryCopies).

Three xfail-strict pins:
- AccessNode<->View edges must not be lifted (policy: views are aliases).
- Rank-changing reshape lift produces mismatched-rank memlets that trip codegen IndexError in cpp_offset_expr.
- Dtype-reinterpret View lift produces CopyLibraryNode with mismatched element types, failing sdfg.validate().

…atch

InsertExplicitCopies changes:
- Drop the AN<->View edge lift (was inserting a Copy + intermediate buffer for every AN<->View edge, including pure aliasing edges that don't need one).
- Add a round-trip collapse: AN_src -> View -> AN_dst becomes a single AN_src -> CopyLibraryNode -> AN_dst direct edge with the composed memlet (src side from the view-underlying subset, dst side from the access-side subset). The View AccessNode is removed when it has no other consumers.

CopyLibraryNode expansion:
- select_copy_implementation now routes rank-mismatched volume-equal copies (different rank after collapse_shape_and_strides) to CopyNDTemplate when both sides are same-storage C-packed contiguous. MappedTasklet reuses a single access expression for both endpoints, which produces a rank-mismatched memlet on the smaller side and crashes codegen.
- ExpandCopyNDTemplate flattens to a 1D pointer walk when the in/out collapsed shapes have different ranks (both sides packed contiguous, so the linearization is sound).

Tests:
- Update the 4 view-lift tests to assert the new policy (round-trip collapses to a single Copy; View is removed if it has no other consumers; repeat applies are no-ops).
- Add 10 new test_iec_* pins covering both patterns: AN<->View edges kept direct, AN->View->AN round-trip collapse, AN->View round-trip with another consumer keeping the View, AN<->AN copies with rank differing because of a constant-index dim, AN<->AN rank-mismatched volume-equal copies routed through CopyND, plus @dace.program reshape and reinterpret cases.

…onstruction_utils

Use SDFG.parent (O(1)) instead of the recursive _get_parent_state scan to find the state containing a nested-SDFG node; verified equivalent. copy_node.py imports the helper from dace.transformation.helpers (top-level, no import cycle). dace/sdfg/construction_utils.py removed (it only held these two helpers on this branch).

…rings

Same-rank copy with mismatched per-dim shapes (e.g. (3,4) -> (4,3)) previously slipped through libnode validation and tripped DaCe's generic ``out-of-bounds memlet`` post-expansion validate -- the rejection was incidental. Add a per-dim shape check at the head of ExpandMappedTasklet's same-rank branch with a specific error message; the test now asserts on that explicit contract instead of the incidental SDFG-level error.

Trim multi-line test docstrings (transpose-pattern, rank-mismatch variants, no-common-stride1, shared-memory collective, single-element-in-kernel) to one line each.

… array shapes

The transpose-pattern check compares ``in_shape_collapsed`` vs ``out_shape_collapsed``, which come from ``collapse_shape_and_strides(subset, strides)`` -- per-dim subset sizes after singleton-collapse, NOT the underlying array shapes. Add two pin tests:

  test_copy_same_subset_different_array_shapes -- 0:N slice between arrays of different total size is fine when the per-dim subset sizes match.
  test_copy_1d_slice_from_2d_source -- a row-slice ``[i, 0:N]`` of a 2D array copies into a 1D array; the leading singleton collapses to the same rank on both sides.

Both pass; transpose-pattern still rejected.
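The distinction between subset sizes and array shapes can be sketched in plain Python. This is a simplified model, not the DaCe implementation: `collapse` and `check_copy_shapes` are hypothetical helpers, sizes are plain integers rather than symbolic expressions, and strides are ignored.

```python
def collapse(sizes):
    """Drop unit-size dimensions from a list of per-dim subset sizes."""
    return [s for s in sizes if s != 1]


def check_copy_shapes(src_sizes, dst_sizes):
    """Hypothetical same-rank check on collapsed subset sizes.

    Only the subset sizes matter here; the shapes of the arrays that
    the subsets are taken from never enter the comparison.
    """
    src, dst = collapse(src_sizes), collapse(dst_sizes)
    if len(src) == len(dst) and src != dst:
        raise ValueError(f"per-dim shape mismatch: {src} vs {dst}")
    # Rank-mismatched pairs are not modelled by this sketch.
    return src, dst


# Row slice [i, 0:8] of a 2D array into a 1D array: the singleton collapses.
row_slice = check_copy_shapes([1, 8], [8])

# Same 0:8 subset on arrays of different total size: sizes still match.
same_subset = check_copy_shapes([8], [8])

# Transpose-like pattern (3, 4) -> (4, 3): same rank, different per-dim sizes.
try:
    check_copy_shapes([3, 4], [4, 3])
    rejected = False
except ValueError:
    rejected = True

print(row_slice, same_subset, rejected)
```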

….X form

Consistency over brevity: every test now uses the full dace.dtypes.StorageType.{CPU_Heap,GPU_Global,GPU_Shared,Register} inline instead of a mix of local cpu/gpu aliases and inline long forms. Drops 26 local-alias lines and removes the inconsistency across the file. Two over-long docstrings trimmed to keep yapf happy.

Previously select_memset_implementation picked CUDA / CPU purely from storage type. cudaMemsetAsync / memset zero ``num_elements * sizeof(T)`` consecutive bytes from the dst pointer, so a non-contiguous subset (e.g. a middle 2-D slice of a row-major array) silently zeroes memory outside the region.

Auto now routes such cases to the 'pure' (mapped-tasklet) expansion, which writes per element via the subset. ExpandCUDA / ExpandCPU also reject non-contiguous subsets upfront with a clear error so explicit forcing still raises.

Tests:
  test_memset_auto_routes_non_contiguous_to_pure_cpu -- Auto succeeds via 'pure', zeroes only the 6x10 sub-block.
  test_memset_cpu_rejects_non_contiguous_subset -- explicit CPU raises.
  test_memset_cuda_rejects_non_contiguous_subset -- explicit CUDA raises.

Also: in copy_node_test.py, factor out _make_copy_skeleton + add _make_legacy_copy_sdfg (canonical Memlet(data=dst, subset=dst_subset, other_subset=src_subset) form). The previous _strip_libnodes mutation was producing memlets without other_subset, which made the legacy codegen look broken on patterns it actually handles correctly. Keep the genuine libnode-advantage pin -- rank-mismatch 4D->2D Fortran reshape -- delete the three false-positive tests.
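The failure mode described above can be shown on a flat Python list standing in for a row-major buffer. This is a toy model under stated assumptions: the 4x5 array, the `[1:3, 1:4]` subset and both helper functions are invented for illustration and are not part of the PR.

```python
ROWS, COLS = 4, 5  # hypothetical row-major array shape


def flat_memset(buf, start, count):
    """Zero `count` consecutive elements, like memset on a raw pointer."""
    for i in range(start, start + count):
        buf[i] = 0


def per_element_memset(buf, rows, cols):
    """Zero only the elements inside the 2-D subset (the 'pure' strategy)."""
    for r in rows:
        for c in cols:
            buf[r * COLS + c] = 0


rows, cols = range(1, 3), range(1, 4)  # subset [1:3, 1:4], 6 elements
start = rows[0] * COLS + cols[0]       # linear offset of the first element
count = len(rows) * len(cols)          # num_elements of the subset

wrong = [1] * (ROWS * COLS)
flat_memset(wrong, start, count)

right = [1] * (ROWS * COLS)
per_element_memset(right, rows, cols)

zeroed_wrong = [i for i, v in enumerate(wrong) if v == 0]
zeroed_right = [i for i, v in enumerate(right) if v == 0]
print(zeroed_wrong)
print(zeroed_right)
```

Both variants zero six elements, but the flat walk runs past the end of the first row of the subset and touches elements outside it, while the per-element loop stays inside the subset.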

- helper.py: drop collapsed_map_lengths, inlined as [s for s in subset.size() if s != 1] (2 callsites; Range.size() is the existing util that does the same job).
- helper.py: shorten CURRENT_STREAM_NAME comment.
- insert_explicit_copies.py: drop src_locations / dst_locations / skip_inside_device_scope properties + _storage_allowed helper. Zero callers in dace/ or tests/ pass any of them. Drop the Iterable / is_devicelevel_gpu imports that go with them.
- insert_explicit_copies.py: fix the class docstring -- the pass only handles AN -> AN (and AN -> View -> AN via the round-trip collapse). Map-staging patterns are not handled on this branch (they were on explicit-gpu-global-copies; D1 outermost-subset bug; removed pending a correct rewrite).

…side the map scope

AN -> MapEntry -> AN (stage-in) and AN -> MapExit -> AN (stage-out) edges now lift to a CopyLibraryNode placed INSIDE the map scope, wired directly to MapEntry's output connector / MapExit's input connector (no intermediate AN inserted on the scope-side). Chains of MapEntries / MapExits are followed via memlet_path; the body of the map (tasklets, NestedSDFGs, nested maps) is irrelevant to the lift. Views on the outer side stay in place.

The outer-side memlet (per-iteration subset on the outer array) is preserved verbatim on the new MapEntry -> libnode (or libnode -> MapExit) edge; the inner-array-side memlet is derived via the existing _derive_matching_dst_subset against the inner array's descriptor.

Tests in tests/passes/insert_explicit_copies_test.py:
  test_lift_stage_in_copy
  test_lift_stage_out_copy
  test_lift_stage_in_copy_through_view
  test_lift_stage_out_copy_through_view
  test_lift_stage_in_copy_chained_map_entries
  test_lift_stage_out_copy_chained_map_exits
  test_lift_stage_in_copy_with_nested_sdfg_consumer

Each asserts: exactly one libnode in the lifted state, libnode's scope owner is the (innermost) MapEntry, libnode input/output wired directly to MapEntry/MapExit (not via an inserted AN), numerical match against NumPy, and zero `CopyND<` template instantiations in generated code.

New helper _assert_no_copynd(sdfg) calls generate_code and greps each CodeObject for `CopyND<`; pinned in all seven new staging tests.

Endpoint resolution refactored to sdutils.find_input_arraynode / find_output_arraynode instead of inline memlet_path[0/-1] + isinstance checks.

- Drop the unused third argument from _derive_matching_dst_subset (docstring already said "unused; kept for symmetric signature"); update both direct-copy callsites accordingly.
- Fold _is_stage_in_candidate / _is_stage_out_candidate / _insert_stage_in_libnode / _insert_stage_out_libnode into one _lift_staging_edge(..., stage_in: bool) -- the four methods differed only by which side of the edge was the inner AN and which was the MapEntry/MapExit. -30 LoC net.

Introduce _compile_no_copynd(sdfg) wrapper: greps every CodeObject emitted by sdfg.generate_code() for 'CopyND<' and asserts none, then returns sdfg.compile(). Apply to all 17 compile sites in copy_node_test.py (including the libnode-side compile of the legacy comparison test). Pins the contract: libnode expansions displace the runtime CopyND fallback entirely. The only intentional CopyND user is ExpandSharedMemoryCollective, whose test inspects tasklet bodies directly without compiling, so no exemption is needed.
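A wrapper of this kind might look roughly like the sketch below. It is not the code from the PR: it assumes only that `sdfg.generate_code()` returns objects exposing their generated text as a `.code` string and a `.name`, and that `sdfg.compile()` exists. `_FakeCodeObject` and `_FakeSDFG` are hypothetical test doubles so the example runs without DaCe installed.

```python
def compile_no_copynd(sdfg):
    """Assert that no generated code object mentions `CopyND<`, then compile.

    Assumption: each object returned by generate_code() has `.name` and
    `.code` (the generated source as a string).
    """
    for obj in sdfg.generate_code():
        assert "CopyND<" not in obj.code, f"CopyND fallback found in {obj.name}"
    return sdfg.compile()


class _FakeCodeObject:
    """Hypothetical stand-in for a generated code object."""

    def __init__(self, name, code):
        self.name = name
        self.code = code


class _FakeSDFG:
    """Hypothetical stand-in exposing only the two methods the wrapper uses."""

    def __init__(self, objects):
        self._objects = objects

    def generate_code(self):
        return self._objects

    def compile(self):
        return "compiled"


clean = _FakeSDFG([_FakeCodeObject("main", "memcpy(dst, src, n);")])
dirty = _FakeSDFG([_FakeCodeObject("main", "CopyND<float, 1>::Copy(a, b);")])

clean_result = compile_no_copynd(clean)

try:
    compile_no_copynd(dirty)
    dirty_rejected = False
except AssertionError:
    dirty_rejected = True

print(clean_result, dirty_rejected)
```

Running the check before compiling means a test fails with a message naming the offending code object instead of silently passing through the runtime fallback.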

…cks; factor staging-test scaffold

dace/transformation/passes/insert_explicit_copies.py
- Drop dead ``if src_subset is None or dst_subset is None`` guard (_resolve_subset_for always returns a Range).
- Inline _expr_lt (4 lines, one caller in _is_consecutive_reshape); the try/except + intent comment now live where they're used.

dace/libraries/standard/nodes/copy_node.py
- Inline _coarse_pick_for_storage_pair (single caller in select_copy_implementation).
- Inline _cuda2d_strides_are_supported (single caller in _refine_cuda_impl_for_subsets); the explanatory comment moves with the logic.

tests/passes/insert_explicit_copies_test.py
- Factor the structural assertion shared across the seven staging tests into _assert_lifted_libnode(state, side, expected_scope=...) -- replaces ~5 lines of "find libnode + check scope + check wire" per test with one call.
- Pull "list View AccessNodes" into _view_an_names helper.

tests/library/{copy_node,memset_node}_test.py + tests/passes/insert_explicit_copies_test.py
- Replace explicit ``if __name__ == "__main__": test_a(); test_b(); ...`` blocks with ``pytest.main([__file__])`` so the script form picks up tests automatically without a hand-maintained list.

…shape

Three lines collapsed to one. The comment still explains why the try/except swallows the exception silently (symbolic indeterminacy + equal-product safety net) without the docstring-shaped prose that was inherited from the deleted _expr_lt helper.

…ion pass

Functions that already receive parent_state no longer take a redundant parent_sdfg/sdfg argument; they read the owning SDFG from state.sdfg. This covers the copy/memset expansion helpers, the auto_dispatch shim, CopyLibraryNode.src_storage/dst_storage, and the InsertExplicitCopies private methods. The framework-fixed validate(self, sdfg, state) and the ExpandTransformation.expansion(node, parent_state, parent_sdfg) signatures are left intact.

Add copy_node tests for a padded (1, N) array whose unit leading dim carries a non-packed stride: is_contiguous_subset is False, so Auto falls back to a map (same storage) or a pitched cudaMemcpy2D (cross storage), and the copy stays numerically exact.

ExpandMemcpyCPU and the CUDA1D helper duplicated the same validate -> contiguous-check -> size -> pointer-Tasklet body, differing only in the cross-storage flag and the memcpy vs cudaMemcpyAsync code string. Fold them into one _make_memcpy_tasklet(node, parent_state, *, cuda), mirroring MemsetLibraryNode's _make_memset_tasklet. Generated code is unchanged.

Imports: drop the copy/subsets/data aliases and inline imports for plain top-level imports; add type hints to the module helpers.

Reduce _derive_matching_dst_subset to a single volume check via subsets.Range.num_elements + dace.symbolic.equal, removing the four-branch shape ladder, the hand-rolled symbolic equality, and _is_consecutive_reshape (whose two-pointer walk only ever compared total volumes).

Stop collapsing AN -> View -> AN round-trips. A View is an Array subclass with its own shape/strides, so _replace_direct_copies now lifts any View<->Array movement edge and skips only the view's alias edge, leaving the view in place as a copy endpoint (AN -> View -> Copy -> AN). Deletes _collapse_round_trip_views and the now-unused _resolve_subset_for; updates the view tests with structural + numerical checks for both view directions.

…e, test dedup

- _replace_direct_copies now resolves src/dst subsets via Memlet.get_src_subset / get_dst_subset (the memlet path) instead of an ambiguous data-name else branch; the self-copy convention stays explicit.
- ExpandSharedMemoryCollective's GPU_ThreadBlock guard uses the existing dace.sdfg.scope.is_in_scope, so get_parent_map_and_loop_scopes is dropped from transformation/helpers.py (the PR no longer modifies that file).
- Tests route SDFG construction through shared builders: view round-trips (_make_view_round_trip_sdfg, src + dst), chained-map staging (_build_chained_stage_sdfg), reshape (_run_reshape_copy_test), and the array_to_array cases (one parametrized test) via _build_copy_sdfg.
- Polybench tests reuse the canonical tests/polybench kernels instead of inline copies (covariance/correlation imported, fdtd-2d loaded by path under a clean module name); the __main__-only 'import polybench' in those three files moves under __main__ so they import without the absl CLI dependency.
- Shortened/cleaned comments and docstrings.

Three related dispatcher fixes in ``select_copy_implementation`` plus a relaxation in ``ExpandTasklet`` so each case has a working expansion.

1. Shared <-> Register routes to ``Tasklet`` (single-element) or ``MappedTasklet`` (multi-element), never ``SharedMemoryCollective`` (whose expansion explicitly rejects a Register endpoint and was crashing ``samples/optimization/matmul.py --version optimize_gpu``).
2. Shared <-> {Shared, Global} placed *inside* a ``GPU_ThreadBlock`` map routes per-thread, same as Shared <-> Register. The collective expansion is itself the block-level op and rejects this placement.
3. Invariant: no single-element copy is ever routed to ``MappedTasklet`` (a 0-D map crashes in memlet propagation). Steps 1 and 2 of the dispatcher handle the single-element case explicitly; Step 3+ only see multi-element inputs.

``ExpandTasklet`` is relaxed to accept any single-element copy regardless of storage pair -- the volume check at the top of the expansion already rejects multi-element, and the dispatcher now intentionally routes thread-level Shared single-element copies here.

Tests in ``tests/library/copy_node_test.py``:
- 7 exact-impl unit tests, one per unique routing rule
- 1 parameterised invariant test enumerating every (src, dst) storage pair at single-element volume, asserting != ``MappedTasklet``
- 1 ``__syncthreads()`` emission check at single-element volume
- 3 correctness roundtrip kernels: variant A (cooperative load OUTSIDE ``tblock_map``), variant B (per-thread load INSIDE ``tblock_map``), and a full Global -> Shared -> Register -> Register -> Shared -> Global pipeline

Regressions verified across both ``DACE_compiler_cuda_implementation`` values (legacy + experimental): 82/82 ``copy_node_test`` pass; full sweep of ``tests/library``, ``tests/transformations``, ``tests/codegen``, ``tests/sdfg``, ``tests/npbench`` shows 0 code regressions.
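The three routing rules can be modelled as a small decision function. This is a deliberately reduced sketch, not ``select_copy_implementation``: `pick_impl`, the string storage names and the "unmodeled" result are hypothetical. Routing Shared <-> {Shared, Global} outside a thread-block map to the collective, and routing every remaining single-element copy to ``Tasklet``, are assumptions of this sketch rather than statements from the commit.

```python
from itertools import product

STORAGES = ("Global", "Shared", "Register")  # simplified storage names


def pick_impl(src, dst, num_elements, inside_threadblock):
    """Toy model of the dispatcher rules listed above."""
    pair = {src, dst}
    per_thread = "Tasklet" if num_elements == 1 else "MappedTasklet"

    # Rule 1: Shared <-> Register is always a per-thread copy.
    if pair == {"Shared", "Register"}:
        return per_thread

    # Rule 2: Shared <-> {Shared, Global} inside a thread-block map is
    # per-thread; outside it, this sketch assumes the collective is used.
    if "Shared" in pair and pair <= {"Shared", "Global"}:
        return per_thread if inside_threadblock else "SharedMemoryCollective"

    # Rule 3 (invariant): a single element never reaches MappedTasklet.
    # Choosing "Tasklet" for the remaining pairs is an assumption.
    if num_elements == 1:
        return "Tasklet"

    return "unmodeled"  # other multi-element routes are out of scope here


# Invariant check over every (src, dst) pair at single-element volume.
single_element_routes = {
    (s, d, tb): pick_impl(s, d, 1, tb)
    for s, d, tb in product(STORAGES, STORAGES, (False, True))
}
invariant_holds = all(v != "MappedTasklet" for v in single_element_routes.values())
print(invariant_holds)
```

Enumerating every storage pair mirrors the parameterised invariant test described in the commit: the invariant is checked over the whole input space rather than per rule.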

…esolved subset

The previous lift cloned the inner edge's Memlet onto the new outer-side edge (``MapEntry -> libnode`` for stage-in, ``libnode -> MapExit`` for stage-out). When the inner edge's Memlet was named dst-relative (``data == inner_node.data`` with the outer-side subset in ``other_subset``), the cloned Memlet carried the wrong ``data`` field and SDFG validation rejected the lifted edge with ``Memlet data does not match source or destination data nodes``.

Construct the outer-side Memlet explicitly: query ``get_src_subset`` / ``get_dst_subset`` to resolve the subset in the outer array's index space, then build ``Memlet(data=outer.data, subset=...)`` -- propagating ``dynamic`` and ``wcr`` from the original. The inner-side subset is derived from the resolved outer subset rather than from ``outer_memlet.subset`` (which may be inner-relative).

Test: ``tests/codegen/argument_signature_test.py::test_argument_signature_compiles_and_runs`` constructs exactly this dst-relative pattern (``A -> MapEntry/OUT_A -> tmp_in`` with ``Memlet(data='tmp_in', subset=0, other_subset=__i1)``) and now passes under both legacy and experimental codegens. ``tests/passes/insert_explicit_copies_test.py``: 51/51 (experimental), 53/53 (legacy) -- no regressions.
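The difference between cloning a memlet and rebuilding it can be illustrated with a string-based toy. `ToyMemlet`, `subset_for` and `outer_side_memlet` are hypothetical and much simpler than DaCe's ``Memlet`` and its ``get_src_subset`` / ``get_dst_subset`` methods; the sketch only models the rule that ``subset`` belongs to the array named in ``data`` and ``other_subset`` to the opposite endpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMemlet:
    """Hypothetical, string-based stand-in for a memlet; not dace.Memlet."""
    data: str
    subset: str
    other_subset: Optional[str] = None
    dynamic: bool = False
    wcr: Optional[str] = None


def subset_for(memlet, array_name):
    """Return the subset expressed in `array_name`'s index space."""
    if memlet.data == array_name:
        return memlet.subset
    return memlet.other_subset


def outer_side_memlet(inner_edge_memlet, outer_name):
    """Build the outer-side memlet explicitly instead of cloning."""
    resolved = subset_for(inner_edge_memlet, outer_name)
    if resolved is None:
        raise ValueError(f"no subset recorded for {outer_name}")
    return ToyMemlet(
        data=outer_name,
        subset=resolved,
        dynamic=inner_edge_memlet.dynamic,  # propagated from the original
        wcr=inner_edge_memlet.wcr,          # propagated from the original
    )


# The dst-relative pattern quoted in the commit: data names the inner node.
inner = ToyMemlet(data="tmp_in", subset="0", other_subset="__i1")

cloned = ToyMemlet(**vars(inner))       # old behaviour: wrong `data` field
rebuilt = outer_side_memlet(inner, "A")  # new behaviour: names the outer array

print(cloned.data, rebuilt.data, rebuilt.subset)
```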

The outgoing Memlet at a scope exit can be source-relative -- naming the inner transient rather than the external array being written -- so using its .data dropped the real destination array (and its stride symbols) from a GPU kernel's argument list, yielding 'identifier undefined' at compile. Resolve the written array from the memlet tree's root (the outermost-scope node, the actual fan-out destination) instead, matching the long-standing NOTE here.

The previous form bundled the ``arglist`` API check with an optional ``sdfg.compile()`` gated on a cupy import. Split into:
- ``test_argument_signature_test``: CPU-only ``arglist`` assertion that the indirect ``A`` / ``D`` references are reported as kernel arguments.
- ``test_argument_signature_compiles_and_runs``: ``@pytest.mark.gpu`` end-to-end build + run that exercises the kernel-signature path the ``arglist`` fix at ``075052309`` repaired (the bug surfaced as ``identifier "D" is undefined`` at ``nvcc``).

The shared SDFG-construction code is factored into ``_make_indirect_reference_sdfg`` so both tests use the same fixture.

… connector

Cherry-pick of 30f5e0b minus the experimental_cuda KernelSpec piece (file doesn't exist on this branch).

1) ``_refine_cuda_impl_for_subsets`` / ``_make_mapped_tasklet_expansion``: replace sympy ``==`` shape comparisons with ``dace.symbolic.inequal_symbols``, which first runs ``equalize_symbols`` to merge same-named SymPy symbols carrying different assumption sets (e.g. ``Symbol('N', integer=True)`` vs ``Symbol('N', integer=True, positive=True)``). Without this the per-dim shape check rejected identical shapes as 'permutations', the 2D cuMemcpy2D selector misrouted, and the strided-stride / shape check spuriously failed.
2) ``_make_expansion_sdfg``: when the libnode carries the ``__dace_current_stream`` in-connector (gpuStream_t), register a matching scalar in the wrapper SDFG so the resulting NestedSDFG passes validation. Legacy codegen never adds the connector, so the branch is a no-op there.

Cherry-pick of the ``tests/library/copy_node_test.py`` portion of 714dec0 (the PromoteGPUScalarsToArrays parts of that commit don't apply -- file doesn't exist on this branch).

``test_copy_cuda_1d_single_element`` passed raw strings as strides (``strides=["src_stride"]``) which ``Array.validate`` rejects ("Strides must be ... integer values or symbols"). The scaffold now sympifies stride entries -- which causes ``add_datadesc`` to auto-register the stride symbol -- so the test's own ``sdfg.add_symbol("src_stride", dace.int32)`` collided and is removed.

ThrudPrimrose and others added 30 commits May 13, 2026 16:43

cleanup: dead code, docstrings, comments

f82811b

docs: ASCII-only, repo backtick convention, concise docstrings

92f031c

docs: ASCII-only, repo backtick convention, concise docstrings

9cf409e

Unify libnode stream connector on __dace_current_stream (valid in both legacy and experimental codegen)

26547d6

Remove libnode stream-input plumbing; rename STREAM_CONN -> CURRENT_STREAM_NAME

3812133

Reuse subset.num_elements(); unify memset tasklet builder

dae3798

Merge remote-tracking branch 'origin/explicit-copy-memset-nodes' into explicit-copy-memset-nodes

6f26f73

Drop stale stream-descriptor mention from CopyLibraryNode.validate docstring

18cc384

Move no-cycle inline imports to module top in copy/memset libnodes

a5540a2

Add length-1<->scalar conversion passes

a40c343

Passes are the only API (drop standalone function)

84a0079

Merge branch 'main' into explicit-copy-memset-nodes

dad18c1

Move connector names to class constants; use dtypes.{CPU,GPU}_RESIDENT_STORAGES

1cbc095

Apply connector-contract + inner-literal consolidation to memset and copy_node

8e39e71

helpers: use imported nodes module for MapEntry/Tasklet/LibraryNode isinstance checks

a9b8bea

Accept Fortran-packed layouts in CopyNDTemplate; route mixed C/F to MappedTasklet

ef777dd

MappedTasklet handles rank-mismatch via 1-D walker + int_floor/% delinearization; selector bypasses CopyND

192d6c9

Revert MappedTasklet rank-mismatch; CopyNDTemplate is the only supported reshape impl

3074fd0

Refactor multi-dim copy tests onto shared helper + add unsupported-case feature-regression tests

6a42deb

Extract auto_dispatch + merge memset test helpers + type hints on new helpers

31b963a

DRY/YAGNI: drop dead CopyExpansion fields, reuse collapse_shape_and_strides in memset, Range.num_elements over reduce

7b3e859

Drop ExpandCopyNDTemplate; MappedTasklet handles rank-mismatch (CopyND only for shared memory)

6c35792

Simplify _coarse_pick, _cuda2d_strides_are_supported, merge cudaMemcpy2D 1D+2D branches

a8aceb7

Tailor _build_copynd_call to its sole shared-memory caller

f42e134

ThrudPrimrose and others added 30 commits May 20, 2026 15:13

Merge branch 'main' into explicit-copy-memset-nodes

9451d5a

Improve comments

1a52336

Pre-commit

786eb82

Merge origin/main into explicit-copy-memset-nodes (prefer theirs)

21d7ebb

style: tighten outer-subset resolution comment in InsertExplicitCopies

f696321

No functional change. Trim the dst-relative-Memlet note in ``_lift_staging_edge`` to single-paragraph form, matching the comment density in ``copy_node.py``.

Improve register location detection

2e60b37

Updated how get_gpu_{backend, runtime}() worked.

aa0caba

That fixed some obscure cases:
- It was not possible to change the backend, which only affects systems that have both an Nvidia _and_ an AMD GPU.
- `lru_cache` is not needed; `cache` is fully adequate.

The expansion of the copy library node always assumed CUDA and not Hip. Now they use the configured backend.

334614a

Removed a stray print().

f3f2a25

Fixed 1d memcopy (#2396)

a84c2d1

In case only a single element is copied, the copy is always contiguous.

---------

Co-authored-by: Philip Muller <phimuell@santis>

ThrudPrimrose wants to merge 73 commits into main from explicit-copy-memset-nodes


2 participants
