Context
IPPL currently captures expression-template objects for Kokkos kernels through a byte-buffer wrapper:
```cpp
using capture_type = detail::CapturedExpression<E, N>;
capture_type expr_ = reinterpret_cast<const capture_type&>(expr);
```
CapturedExpression then reinterprets its own storage back to E during evaluation.
A weakness of this approach: the byte buffer was not aligned to E, which could cause cudaErrorMisalignedAddress when an expression contained members with stricter alignment requirements, such as a Kokkos::View. The immediate fix was to align both CapturedExpression and its buffer to E.
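A minimal host-only sketch of the alignment problem and the immediate fix, under stated assumptions: `OverAlignedExpr`, `UnalignedCapture`, `AlignedCapture` and `buffer_is_aligned_for` are hypothetical names, not IPPL code, and the 16-byte `alignas` merely stands in for a stricter-aligned member such as a Kokkos::View. A plain `char` buffer only guarantees 1-byte alignment, while `alignas(E)` on both the wrapper and its buffer makes the storage suitably aligned for E.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical expression whose layout needs 16-byte alignment
// (a stand-in for an expression holding stricter-aligned members).
struct alignas(16) OverAlignedExpr {
    double a;
    double b;
};

namespace sketch {
// Unaligned variant: a char array only guarantees alignof(char) == 1,
// so nothing ties the buffer's address to alignof(E).
template <typename E, std::size_t N = sizeof(E)>
struct UnalignedCapture {
    char buffer[N];
};

// Aligned variant, roughly mirroring the immediate fix: align the wrapper
// and its buffer to E so viewing the bytes as E uses an aligned address.
template <typename E, std::size_t N = sizeof(E)>
struct alignas(E) AlignedCapture {
    alignas(E) char buffer[N];
};
}  // namespace sketch

// Returns true if the capture's buffer sits at an address suitable for E.
template <typename E, typename Capture>
bool buffer_is_aligned_for(const Capture& c) {
    auto addr = reinterpret_cast<std::uintptr_t>(c.buffer);
    return addr % alignof(E) == 0;
}
```

The unaligned variant is not guaranteed to be misaligned at run time; it simply carries no guarantee, which is why such a failure can surface only for some kernel functor layouts.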
Remaining concern
Even with the alignment fixed, the current model still relies on copying the raw object representation and reinterpret-casting it. This assumes that:
- CapturedExpression has alignment compatible with E;
- the CRTP (Curiously Recurring Template Pattern) base Expression<E, N> starts at the same address as the full E object;
- byte-copying E is valid for every expression type, i.e. no expression type requires a meaningful copy constructor or lifetime handling.
These assumptions are fragile, especially for CUDA/HIP device kernels and for future expression types. In standard C++, copying an object through its bytes is only well-defined for trivially copyable types.
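The second and third assumptions correspond to standard type traits: a standard-layout object shares its address with its base class subobjects, and byte copying is only well-defined for trivially copyable types. The sketch below uses those traits to show which expression types the byte-copy model actually holds for. All names are hypothetical: `ExprBase` is a simplified CRTP base, and `CountedHandle` stands in for a reference-counted handle (as far as we know, Kokkos::View manages a reference-counted allocation and is likewise not trivially copyable).

```cpp
#include <cassert>
#include <type_traits>

// Simplified, empty CRTP base (hypothetical).
template <typename E>
struct ExprBase {};

// Hypothetical handle whose copy constructor does real work (a reference
// count bump), standing in for a reference-counted view type.
struct CountedHandle {
    int* count;
    explicit CountedHandle(int* c) : count(c) {}
    CountedHandle(const CountedHandle& other) : count(other.count) { ++*count; }
};

// Expression that only holds a pointer and a scalar: byte copying is fine.
struct PlainExpr : ExprBase<PlainExpr> {
    const double* data;
    double factor;
};

// Expression holding the handle: byte copying would skip the copy constructor.
struct HandleExpr : ExprBase<HandleExpr> {
    CountedHandle h;
    explicit HandleExpr(int* c) : h(c) {}
};

// Both assumptions of the byte-copy model, expressed as traits:
//  - trivially copyable: copying the bytes yields a valid copy of E;
//  - standard layout: the (empty) CRTP base shares E's address.
template <typename E>
constexpr bool byte_capture_assumptions_hold =
    std::is_trivially_copyable_v<E> && std::is_standard_layout_v<E>;
```

`PlainExpr` satisfies both traits, but `HandleExpr` is not trivially copyable, so a raw byte copy of it would bypass the reference-count increment that an ordinary copy performs.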
Proposed direction
Replace the byte-buffer wrapper with typed ownership of the concrete expression:
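```cpp
namespace detail {
template <typename E, size_t N>
struct CapturedExpression {
    constexpr static unsigned dim = E::dim;

    // Copy the concrete expression through E's own copy constructor.
    KOKKOS_FUNCTION
    explicit CapturedExpression(const Expression<E, N>& expr)
        : expr_m(static_cast<const E&>(expr)) {}

    // Evaluate by forwarding to the owned, properly typed expression.
    template <typename... Args>
    KOKKOS_INLINE_FUNCTION auto operator()(Args... args) const {
        return expr_m(args...);
    }

    E expr_m;
};
}  // namespace detail
```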
Then update call sites from:
```cpp
capture_type expr_ = reinterpret_cast<const capture_type&>(expr);
```
to:
```cpp
capture_type expr_(expr);
```
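To exercise the proposed wrapper outside a Kokkos build, here is a host-only sketch under several assumptions: the two Kokkos macros are defined as empty stand-ins, `Expression` is reduced to an empty CRTP base, `Scaled` is a hypothetical expression that counts its copies, and `assign` is a hypothetical assignment path in which a plain loop replaces `Kokkos::parallel_for`. It shows that every copy now goes through E's copy constructor: once at `capture_type expr_(expr);` and once more when the kernel lambda captures `expr_` by value.

```cpp
#include <cassert>
#include <cstddef>

// Host-only stand-ins for the Kokkos annotation macros (assumption: no Kokkos).
#ifndef KOKKOS_FUNCTION
#define KOKKOS_FUNCTION
#endif
#ifndef KOKKOS_INLINE_FUNCTION
#define KOKKOS_INLINE_FUNCTION inline
#endif

// Simplified CRTP base; the real IPPL Expression has more to it.
template <typename E, std::size_t N>
struct Expression {};

namespace detail {
// Typed capture as proposed: owns a copy of the concrete expression.
template <typename E, std::size_t N>
struct CapturedExpression {
    constexpr static unsigned dim = E::dim;

    KOKKOS_FUNCTION
    explicit CapturedExpression(const Expression<E, N>& expr)
        : expr_m(static_cast<const E&>(expr)) {}  // runs E's copy constructor

    template <typename... Args>
    KOKKOS_INLINE_FUNCTION auto operator()(Args... args) const {
        return expr_m(args...);
    }

    E expr_m;
};
}  // namespace detail

// Hypothetical 1D expression `factor * data[i]` that counts its copies.
inline int scaled_copies = 0;
constexpr std::size_t kScaledBytes = sizeof(const double*) + sizeof(double);

struct Scaled : Expression<Scaled, kScaledBytes> {
    constexpr static unsigned dim = 1;
    const double* data;
    double factor;

    Scaled(const double* d, double f) : data(d), factor(f) {}
    Scaled(const Scaled& other)
        : Expression<Scaled, kScaledBytes>(other), data(other.data), factor(other.factor) {
        ++scaled_copies;
    }

    double operator()(std::size_t i) const { return factor * data[i]; }
};

// Hypothetical assignment path; the loop stands in for Kokkos::parallel_for.
template <typename E, std::size_t N>
void assign(double* out, std::size_t n, const Expression<E, N>& expr) {
    assert(out != nullptr);
    using capture_type = detail::CapturedExpression<E, N>;
    capture_type expr_(expr);                                 // proposed call-site form
    auto kernel = [=](std::size_t i) { out[i] = expr_(i); };  // captures a typed copy
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}
```

With byte copying, neither copy would have run `Scaled`'s copy constructor. With a typed member, expression types that need real copy semantics get them, and the wrapper inherits E's alignment without any explicit `alignas`.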
Validation needed
The change affects all expression-template assignment paths, including fields, indexed fields, sparse indexed fields, FEM vectors, particles, and sparse index predicate construction, so each of them needs to be validated.
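Running the affected kernels on CUDA/HIP remains the real validation, but a cheap compile-time check could additionally be instantiated once per assignment path. A sketch, assuming a hypothetical reduced wrapper `TypedCapture` and hypothetical stand-in expression types; in IPPL the type list would be the concrete expression types produced by each path listed above.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical minimal typed wrapper, reduced to the properties being checked.
template <typename E>
struct TypedCapture {
    E expr_m;
};

// Properties wanted for every expression type E once capture is typed:
// the wrapper is copyable (through E's own copy constructor) and at least
// as strictly aligned as E, with no manual alignas involved.
template <typename E>
constexpr bool capture_ok =
    std::is_copy_constructible_v<TypedCapture<E>> &&
    alignof(TypedCapture<E>) >= alignof(E);

// Check a whole list of expression types at once (C++17 fold expression).
template <typename... Es>
constexpr bool all_captures_ok = (capture_ok<Es> && ...);

// Hypothetical stand-ins for expressions from different assignment paths.
struct FieldLikeExpr {
    const double* data;
    std::size_t extent;
};
struct alignas(16) OverAlignedExpr {
    double lanes[2];
};
// Not trivially copyable: byte capture would be invalid, typed capture is fine.
struct CountedLikeExpr {
    int* count;
    CountedLikeExpr(const CountedLikeExpr& other) : count(other.count) { ++*count; }
};
```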