docs/plan.md: 4 additions & 4 deletions
@@ -256,10 +256,10 @@ All estimates are rough; refine when a task starts.
 - [ ] T2.1b zerfoo: switch `graph/cuda_graph.go:beginCapture` to use `WithCapture`. Owner: TBD. Est: 45m. verifies: [UC-002]
   - Acceptance: Existing zerfoo GGUF inference tests still pass; gemma4e and gemma3 parity suites unchanged.
   - Dependencies: T2.1a, ztensor version bump merged.
-- [ ] T2.2 Introduce a `managedMem` guard in `allocWeight` that routes to `cudaMallocAsync` on the capture stream when `CaptureAwareAllocator` is active. Otherwise fall back to `MallocManaged`. Owner: TBD. Est: 90m. verifies: [UC-002]
+- [x] T2.2 Introduce a `managedMem` guard in `allocWeight` that routes to `cudaMallocAsync` on the capture stream when `CaptureAwareAllocator` is active. Otherwise fall back to `MallocManaged`. Owner: task-T2.2. Est: 90m. verifies: [UC-002] Completed: 2026-04-16
   - Acceptance: Unit test with a mocked capture stream records an async-alloc node instead of a sync call.
   - Dependencies: T2.1a.
-- [ ] T2.3 Pre-allocate workspace buffers used by `MatMul`, `Add`, and `RMSNorm` variants at `UploadWeights` time so no lazy alloc occurs inside capture for dense float32 workloads. Owner: TBD. Est: 3h. verifies: [UC-001, UC-002]
+- [x] T2.3 Pre-allocate workspace buffers used by `MatMul`, `Add`, and `RMSNorm` variants at `UploadWeights` time so no lazy alloc occurs inside capture for dense float32 workloads. Owner: task-T2.3. Est: 3h. verifies: [UC-001, UC-002] Completed: 2026-04-16
   - Acceptance: Instrument with a counter; capture region records zero `allocWeight` calls for the CrossAsset workload.
   - Dependencies: T1.3, T2.1a.
 - [ ] T2.4 Add unit and integration tests for T2.1 to T2.3. Owner: TBD. Est: 90m. verifies: [infrastructure]
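The guard from T2.2 and the counter-based acceptance check from T2.3 could be sketched roughly as below. This is a CUDA-free mock under stated assumptions, not the zerfoo or ztensor implementation: `fakeCapture`, `weightAllocator`, `workspace`, and the lowercase `uploadWeights` are hypothetical names, and the two allocation paths are recorded as strings instead of calling `cudaMallocAsync` or `MallocManaged`.

```go
package main

import "fmt"

// allocPath names the allocation route that a real allocator would take.
type allocPath string

const (
	pathAsync   allocPath = "cudaMallocAsync" // stream-ordered path, usable during capture
	pathManaged allocPath = "MallocManaged"   // synchronous fallback outside capture
)

// fakeCapture is a hypothetical stand-in for the capture-aware allocator state.
type fakeCapture struct{ active bool }

// weightAllocator records routing decisions instead of touching CUDA.
type weightAllocator struct {
	capture        *fakeCapture
	callsInCapture int                  // counter for the T2.3 acceptance check
	workspaces     map[string]allocPath // buffers keyed by op name
}

func newWeightAllocator(c *fakeCapture) *weightAllocator {
	return &weightAllocator{capture: c, workspaces: make(map[string]allocPath)}
}

// allocWeight is the guard: async path while capture is active, managed otherwise.
func (w *weightAllocator) allocWeight() allocPath {
	if w.capture != nil && w.capture.active {
		w.callsInCapture++
		return pathAsync
	}
	return pathManaged
}

// uploadWeights pre-allocates one workspace per op before any capture starts.
func (w *weightAllocator) uploadWeights(ops ...string) {
	for _, op := range ops {
		w.workspaces[op] = w.allocWeight()
	}
}

// workspace reuses a pre-allocated buffer and only allocates lazily on a miss.
func (w *weightAllocator) workspace(op string) allocPath {
	if p, ok := w.workspaces[op]; ok {
		return p
	}
	p := w.allocWeight()
	w.workspaces[op] = p
	return p
}

func main() {
	c := &fakeCapture{}
	w := newWeightAllocator(c)
	w.uploadWeights("MatMul", "Add", "RMSNorm")

	c.active = true // simulate entering the capture region
	for _, op := range []string{"MatMul", "Add", "RMSNorm"} {
		w.workspace(op)
	}
	fmt.Println("allocWeight calls during capture:", w.callsInCapture)
	fmt.Println("lazy alloc during capture routes to:", w.workspace("Softmax"))
}
```

In this sketch the three pre-allocated ops leave the counter at zero inside the simulated capture region, while an op that was not pre-allocated ("Softmax", chosen only for illustration) falls through to the async path and increments it.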
@@ -351,8 +351,8 @@ count equals the number of task IDs listed on that wave.
 #### Wave 4: Fix + fallback in parallel (4 agents)