feature: improve provenance and make q2-preview editable by gordonwoodhull · Pull Request #231 · quarto-dev/q2

gordonwoodhull · 2026-05-22T00:00:25Z

This is very early. Creating a draft PR for CI and in case anyone is curious.

Will push as things progress. The provenance epic is Plans 3-8 of the q2-preview sequence.

Current status: first plan proving idempotence of all built-in transforms and shortcodes.

One expected failure until Plan 5 is complete, due to a latent bug in the wire format.

Audit and revise Plans 3-8 of the q2-preview series (now framed internally as the provenance epic) after a design discussion that followed the q2-preview pipeline and attribution work landing on main.

Major design changes folded into the plans:
- **Plan 4 unified Generated variant.** Collapse the earlier `Synthetic` + `Derived` split into one `Generated { by, anchors: Vec<Anchor> }` shape. Atomicity is per-`by.kind` (orthogonal to anchors); the invocation source byte range is the first anchor with role `AnchorRole::Invocation`. One wire-format code (4) instead of two.
- **Plan 4/5/6 typed anchors (Path C).** Instead of stuffing source-info chain metadata into `by.data` (dynamic JSON), the chain is a typed `Vec<Anchor>` where each `Anchor` carries an `Arc<SourceInfo>` and a role-labeled `AnchorRole` (`Invocation`, `ValueSource`, `Other(String)`). `by.data` shrinks to per-kind non-source-info configuration. Two future-anchor roles flagged as follow-ups contingent on metadata-loader and Lua-file-registration work.
- **Plan 6 uniform shortcode anchor stamping.** Single funnel covers Rust built-ins, Lua-loaded extension handlers, and user-extension shortcodes uniformly via a post-walk `stamp_shortcode_anchors` helper. Enrichment-via-post-walk preserves Lua-attached `by.data` fields (lua_path, lua_line) while promoting `by.kind` to `shortcode`. Attribution interaction documented: multi-author shortcodes get latest-wins via the existing `query_byte_range` max-time logic composed with chain-walking through the `Invocation` anchor.
- **Plan 5 latent code-3 bug now reachable.** Plans 1-2 shipped the q2-preview pipeline that runs filters whose output crosses the JSON boundary; the FilterProvenance code-3 round-trip bug is no longer latent in production. Added end-to-end production-reachability regression test using the `{{< kbd Ctrl+C >}}` fixture (kbd.lua constructs a Span that gets FilterProvenance-tagged and then shortcode-stamped). Drops code 5 from the design.
- **Plan 7 SPA edit-back in scope.** The new q2 preview CLI command serves a separate SPA from ts-packages/preview-renderer; both hub-client and the SPA share the writer machinery via @quarto/preview-runtime. Plan 7 now covers replacing `noopSetAst` in the SPA with a real handler that routes through `incrementalWriteQmd` to `syncClient.updateFileContent` and the ephemeral hub's automerge↔disk bridge. Adds a small SPA-local `DiagnosticStrip` for Q-3-42/Q-3-43; hub-client's existing diagnostics-banner handles the same warnings there. Single-file mode (bd-tnm3k) works through the same automerge stack, with no special case.
- **Plan 8 wrapper stays Original.** Explicit reasoning added for why `CustomNode("IncludeExpansion")` uses Original source_info (CustomNode.type_name carries generator identity; the wrapper substitutes 1:1 for the source-mapped Paragraph). HTML pipeline resolve transform in the Normalization Phase (symmetric with CalloutResolveTransform); HTML doesn't attribute the include line because there's no DOM anchor for it; this is accepted v1 behavior.

Mechanical changes also folded in:
- Rename `Synthetic` → `Generated` throughout the type vocabulary in all plans.
- Update JS-side hand-mirror file paths (`hub-client/src/utils/...` → `ts-packages/preview-renderer/src/utils/...`) to reflect the Phase-D package split.
- Each plan's intro reframed as part of the provenance epic; file names keep the q2-preview-plan-N form for continuity.

File renames for clarity about which filters each plan covers:
- `…plan-3-filter-idempotence.md` → `…plan-3-builtin-filter-idempotence.md`
- `…plan-7a-filter-idempotence.md` → `…plan-7a-user-filter-idempotence.md`

Plans 3-8 remain in design state on this branch; no code changes yet.

Audit pass over the provenance epic's idempotence story, scoping Plan 3 to pipeline non-determinism only and propagating the consequences to the neighbouring plans.

Plan 3 (builtin transform and filter idempotence):
- Retitle to "Built-in transform and filter idempotence verification", symmetric across Rust transforms and Lua filters (prior framing was too narrow).
- Enumerate the actual universe under test: 36 Rust transforms in build_q2_preview_transform_pipeline (4 excluded, named with reasons), ~20 stage-level items in build_q2_preview_pipeline_stages, and the one Lua filter under resources/extensions/ (video-filter.lua). The prior "~10-20 filters" estimate misread shortcodes as filters.
- Drop the "Plan 3 strengthening" round-trip amendment that was added alongside Plan 7a in commit 2129d35. Round-trip non-idempotence is not exercised by today's pipeline; CI-time round-trip testing conflates writer-lossiness with filter-non-idempotence; 7a's runtime check is the better home for the property when Plan 7's writer ships. Trim "Two flavors" section to a pointer at 7a.
- Add compute_meta_hash_fresh / compute_meta_hash_fresh_excluding_rendered as a new helper in quarto-ast-reconcile, parallel to the existing block hasher. Hash covers blocks + meta (excluding rendered.*).
- Rewrite test pseudocode against the real run_pipeline API at pipeline.rs:626.
- Add fixture-format constraint: no executable engine cells (CI has no kernels).
- Coverage gap audit: ~25 fixtures across the document-level, Lua shortcode, website-project, attribution, and resource categories. Includes lua-shortcode-version, lua-shortcode-lipsum-fixed (non-random path), and video-filter-header for the one built-in Lua filter.
- Convert to a development-plan format with a seven-phase work-items checklist.
- Close the engine-staleness open question via filter.rs:158 (fresh Lua::new() per invocation).
- Clarify the lua-filter-pipeline reference as TypeScript Quarto porting material, not the Rust inventory.

Plan 6 (provenance audit):
- Add a §Test plan bullet for source_info determinism: Plan 3's hashes exclude source_info by design, so a per-fixture source_info-equality check is Plan 6's own responsibility.

Plan 7 (incremental writer):
- Add a writer-lossless baseline test as the first §Test plan bullet, prerequisite for the reconciler tests. Reuses Plan 3's fixture set.
- Add Plan 3 to §References and §Dependencies (soft-depends-on via compute_meta_hash_fresh).

Plan 7a (runtime user-filter idempotence):
- Remove all references to the now-deleted "Plan 3 strengthening" section (five locations including a full subsection).
- Reframe the out-of-scope bullet from "Strengthening Plan 3" to "Extending the runtime round-trip check to built-in filters," with three-point v1-acceptance reasoning in §Notes.
- Update §Design decisions, §Dependencies, and §References to reflect the new shape and the shared compute_meta_hash_fresh helper.
- Add the meta-hash comparison to step 4 of the round-trip check.

No code changes; design state only.

…ailure policy

Hash helper: `merge_op` participates (verified `MergeOp::default() = Concat` is a stable compile-time constant); `Map` entries hashed in insertion order, no sort (an idempotence test should *catch* the kind of HashMap-iteration-order non-determinism a sort would mask). Adds regression-guard unit tests for both choices.

Test runner: drives every fixture through both `DriveMode::SingleFile` (direct `run_pipeline`) and `DriveMode::ProjectOrchestrator` (`ProjectPipeline<RenderToPreviewAstRenderer>`) so orchestrator-only non-determinism (project discovery, ProjectIndex assembly, file-iteration order) is also under test. Website/chrome fixtures are orchestrator-only by design.

Failure policy: failing fixtures stay **failing**; no auto-`#[ignore]`. Each failure files a beads issue whose description doubles as a sub-agent investigation prompt. The integration branch holds the queue; merge to main waits until drained or the user explicitly opts to ignore.

New helper `find_first_divergence` (alongside the hashers) returns `DivergencePoint::{Block { index }, MetaKey { path }, None}` so the test driver's panic message (and therefore the sub-agent prompt) arrives with a concrete starting point instead of just "hash diverged."

Orchestrator-mode `DocumentAst` extraction: researched the data flow; the typed AST is materialized inside `render_qmd_to_preview_ast` but discarded after JSON serialization. Plan recommends adding `pub ast: DocumentAst` to `PreviewAstOutput` and forwarding through `WasmPassTwoOutput`; alternatives (JSON re-parse, test-only hook) documented with their costs.

Fixture rules: no absolute process paths in fixture content (built-in extensions extract to a `temp_dir` whose path differs across CI runs; it is stable within a single process, which is fine for two-runs-compare but a latent issue for future stored-snapshot variants).

Smaller corrections:
- `Format::from_format_string("q2-preview")` (no `Format::q2_preview()` constructor exists).
- `apply_lua_filter` (singular) is the per-filter Lua-state-creation site, with the plural loop calling it once per filter; `LuaShortcodeEngine::new` is the shortcode-side analogue.
- `quarto/video` filter extension is built-in via `include_dir!(resources/extensions)` and auto-discovered by `StageContext::new`, so fixtures need no scaffolding beyond `filters: [video]` in YAML.
- `meta.rendered.includes.*` is the actual path (not `meta.includes.*`) and includes contributions from `IncludeResolveStage`, chrome render transforms, `attribution_viewer`, and Bootstrap/clipboard injection, all of which are skipped by `compute_meta_hash_fresh_excluding_rendered`.

Stage-inventory clarifications: `MathJsStage` is excluded from q2-preview; `BootstrapJsStage` and `ClipboardJsStage` write only to `ctx.artifacts` (not to `meta` or `blocks`), so they don't affect the hash, but their q2-preview inclusion is questionable and is filed separately as bd-2ag1c.

Notes for the next traversal: `CodeHighlightStage`'s native disk scan for user grammars is OS-order-dependent (not exercised today; fixtures don't supply user grammars); lipsum's module-load `math.randomseed(os.time())` is harmless on the non-random code path the fixture exercises but should be reverified if a future variant routes through `math.random`.

Estimated scope: ~760 → ~980 lines.

…branch policy

Audit pass against current source. Settles every open question that remained in the prior revision and corrects factual drift.

Reuse over rebuild
- `DriveMode::ProjectOrchestrator` now delegates to the existing `render_active_page_preview` helper at `crates/quarto-core/tests/render_page_in_project.rs:660`. No fresh orchestrator wiring; no `make_website_project_ctx(...)` builder.
- `DocumentAst` extraction settled on option (a): re-parse the JSON via `pampa::readers::json::read`. source_info round-trips but the hash excludes it, so no stripping pass and no production plumbing change is required. Earlier option (b) (typed-AST plumbing through `PreviewAstOutput` / `WasmPassTwoOutput`) abandoned.
- `run_orchestrator` code sample updated: real body in place of the prior `unimplemented!("see Open questions")` stub.

Test crate location pinned
- File: `crates/quarto-core/tests/idempotence.rs`.
- Fixtures: `crates/quarto-core/tests/fixtures/idempotence/`.
- Cargo invocation in the sub-agent prompt template updated to `--test idempotence`.

Long-lived branch policy made explicit
- New `## Long-lived branch policy` section at the top.
- `## Goal` clarifies that "CI-enforced" applies when the plan lands on `main`; until then `feature/provenance` is allowed to be red while the failure queue drains.
- `### Phase 5 — Failure triage` opens with the same constraint.

Factual fixes against current source
- Transform count corrected from 36 to 37; missing `table-bootstrap-class` added to Finalization, with a fixture entry in the gap audit and Phase 4 checklist.
- `Q2_PREVIEW_STAGE_EXCLUDED` corrected to list all three exclusions (`math-js`, `render-html-body`, `apply-template`).
- `CodeHighlightStage` user-grammar scan citation moved from `pipeline.rs:644-650` to `crates/quarto-core/src/transforms/code_highlight.rs:126-129`.
- Stale line numbers refreshed throughout (pipeline.rs 1181→1198, 1220→1237, 379→380, 355→356, 626→627, 855→859, 663→664; render_page_in_project.rs 653→660; Pass2Payload::AstJson 256→254; stage/context.rs 220→221; ShortcodeResolveTransform::transform 257→513 with the correct file path).
- bd-2ag1c ordering pinned: Plan 3 lands first; bd-2ag1c follows with Plan 3's measurements in hand.

Section rename: "Open questions for implementation" → "Decisions (was: open questions)" + a `### CI failure policy & sub-agent prompt template` subsection. All internal cross-refs updated.

Estimate revised
- Scaffolding line item: ~260 → ~100 lines (reuse, not rebuild).
- `PreviewAstOutput::ast` plumbing (~20 lines) removed entirely.
- Total: ~980 → ~800 lines.
- Session count revised 2 → 2-3 with the third explicitly allocated to Phase 5 triage.

Adds the structural-hash infrastructure that Plan 3's q2-preview idempotence gate (and Plan 7a's runtime user-filter check) will sit on:
- compute_meta_hash_fresh: source-info-agnostic ConfigValue hasher. Insertion-order Map keys (no sort, so HashMap-iteration-order bugs in transforms remain detectable). MergeOp participates via its enum discriminant. Recurses into PandocInlines/PandocBlocks via the existing inline/block hashers (which already exclude source_info).
- compute_meta_hash_fresh_excluding_rendered: same, but skips the top-level `rendered` map entry. The exclusion is intentionally not propagated into recursion: a nested `rendered` key is content.
- find_first_divergence + DivergencePoint: returns the first block index whose per-block fresh hash differs, or the first insertion-order meta key path whose subtree hash differs (with the same rendered.* exclusion). The plan-sketch signature took &DocumentAst, but quarto-ast-reconcile cannot depend on quarto-core; the helper takes &[Block] + &ConfigValue and the test driver projects from DocumentAst.
- 11 new unit tests cover: same/different content, source_info/key_source agnosticism, top-level rendered exclusion, nested rendered participation, Map insertion-order sensitivity (no-sort regression guard), MergeOp sensitivity; identical/Block-mismatch/MetaKey-path/rendered-skip divergence localization.

Verification: `cargo nextest run --workspace`: 9321 passed, 196 skipped. `cargo xtask verify --skip-hub-build` steps 1–5 green (lint, fmt, Rust build with -D warnings, tree-sitter, Rust tests with -D warnings). Steps 7/10 fail with the known --skip-hub-build artifact (`wasm-quarto-hub-client` unbuilt), unrelated to these additive Rust changes.

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
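The hashing rules described above (insertion-order map keys with no sort, and a `rendered` exclusion that applies at the top level only) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `Value` is a hypothetical two-variant stand-in for `ConfigValue`, it omits `MergeOp` and source info entirely, and the function names are not the real helpers in quarto-ast-reconcile.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for `ConfigValue`; the real type has more variants.
#[derive(Debug, Clone)]
pub enum Value {
    Scalar(String),
    /// Entries are stored in insertion order; a Vec makes that order explicit.
    Map(Vec<(String, Value)>),
}

fn hash_into(value: &Value, hasher: &mut DefaultHasher) {
    match value {
        Value::Scalar(text) => {
            0u8.hash(hasher); // variant tag
            text.hash(hasher);
        }
        Value::Map(entries) => {
            1u8.hash(hasher); // variant tag
            entries.len().hash(hasher);
            // Deliberately no sort: entry order is part of the hash, so a
            // transform that emits keys in unstable order changes the result.
            for (key, child) in entries {
                key.hash(hasher);
                hash_into(child, hasher);
            }
        }
    }
}

/// Hash of a whole value, order-sensitive for maps.
pub fn meta_hash(value: &Value) -> u64 {
    let mut hasher = DefaultHasher::new();
    hash_into(value, &mut hasher);
    hasher.finish()
}

/// Same hash, but a *top-level* `rendered` entry is dropped first.
/// Nested `rendered` keys are left alone and still participate.
pub fn meta_hash_excluding_rendered(value: &Value) -> u64 {
    match value {
        Value::Map(entries) => {
            let kept: Vec<(String, Value)> = entries
                .iter()
                .filter(|(key, _)| key.as_str() != "rendered")
                .cloned()
                .collect();
            meta_hash(&Value::Map(kept))
        }
        other => meta_hash(other),
    }
}
```

In this sketch, two maps holding the same entries in a different order hash differently, which is the property the no-sort regression guard is meant to protect.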

Adds the test driver that Phases 3-4 will hang ~25 fixtures off. Self-contained at `crates/quarto-core/tests/idempotence.rs`.
- `DriveMode { SingleFile, ProjectOrchestrator }`. Single-file calls `run_pipeline` with `build_q2_preview_pipeline_stages`. Orchestrator drives `ProjectPipeline<RenderToPreviewAstRenderer>` via the existing `render_active_page_preview` body (copied inline because each `tests/*.rs` is its own binary).
- `Fixture { name, setup, active, modes }` + `run_fixture` runs the pipeline twice per (fixture, mode), hashes blocks via `compute_blocks_hash_fresh` and meta via `compute_meta_hash_fresh_excluding_rendered`, and on divergence panics with `find_first_divergence`'s `DivergencePoint` embedded so the panic message itself fills the plan's sub-agent investigation prompt template.
- `pandoc_to_document_ast` is the small field-shuffle that the plan identifies: orchestrator mode emits `Pass2Payload::AstJson`, which `pampa::readers::json::read` re-parses into `(Pandoc, ASTContext)`; the hasher only reads `ast.blocks` + `ast.meta` so the other `DocumentAst` fields get defaults.
- `tests/fixtures/idempotence/README.md` documents the fixture-format rules (no engine cells, no absolute paths, per-fixture mode mapping).
- `smoke_plain_paragraph` smoke fixture drives a single-paragraph document through both modes. Passing this proves the harness works end-to-end before Phases 3-4 land the real fixtures.

Verification: `cargo nextest run -p quarto-core --test idempotence` runs the new smoke test (PASS). `cargo xtask verify --skip-hub-build --skip-hub-tests` steps 1-9 green; the Phase-1 idempotence tests and this Phase-2 smoke test ran inside Step 5. Step 10 (preview-renderer integration tests in `ts-packages/preview-renderer/`) fails with the same WASM-import artifact as Step 7; both depend on `wasm-quarto-hub-client` which `--skip-hub-build` skips. Unrelated to these Rust-only additions.

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
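As a rough illustration of the run-twice-and-localize idea behind `run_fixture` and `find_first_divergence`, here is a minimal sketch. It does not call the real pipeline: `RunHashes` and `check_idempotent` are hypothetical names, the per-run summary is reduced to precomputed hashes, and the sketch returns an error string where the real driver is described as panicking.

```rust
/// Where two runs first disagree. The variant names follow the commit
/// message; the field types are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum DivergencePoint {
    Block { index: usize },
    MetaKey { path: String },
    None,
}

/// Hypothetical per-run summary: one hash per block, plus
/// (key, subtree hash) pairs for top-level meta keys in insertion order.
pub struct RunHashes {
    pub blocks: Vec<u64>,
    pub meta: Vec<(String, u64)>,
}

fn without_rendered(meta: &[(String, u64)]) -> Vec<&(String, u64)> {
    meta.iter().filter(|(key, _)| key.as_str() != "rendered").collect()
}

pub fn find_first_divergence(a: &RunHashes, b: &RunHashes) -> DivergencePoint {
    // Blocks are checked first, position by position.
    let shared = a.blocks.len().min(b.blocks.len());
    if let Some(index) = (0..shared).find(|&i| a.blocks[i] != b.blocks[i]) {
        return DivergencePoint::Block { index };
    }
    if a.blocks.len() != b.blocks.len() {
        return DivergencePoint::Block { index: shared };
    }
    // Then meta keys in insertion order, skipping top-level `rendered`.
    let (ma, mb) = (without_rendered(&a.meta), without_rendered(&b.meta));
    for (i, (key, hash)) in ma.iter().enumerate() {
        match mb.get(i) {
            Some((other_key, other_hash)) if other_key == key && other_hash == hash => {}
            _ => return DivergencePoint::MetaKey { path: key.clone() },
        }
    }
    if let Some((extra_key, _)) = mb.get(ma.len()) {
        return DivergencePoint::MetaKey { path: extra_key.clone() };
    }
    DivergencePoint::None
}

/// Runs the supplied pipeline stand-in twice and reports the first divergence.
pub fn check_idempotent(mut run: impl FnMut() -> RunHashes) -> Result<(), String> {
    let (first, second) = (run(), run());
    match find_first_divergence(&first, &second) {
        DivergencePoint::None => Ok(()),
        point => Err(format!("pipeline is not idempotent; first divergence at {:?}", point)),
    }
}
```

Embedding the `DivergencePoint` in the message is what gives an investigator a block index or meta key path to start from.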

Adds the existing-fixture batch the plan calls "carry-forward from prior plan draft": one fixture per Rust transform / feature that was already exercised in earlier idempotence drafts, scoped to single-file document fixtures that run in both DriveMode variants.

Coverage:
- meta-single, meta-markdown: shortcode-resolve + metadata-normalize (string and PandocInlines branches).
- include-trivial: include-expansion stage + shortcode-resolve.
- callout-warning: CalloutTransform (callout-resolve is excluded from q2-preview, so the CustomNode survives).
- theorem: TheoremSugarTransform.
- figure-ref-target: FloatRefTargetSugarTransform.
- crossref-to-theorem: crossref-index + crossref-resolve.
- sectionize-multi: SectionizeTransform across nested headers.
- footnotes-mixed: FootnotesTransform on inline + reference forms.
- appendix-license: AppendixStructureTransform with license/copyright meta and a footnote interaction.
- combined-stress: sectionize + callouts + shortcodes interacting.

A `doc_fixture(name, content)` helper collapses each single-file fixture to a one-liner; `include-trivial` keeps an inline closure because it writes two files.

All 12 idempotence tests (smoke + 11 new) pass: `cargo nextest run -p quarto-core --test idempotence` → 12 passed. No queue entries for Phase 5 from this batch; the carry-forward fixtures are all clean on first run.

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

npm install (from repo root) and npm run build:wasm (from hub-client) updated package-lock.json and crates/wasm-quarto-hub-client/Cargo.lock on this branch. Committed so subsequent fresh checkouts of feature/provenance can build WASM from the same dependency set.

Adds the batch of Phase-4 fixtures that need no scaffolding beyond a single-file `setup`. Per the long-lived-integration-branch policy, fixtures that surface non-idempotence stay in the suite as the triage queue.

Pass on first run (both DriveModes):
- code-block-fenced: code-block-generate / -render / code-highlight.
- proof: ProofSugarTransform.
- equation-labeled: EquationLabelTransform + crossref-resolve (eq).
- toc-on: toc-generate, toc-render.
- video-filter-header: built-in Lua filter under `resources/extensions/quarto/video/`.
- theme-bootstrap: compile-theme-css stage.
- table-bootstrap-class: TableBootstrapClassTransform.
- lua-shortcode-version: Lua-loaded shortcode handler (returns `quarto.version`).

In the queue:
- **lua-shortcode-lipsum-fixed**: `SingleFile` passes; the pipeline itself is idempotent. `ProjectOrchestrator` panics with `MalformedSourceInfoPool` re-parsing the AST JSON the orchestrator emitted. This is a JSON writer/reader round-trip bug specific to lipsum-shortcode-generated inlines, not a transform-determinism finding. Filed as **bd-3odjm**. The test stays red per the plan's "do not #[ignore]" rule; the integration branch is allowed to carry the failure until the queue is drained.

Verification: `cargo nextest run -p quarto-core --test idempotence` → 20 passed, 1 failed (bd-3odjm). Plan-1 unit tests and Phase-3 fixtures all green.

Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm

Both pass on first run in both DriveMode variants.
- include-in-header writes a tiny header.html and references it from front matter; exercises IncludeResolveStage.
- resource-image writes a 67-byte minimal PNG and references it via inline image syntax; exercises ResourceCollectorTransform. Adds a write_bytes helper for the binary stub. Per the fixtures README rule the PNG sits at the project root and is referenced relatively (`./local.png`).

Verification: `cargo nextest run -p quarto-core --test idempotence` → 22 passed, 1 failed (bd-3odjm).

Three orchestrator-only website fixtures. Two pass, one in queue.

Pass:
- website-chrome: navbar + sidebar + page-navigation + page-footer + favicon + bootstrap-icons + canonical-url + title-prefix. Two pages (index, other), tiny favicon stub.
- website-listing: listing with categories enabled and feed: true, two posts under posts/, each with categories. Exercises listing-generate / -render, categories-sidebar, listing-feed-link, listing-feed-stage, listing-item-info.

In the queue:
- website-links: internal cross-page `.qmd` body links. Filed as bd-rz2we. Block 0 hash diverges across runs while meta hash is stable, so the divergence is genuinely in the AST blocks (not in rendered chrome). Hypothesis: link-rewrite or link-resolution is capturing the absolute project root (or canonicalized tempdir path) into the AST when it should emit a path-independent relative URL.

Verification: `cargo nextest run -p quarto-core --test idempotence` → 24 passed, 2 failed (bd-3odjm, bd-rz2we).

Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-rz2we

Extends Fixture with an optional attribution_json: Option<&'static str>. When present:
- SingleFile installs PreBuiltAttributionProvider on RenderContext.attribution_provider before run_pipeline.
- ProjectOrchestrator forwards the JSON via RenderToPreviewAstRenderer::with_attribution; the renderer installs the same provider type on the per-page RenderContext it constructs internally.

Stub JSON has one actor + one run covering bytes 0..1024 (a wider range than the fixture body actually uses) so the attribution map overlaps the entire document and AttributionGenerateStage + AttributionRenderTransform have something to write into the AST.

`cargo nextest run -p quarto-core --test idempotence` → 25 passed, 2 failed (bd-3odjm, bd-rz2we; both pre-existing). attribution_basic passes on first run in both DriveModes, so the deterministic provider + generate + render stack is genuinely idempotent.

This completes the Phase 4 fixture set. The Plan-3 gate now covers:
- 1 smoke fixture
- 11 carry-forward (Phase 3, all green)
- 9 Phase-4a doc fixtures (8 green, 1 in queue)
- 2 Phase-4b multi-file (both green)
- 3 Phase-4c website (2 green, 1 in queue)
- 1 Phase-4d attribution (green)

Total: 27 fixtures, 25 green, 2 in queue.

Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm (Plan 5 will fix), bd-rz2we

Adds claude-notes/instructions/idempotence-contract.md, the author-facing summary of the contract Plan 3 enforces. Covers:
- what the hash includes and excludes (source-info blind, insertion-order maps, merge_op participates, rendered.* excluded at top level only);
- what new transforms must NOT do (undefined iteration order, process-local state, absolute paths, engine cells);
- the fresh-Lua-state-per-run rule for Lua filters / shortcodes;
- how to add a fixture (doc_fixture for trivial, inline closure for multi-file, ORCHESTRATOR_ONLY for chrome, attribution_json for attribution exercises);
- the long-lived-integration-branch policy: don't #[ignore] a failing fixture without explicit user approval.

Cross-linked from:
- crates/quarto-core/tests/fixtures/idempotence/README.md (existing pointer expanded to point at the contract doc and the plan).
- claude-notes/plans/2026-05-04-q2-preview-plan-7a-user-filter-idempotence.md (References section, so authors looking at the runtime user-filter check find the CI contract too).

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

cargo nextest run --workspace: 9346/9348 pass. The 2 failures are the documented queue items (bd-3odjm, bd-rz2we); every other workspace test is green, including the 25 passing idempotence fixtures.

cargo xtask verify (full WASM stack): Steps 1-4 green; Step 5 fails on the same 2 fixtures. That's the expected long-lived-integration-branch state per the plan's §Long-lived branch policy: the gate is allowed to be red until the queue is drained.

Plan 3 is complete as a deliverable: gate + hashing infrastructure + 27 fixtures + author-facing docs + filed queue. Merge to main gated on draining the queue (bd-3odjm via Plan 5; bd-rz2we via a follow-up).

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md

The Work-items section under Phase 1-7 was fully checked, but the parallel "Coverage gaps to address during implementation" inventory (per-fixture bullets, line ~560+) still showed unchecked boxes even though every fixture in that list now ships in idempotence.rs. Marked all 26 inventory items as landed. Annotated the two that are in the Phase-5 triage queue (lipsum-fixed → bd-3odjm, website-links → bd-rz2we) so the queue state is also visible from the inventory, not just from the Phase-5 work-items block. Plan checklist is now fully consistent: 54 checked, 0 unchecked.

…erContext

Plan 3's website_links fixture was non-idempotent: rendered AST link URLs captured the absolute tempdir path of the per-run TempDir, causing block-0 hash divergence across two runs with different tempdirs.

Root cause: `ResourceResolverContext::vfs_root_mode` played two roles via a single PathBuf: disk-write root (where runtime.file_write puts theme CSS / copied resources) and URL prefix (what gets embedded in HTML link/asset URLs). In production WASM these are intentionally identical; on native they have to diverge so writes hit a real tempdir but URLs stay path-independent.

Split the field into `{ write_root, url_root }` and add a two-arg `vfs_root_with_url_root` constructor plus per-renderer `with_url_root` builder. Single-arg `vfs_root(...)` constructor preserves the WASM identity contract by construction (write_root == url_root).

Native test helpers in tests/idempotence.rs and tests/render_page_in_project.rs now pass `.with_url_root("/.quarto/project-artifacts")`, so rendered URLs embed the synthetic prefix while disk writes still land in the tempdir.

website_links now passes; 25/26 idempotence fixtures pass. The remaining lipsum failure is bd-3odjm (FilterProvenance wire format), owned by Plan 5 and out of scope here. Workspace nextest: 9347/9348. cargo xtask verify (Rust leg) clean for lint/fmt/build with -D warnings.

Plan: claude-notes/plans/2026-05-21-vfs-url-write-root-split.md
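A minimal sketch of the write-root / URL-root split, to show why two runs in different tempdirs can still render identical URLs. The struct name `VfsRoots`, the `write_path` and `url_for` helpers, and the choice of `String` for `url_root` are all assumptions for illustration; only the constructor names `vfs_root` and `vfs_root_with_url_root` come from the commit message, and their real signatures may differ.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical reduction of the `{ write_root, url_root }` split.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VfsRoots {
    /// Where files are physically written (a per-run tempdir on native).
    write_root: PathBuf,
    /// Prefix embedded in rendered link and asset URLs.
    url_root: String,
}

impl VfsRoots {
    /// Single-argument form: both roots are identical by construction,
    /// which matches the WASM identity contract described above.
    pub fn vfs_root(root: &str) -> Self {
        Self { write_root: PathBuf::from(root), url_root: root.to_string() }
    }

    /// Two-argument form: writes land in `write_root`, URLs use `url_root`.
    pub fn vfs_root_with_url_root(write_root: &str, url_root: &str) -> Self {
        Self { write_root: PathBuf::from(write_root), url_root: url_root.to_string() }
    }

    pub fn write_root(&self) -> &Path {
        &self.write_root
    }

    pub fn url_root(&self) -> &str {
        &self.url_root
    }

    /// Disk location for an artifact; depends on the (possibly per-run) write root.
    pub fn write_path(&self, relative: &str) -> PathBuf {
        self.write_root.join(relative)
    }

    /// URL for the same artifact; depends only on the URL root.
    pub fn url_for(&self, relative: &str) -> String {
        format!(
            "{}/{}",
            self.url_root.trim_end_matches('/'),
            relative.trim_start_matches('/')
        )
    }
}
```

With this shape, two contexts built with different write roots but the same URL root agree on `url_for` while disagreeing on `write_path`, which is the property the website_links fixture needed.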

Plan 4 (SourceInfo provenance types) finalized for development:
- 7-phase work-items checklist (types → constructors → accessor updates → Lua serde → migration → tests → verification gate)
- field renamed `anchors` → `from` (typed `SmallVec<[Anchor; 1]>` from day 1; serde feature required on smallvec)
- accessor semantics for `Generated` pinned: length/start_offset/end_offset → 0, map_offset → None, resolve_byte_range / remap_file_ids / extract_file_id delegate to invocation_anchor
- required-Invocation-anchor invariant on `shortcode` kind documented with `By::shortcode` doc-comment requirement; enforcement split across Plan 6 audit test and Plan 7 debug_assert
- Lua-table discriminant pinned to `t = "Generated"`
- §Test plan and Phase 6 expanded to cover every accessor + mutator + the `combine()` × Generated corner
- migration scope corrected (15 files, 27 occurrences); references and line ranges verified against the worktree source
- §Open questions section removed (no open questions remain)

Cross-plan `from` rename swept across Plans 3, 5, 6, 7, 8.

Plan 5 JSON wire format (option D):
- outer JSON key `anchors` → `from` (matches Rust field name)
- inner anchor pool reference `from` → `si_id` (distinctive; avoids the `parent_id` tree-structure mental model that fits Substring's chain but not anchor references)
- Reader/writer code samples updated; TS-side `SourceInfoEntry` shape note updated

Plan 6 + Plan 7 hand-offs for the required-anchor invariant added. Deferred follow-ups (Dispatch anchor, ValueSource anchor) cross-referenced as bd-36fr9 and bd-129m3 (committed separately to main).
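The pinned accessor semantics for `Generated` can be sketched roughly as follows. This is an illustration under stated assumptions, not the plan's code: `from` is a plain `Vec` here rather than a `SmallVec`, `by` is reduced to a hypothetical `by_kind` string, the `Original` variant's fields and the `resolve_byte_range` return type are invented for the example, and only two of the listed accessors are shown.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub enum AnchorRole {
    Invocation,
    ValueSource,
    Other(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Anchor {
    pub source_info: Arc<SourceInfo>,
    pub role: AnchorRole,
}

/// Hypothetical two-variant reduction of the source-info type.
#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    Original { file_id: usize, start: usize, end: usize },
    // `from` is a Vec in this sketch; the plan uses a SmallVec.
    Generated { by_kind: String, from: Vec<Anchor> },
}

impl SourceInfo {
    /// First anchor whose role is `Invocation`, if any.
    pub fn invocation_anchor(&self) -> Option<&Anchor> {
        match self {
            SourceInfo::Generated { from, .. } => {
                from.iter().find(|anchor| anchor.role == AnchorRole::Invocation)
            }
            SourceInfo::Original { .. } => None,
        }
    }

    /// Generated content has no extent of its own, so its length is 0.
    pub fn length(&self) -> usize {
        match self {
            SourceInfo::Original { start, end, .. } => end - start,
            SourceInfo::Generated { .. } => 0,
        }
    }

    /// (file_id, start, end); `Generated` delegates to its invocation anchor
    /// and yields `None` when no such anchor is present.
    pub fn resolve_byte_range(&self) -> Option<(usize, usize, usize)> {
        match self {
            SourceInfo::Original { file_id, start, end } => Some((*file_id, *start, *end)),
            SourceInfo::Generated { .. } => self
                .invocation_anchor()
                .and_then(|anchor| anchor.source_info.resolve_byte_range()),
        }
    }
}
```

Looking the anchor up by role rather than by position means the result does not depend on where the `Invocation` anchor sits in append order.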

Plan 4 work happens on top of an integration branch carrying exactly one failing test (lua_shortcode_lipsum_fixed orchestrator mode, filed as bd-3odjm). That test's root cause is the wire-format code-3 collision Plan 5 owns, so Plan 4 must not try to fix it locally.

Plan 4:
- New §"Inherited pre-existing failure (bd-3odjm)" section between Out of scope and Work items. Explains the test, the panic shape, the root cause, and that any *other* failure in the idempotence suite is a Plan-4 regression.
- Phase 7 verification gate updated: cargo nextest expects exactly one failure (bd-3odjm); cargo xtask verify trips on the same one.

Plan 5:
- New §"Inherited failure that must close on Plan 5's first reader change (bd-3odjm)" section. Spells out the contract: Plan 5's first reader change must turn lua_shortcode_lipsum_fixed green. If it doesn't, the Plan-5 author has an immediate signal that either the reader discrimination is wrong or the lipsum path produces a code-3 shape neither arm handles; stop and focus on it before moving on.
- Test plan now cites bd-3odjm as the live first-iteration smoke check, ahead of the hand-constructed tests.

Both plans now read consistently with the state of feature/provenance.

Plan 4 committed `from: SmallVec<[Anchor; 1]>` as the field type, but Plan 5's reader/writer + Plan 6's stamper code samples still used the `vec![]` macro to construct it. Those samples would not compile if taken literally: `vec!` produces a `Vec`, not a `SmallVec`.

Switch to `smallvec![]` everywhere `Generated.from` is constructed:
- Plan 5: 4 occurrences (legacy-Transformed code-3 reader; Anchor dedup test description; forward-compat test description; round-trip test description).
- Plan 6: 14 occurrences across §"Per-transform fixes", §"Lua-shortcode enrichment", §"The post-walk helper", §"Variant semantics summary" etc.

No semantic change: same constructions, just the macro that actually returns the field type.

Plan 4 + Plan 5: change Generated.from's inline capacity from SmallVec<[Anchor; 1]> to SmallVec<[Anchor; 2]> so the steady-state post-follow-up shape (Invocation + ValueSource on meta/var; Invocation + Dispatch on Lua-handler shortcodes) stays heap-free. Cost is +16 bytes per empty Generated; saves a heap allocation on every multi-anchor shortcode resolution.

Also folds in research findings that were tacit in the previous draft:
- Phase 1 smallvec line: replace "or verify present" hedge with the concrete two-file Cargo.toml edit (workspace + quarto-source-map), noting verified-absent.
- skip_serializing_if path: use the fully-qualified serde_json::Value::is_null (the short form is a frequent gotcha).
- By::raw policy: accept-all; forgery caught by Plan 6 audit + Plan 7 debug_assert, not by constructor rejection.
- Anchor ordering: append order, stable across serde, at most one anchor per known role.
- extract_file_id: empty-from Generated returns None, matching FilterProvenance's behavior; both call sites in to_ariadne_report already tolerate None. Stays a private fn on DiagnosticMessage.
- Lua serde Concat recursion: legacy "FilterProvenance" inside a Concat piece is handled automatically; no .snap/.json fixtures contain the legacy tag.
- Default risk: no struct holding SourceInfo derives Default in quarto-pandoc-types; Default for SourceInfo itself stays unchanged.
- combine() × Generated: verified unreachable today (all 17 call sites combine Original/Substring shapes); the Phase 6 test documents intent for any future caller.
- PartialEq: no production call site compares SourceInfo today; the derive is required by Block/Inline but not load-bearing.

gordonwoodhull added 20 commits May 21, 2026 12:57

gordonwoodhull wants to merge 20 commits into main from feature/provenance
