diff --git a/README.md b/README.md index a27508a6..daaeaf4e 100644 --- a/README.md +++ b/README.md @@ -235,14 +235,24 @@ provider-backed ELF evidence was required. ambiguous, stale, and superseded markers without introducing a separate graph database or replacing source evidence. - Recall/debug panel after XY-1022: the June 20 follow-up adds - `elf.recall_debug_panel/v1` through service, HTTP, and MCP readback. The panel - groups Memory Note trace selected rows and retained dropped replay candidates, - Source Library document candidates, Knowledge Workspace page snippets, graph facts, - and Dreaming proposals with - authority layer, freshness state, source refs, stage reason, evidence class, and - replay command. Missing anchors remain explicit `not_requested` layers, so the - panel improves debug ergonomics without turning untested or blocked layers into - pass claims. + `elf.recall_debug_panel/v1` through service, HTTP, and MCP readback. The panel + groups Memory Note trace selected rows and retained dropped replay candidates, + Source Library document candidates, Knowledge Workspace page snippets, graph facts, + and Dreaming proposals with + authority layer, freshness state, source refs, stage reason, evidence class, and + replay command. Missing anchors remain explicit `not_requested` layers, so the + panel improves debug ergonomics without turning untested or blocked layers into + pass claims. +- Agent Knowledge OS closeout after XY-1023: the June 20 closeout report publishes + the full product/scenario matrix for 19 tracked products and six Agent Knowledge OS + layers, after rerunning `cargo make real-world-memory` at 62 jobs, 55 pass, + 0 wrong_result, and 7 typed blockers. 
ELF is the strongest measured integrated + Agent Knowledge OS product because all six ELF-owned layers have checked-in + evidence, but the report preserves qmd + retrieval/debug ergonomics, OpenViking trajectory, mem0/OpenMemory history and + UI/export, Letta core/archive, graph/RAG temporal-citation, agentmemory/claude-mem + capture/viewer, and VectifyAI PageIndex/OpenKB long-document knowledge-library + advantages as optimization inputs rather than false pass claims. - Operator-approved public-proxy addendum after XY-930: the June 19 follow-up runs `cargo make baseline-production-private-addendum` with a simulated/public-proxy production corpus manifest approved for this stage. The run records 12 documents, @@ -373,6 +383,7 @@ Detailed evidence and interpretation: - [Graph Topic-Map Report - June 20, 2026](docs/evidence/benchmarking/2026-06-20-graph-topic-map-report.md) - [Knowledge Workspace Version-Diff Report - June 20, 2026](docs/evidence/benchmarking/2026-06-20-knowledge-workspace-version-diff-report.md) - [Live Knowledge-Page Rebuild/Lint Report - June 20, 2026](docs/evidence/benchmarking/2026-06-20-live-knowledge-page-rebuild-lint-report.md) +- [Agent Knowledge OS Closeout Benchmark Report - June 20, 2026](docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md) - [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md) - [Single-User Production Runbook](docs/runbook/single_user_production.md) - Benchmark contract: @@ -474,9 +485,9 @@ Detailed comparison, mechanism-level analysis, and source map: - [Dreaming Product Surface Follow-Up Research](docs/research/dreaming_product_surface_followup.md) Latest real-world benchmark report: June 20, 2026. 
Latest external research refresh: -June 11, 2026; June 20 adds the Graph Topic-Map Report - June 20, 2026, -Knowledge Workspace Version-Diff Report - June 20, 2026, and the Live -Knowledge-Page Rebuild/Lint Report - June 20, 2026 after the June 19 +June 11, 2026; June 20 adds the Agent Knowledge OS Closeout Benchmark Report, +the Graph Topic-Map Report - June 20, 2026, Knowledge Workspace Version-Diff +Report - June 20, 2026, and the Live Knowledge-Page Rebuild/Lint Report - June 20, 2026 after the June 19 XY-930 operator-approved public-proxy production addendum and service-native Dreaming readback, the qmd debug-ergonomics Dreaming retest, the June 17 competitor-strength closeout, and the June 16 temporal reconciliation, live consolidation self-check, diff --git a/apps/elf-eval/fixtures/report_snapshots/2026-06-20-agent-knowledge-os-closeout-benchmark-report.json b/apps/elf-eval/fixtures/report_snapshots/2026-06-20-agent-knowledge-os-closeout-benchmark-report.json new file mode 100644 index 00000000..5cc01771 --- /dev/null +++ b/apps/elf-eval/fixtures/report_snapshots/2026-06-20-agent-knowledge-os-closeout-benchmark-report.json @@ -0,0 +1,521 @@ +{ + "schema": "elf.agent_knowledge_os_closeout_benchmark_report/v1", + "authority": "XY-1023", + "source_contract": "elf-agent-knowledge-os-2026-06-20", + "source_issue": "XY-286", + "generated_at": "2026-06-20T00:00:00Z", + "report_markdown": "docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md", + "all_project_fixture_rerun": { + "command": "cargo make real-world-memory", + "status": "pass", + "run_id": "real-world-memory", + "job_count": 62, + "encoded_suite_count": 17, + "pass": 55, + "wrong_result": 0, + "incomplete": 0, + "blocked": 7, + "not_encoded": 0, + "evidence_coverage": 1.0, + "source_ref_coverage": 1.0, + "quote_coverage": 1.0, + "mean_score": 0.887 + }, + "summary": { + "strongest_measured_integrated_product": "ELF integrated Agent Knowledge OS", + 
"strongest_product_qualification": "ELF is the strongest measured integrated product in the checked-in Agent Knowledge OS matrix because all six ELF-owned layers now have source-linked readback, tests, and docs evidence. This is not a broad claim that ELF beats every competitor on every competitor-owned specialty.", + "complete_same_corpus_product_count": 1, + "product_count": 19, + "scenario_count": 6, + "matrix_cell_count": 114, + "coverage_statement": "Only ELF has complete same-repo coverage across all six Agent Knowledge OS layers. qmd, agentmemory, OpenViking, mem0/OpenMemory, claude-mem, memsearch, Letta, graph/RAG projects, llm-wiki, gbrain, PageIndex, and OpenKB have partial, blocked, or reference-only coverage depending on the scenario.", + "not_every_product_has_complete_live_coverage": true, + "evidence_class_counts": { + "pass": 9, + "wrong_result": 7, + "incomplete": 6, + "blocked": 14, + "not_tested": 78 + } + }, + "supported_scenarios": [ + { + "id": "source_library_ingest_hydration", + "layer": "Source Library", + "elf_evidence": "docs/evidence/benchmarking/2026-06-20-recall-debug-panel-report.md and source-library fixture coverage", + "completion_rule": "Captured sources remain documents with source refs, hydration, search, and bounded replay; they are not silently converted into authoritative memory notes." + }, + { + "id": "memory_authority_history_read_profiles", + "layer": "Memory Authority", + "elf_evidence": "docs/evidence/benchmarking/2026-06-16-live-temporal-reconciliation-report.md and service-native Dreaming readback", + "completion_rule": "Current, historical, tombstoned, stale, superseded, selected, and dropped evidence stays explicit under read-profile policy." 
+ }, + { + "id": "knowledge_workspace_pages", + "layer": "Knowledge Workspace", + "elf_evidence": "docs/evidence/benchmarking/2026-06-20-live-knowledge-page-rebuild-lint-report.md and 2026-06-20-knowledge-workspace-version-diff-report.md", + "completion_rule": "Knowledge pages are rebuildable derived artifacts with source refs, lint, version diffs, and no hidden source-of-truth mutation." + }, + { + "id": "temporal_topic_graph_lite", + "layer": "Graph-lite Facts", + "elf_evidence": "docs/evidence/benchmarking/2026-06-20-graph-topic-map-report.md", + "completion_rule": "Graph facts are typed Postgres readback over sourced, inferred, ambiguous, stale, and superseded markers, not a second graph database authority." + }, + { + "id": "dreaming_review_queue", + "layer": "Dreaming Review Queue", + "elf_evidence": "docs/evidence/benchmarking/2026-06-20-dreaming-review-queue-report.md", + "completion_rule": "Background proposals stay reviewable with source refs, affected refs, lint, diff, policy, and review audit before high-impact mutation." + }, + { + "id": "recall_debug_panel", + "layer": "Recall and Debug", + "elf_evidence": "docs/evidence/benchmarking/2026-06-20-recall-debug-panel-report.md", + "completion_rule": "Selected, dropped, available, reviewable, not_requested, and blocked rows remain visible across memory, docs, pages, graph facts, and proposals." 
+ } + ], + "product_matrix": [ + { + "product": "ELF", + "coverage": "complete_same_repo", + "statuses": { + "source_library_ingest_hydration": "pass", + "memory_authority_history_read_profiles": "pass", + "knowledge_workspace_pages": "pass", + "temporal_topic_graph_lite": "pass", + "dreaming_review_queue": "pass", + "recall_debug_panel": "pass" + }, + "strongest_advantage": "Policy-gated, source-linked, replayable memory and knowledge authority across all six layers.", + "remaining_gap": "Needs a richer operator UI and more same-corpus adapters for competitor specialties.", + "evidence": "README checked-in live benchmark snapshot plus June 20 component reports." + }, + { + "product": "qmd", + "coverage": "partial_same_corpus", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "wrong_result", + "knowledge_workspace_pages": "wrong_result", + "temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "wrong_result", + "recall_debug_panel": "wrong_result" + }, + "strongest_advantage": "Local retrieval pipeline, query expansion, weighted fusion, rerank, short CLI replay, and top-k debug ergonomics.", + "remaining_gap": "Does not expose ELF-style source-authority, dropped-candidate, service trace, page rebuild, graph-lite, or review queue semantics in the current adapter.", + "evidence": "docs/evidence/benchmarking/2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md" + }, + { + "product": "agentmemory", + "coverage": "partial_same_corpus", + "statuses": { + "source_library_ingest_hydration": "incomplete", + "memory_authority_history_read_profiles": "incomplete", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "blocked", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Cross-agent hooks, MCP/REST packaging, local viewer, and coding-agent continuity workflow references.", + "remaining_gap": "Durable storage, 
lifecycle/cold-start proof, and capture breadth remain incomplete or blocked in the checked-in evidence.", + "evidence": "docs/evidence/benchmarking/2026-06-11-first-generation-oss-continuity-source-store-report.md" + }, + { + "product": "OpenViking", + "coverage": "partial_same_corpus", + "statuses": { + "source_library_ingest_hydration": "wrong_result", + "memory_authority_history_read_profiles": "blocked", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Filesystem-like context URIs, hierarchy, staged trajectory, and recursive context expansion as a product idea.", + "remaining_gap": "Same-corpus staged trajectory, hierarchy selection, and recursive expansion stay blocked until evidence-bearing stage artifacts exist.", + "evidence": "docs/evidence/benchmarking/2026-06-19-openviking-trajectory-materialization-report.md" + }, + { + "product": "mem0/OpenMemory", + "coverage": "partial_same_corpus", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "pass", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "blocked", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "blocked" + }, + "strongest_advantage": "Entity-scoped memory history, hosted ecosystem, OpenMemory UI/export direction, and optional graph memory concept.", + "remaining_gap": "OpenMemory product UI/export remains blocked locally; hosted Platform and optional graph memory are non-goals for this lane.", + "evidence": "docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md" + }, + { + "product": "claude-mem", + "coverage": "partial_reference", + "statuses": { + "source_library_ingest_hydration": "blocked", + "memory_authority_history_read_profiles": "incomplete", + "knowledge_workspace_pages": "not_tested", + 
"temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "blocked", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Progressive disclosure UX, local inspection, and automatic capture-loop reference behavior.", + "remaining_gap": "Hook/viewer capture proof is still blocked in Docker-contained checked-in evidence.", + "evidence": "docs/evidence/benchmarking/2026-06-11-first-generation-oss-continuity-source-store-report.md" + }, + { + "product": "memsearch", + "coverage": "partial_same_corpus", + "statuses": { + "source_library_ingest_hydration": "pass", + "memory_authority_history_read_profiles": "pass", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Markdown-first canonical store, incremental reindex, and practical hybrid retrieval workflow.", + "remaining_gap": "Broader prompt behavior, TTL, knowledge-page rebuild, graph, and review queue coverage are not encoded.", + "evidence": "docs/evidence/benchmarking/2026-06-11-first-generation-oss-continuity-source-store-report.md" + }, + { + "product": "Letta", + "coverage": "blocked_materialization", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "blocked", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "blocked", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Core versus archival memory model and export/readback product concept.", + "remaining_gap": "Contained Letta export/readback remains disabled by default and blocked until core block JSON, archival readback/search JSON, and source ids exist.", + "evidence": "docs/evidence/benchmarking/2026-06-19-letta-core-archive-export-readback-report.md" + }, + { + "product": "Graphiti/Zep", + "coverage": "blocked_reference", + "statuses": { + 
"source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "blocked", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Temporal graph validity vocabulary and graph-memory product reference.", + "remaining_gap": "Representative graph/RAG slice remains blocked by provider/graph-store setup and lacks comparable same-corpus fact artifacts.", + "evidence": "docs/evidence/benchmarking/2026-06-19-graph-rag-citation-navigation-promotion-report.md" + }, + { + "product": "GraphRAG", + "coverage": "partial_reference", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "incomplete", + "temporal_topic_graph_lite": "blocked", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Graph-oriented retrieval and citation/navigation reference direction.", + "remaining_gap": "No contained same-corpus page/fact artifact comparable to ELF knowledge and graph reports exists yet.", + "evidence": "docs/evidence/benchmarking/2026-06-19-graph-rag-citation-navigation-promotion-report.md" + }, + { + "product": "RAGFlow", + "coverage": "blocked_reference", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "blocked", + "temporal_topic_graph_lite": "blocked", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "RAG workflow and document processing product reference.", + "remaining_gap": "Representative graph/RAG comparison remains blocked until contained adapter outputs exist.", + "evidence": "docs/evidence/benchmarking/2026-06-19-graph-rag-citation-navigation-promotion-report.md" + }, + { + 
"product": "LightRAG", + "coverage": "incomplete_reference", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "incomplete", + "temporal_topic_graph_lite": "incomplete", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Lightweight graph/RAG architecture reference.", + "remaining_gap": "Representative comparison is incomplete, so no parity or loss claim is allowed.", + "evidence": "docs/evidence/benchmarking/2026-06-19-graph-rag-citation-navigation-promotion-report.md" + }, + { + "product": "graphify", + "coverage": "scored_smoke", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "wrong_result", + "temporal_topic_graph_lite": "wrong_result", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Tiny scored graph/RAG smoke target that can expose artifact-shape mismatches.", + "remaining_gap": "Current scored smoke is wrong_result; it is not graph/RAG parity evidence.", + "evidence": "docs/evidence/benchmarking/2026-06-19-graph-rag-citation-navigation-promotion-report.md" + }, + { + "product": "llm-wiki", + "coverage": "reference_only", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Compiled wiki and knowledge-page workflow reference.", + "remaining_gap": "No comparable same-corpus scored page artifacts are checked in.", + "evidence": "docs/evidence/benchmarking/2026-06-20-live-knowledge-page-rebuild-lint-report.md" + }, + { + "product": "gbrain", + "coverage": 
"reference_only", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Personal knowledge-base and query-save/lint loop reference.", + "remaining_gap": "No comparable scored adapter output is checked in.", + "evidence": "docs/evidence/benchmarking/2026-06-20-live-knowledge-page-rebuild-lint-report.md" + }, + { + "product": "LangGraph", + "coverage": "blocked_reference", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "blocked", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Agent graph orchestration reference for stateful workflows.", + "remaining_gap": "No comparable Agent Knowledge OS memory/page/graph artifact is checked in for this matrix.", + "evidence": "docs/runbook/benchmarking/real_world_agent_memory_benchmark.md" + }, + { + "product": "nanograph", + "coverage": "blocked_reference", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "blocked", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Typed relation and small-graph memory reference.", + "remaining_gap": "No comparable same-corpus typed fact artifact is checked in.", + "evidence": "docs/runbook/benchmarking/real_world_agent_memory_benchmark.md" + }, + { + "product": "VectifyAI PageIndex", + "coverage": "reference_only", + "statuses": { + "source_library_ingest_hydration": "not_tested", + 
"memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Vectorless, reasoning-based long-document tree retrieval and PageIndex MCP ecosystem direction.", + "remaining_gap": "No same-corpus PageIndex adapter or long-PDF tree artifact is checked into ELF yet.", + "evidence": "https://github.com/VectifyAI/PageIndex" + }, + { + "product": "VectifyAI OpenKB", + "coverage": "reference_only", + "statuses": { + "source_library_ingest_hydration": "not_tested", + "memory_authority_history_read_profiles": "not_tested", + "knowledge_workspace_pages": "not_tested", + "temporal_topic_graph_lite": "not_tested", + "dreaming_review_queue": "not_tested", + "recall_debug_panel": "not_tested" + }, + "strongest_advantage": "Document-to-wiki compilation, concept/entity pages, cross-links, lint reports, watch/recompile workflow, and PageIndex-backed long PDF handling.", + "remaining_gap": "OpenKB is a direct knowledge-library reference, but no same-corpus ELF benchmark adapter has run it yet.", + "evidence": "https://github.com/VectifyAI/OpenKB" + } + ], + "competitor_strengths": [ + { + "product": "qmd", + "strength": "Transparent local retrieval pipeline and compact replay ergonomics.", + "elf_response": "Add user-facing retrieval/fusion/rerank knobs and short replay artifacts on top of the recall debug panel.", + "evidence_class": "wrong_result" + }, + { + "product": "VectifyAI PageIndex", + "strength": "Long-document tree search without a vector database and PageIndex MCP ecosystem.", + "elf_response": "Add a long-document tree adapter path for Source Library and compare it against ELF source refs and page rebuilds.", + "evidence_class": "not_tested" + }, + { + "product": "VectifyAI OpenKB", + "strength": "Compiled Markdown wiki, concept/entity pages, saved explorations, lint, watch, and 
recompile workflows.", + "elf_response": "Fold OpenKB-style library management into Knowledge Workspace without weakening ELF source-of-truth boundaries.", + "evidence_class": "not_tested" + }, + { + "product": "OpenViking", + "strength": "Staged context trajectory, hierarchy selection, and recursive expansion.", + "elf_response": "Emit comparable stage artifacts from ELF recall planning and source-library hydration.", + "evidence_class": "blocked" + }, + { + "product": "mem0/OpenMemory", + "strength": "Entity-scoped memory history, hosted ecosystem, UI/export direction, and optional graph memory.", + "elf_response": "Strengthen history/event APIs, export UX, and optional graph-context channel while preserving policy-gated writes.", + "evidence_class": "blocked" + }, + { + "product": "Letta", + "strength": "Core/archive memory split and productized memory export/readback model.", + "elf_response": "Keep ELF core/archive source refs, but add contained adapter output before win/tie/loss claims.", + "evidence_class": "blocked" + }, + { + "product": "Graphiti/Zep and graph/RAG projects", + "strength": "Temporal graph validity, citation/navigation, and graph retrieval product references.", + "elf_response": "Expand graph-lite reports into adapter-backed temporal fact comparison without replacing Postgres authority.", + "evidence_class": "blocked" + }, + { + "product": "agentmemory and claude-mem", + "strength": "Capture hooks, local viewers, and practical continuity UX.", + "elf_response": "Build the operator viewer around Source Library, Memory Authority, Dreaming queue, and recall debug surfaces.", + "evidence_class": "incomplete" + } + ], + "optimization_queue": [ + { + "key": "pageindex_openkb_source_library_adapter", + "priority": "P0", + "generated_from_delta": true, + "delta": "VectifyAI PageIndex/OpenKB are reference-only but directly target long-document library management and knowledge compilation.", + "next_action": "Create a contained PageIndex/OpenKB 
adapter over benchmark-owned sources and compare tree/wiki artifacts against ELF source refs, knowledge pages, and recall debug rows.", + "mapping": "issue_seed" + }, + { + "key": "qmd_retrieval_knobs_and_short_replay", + "priority": "P0", + "generated_from_delta": true, + "delta": "qmd keeps the measured retrieval-debug ergonomics edge despite ELF's trace/stage visibility wins.", + "next_action": "Expose retrieval expansion, fusion, rerank, top-k, and compact replay artifacts in ELF recall/debug surfaces.", + "mapping": "docs/evidence/benchmarking/2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md" + }, + { + "key": "operator_knowledge_library_ui", + "priority": "P0", + "generated_from_delta": true, + "delta": "ELF now has source, page, graph, proposal, and debug APIs but no unified knowledge-library management surface.", + "next_action": "Build a library UI that can ingest saved articles/threads, show source docs, derived pages, graph facts, proposal queue, and replayable recall traces.", + "mapping": "XY-1022 follow-up" + }, + { + "key": "openviking_context_trajectory_artifacts", + "priority": "P1", + "generated_from_delta": true, + "delta": "OpenViking-style hierarchy and recursive trajectory remain blocked because comparable stage artifacts are missing.", + "next_action": "Emit same-corpus stage trajectory, hierarchy selection, rejected siblings, and recursive expansion artifacts.", + "mapping": "docs/evidence/benchmarking/2026-06-19-openviking-trajectory-materialization-report.md" + }, + { + "key": "letta_core_archive_export_readback", + "priority": "P1", + "generated_from_delta": true, + "delta": "Letta core/archive comparison remains blocked until export/search/readback source ids exist.", + "next_action": "Run contained Letta adapter materialization with core block JSON, archival search/readback JSON, and source ids.", + "mapping": "docs/evidence/benchmarking/2026-06-19-letta-core-archive-export-readback-report.md" + }, + { + "key": 
"openmemory_ui_export_and_history_parity", + "priority": "P1", + "generated_from_delta": true, + "delta": "OpenMemory UI/export remains blocked and mem0 history remains a measured reference advantage.", + "next_action": "Add product-container UI/export readback and strengthen ELF history/export APIs without weakening write policy.", + "mapping": "docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md" + }, + { + "key": "graph_rag_temporal_adapter_matrix", + "priority": "P1", + "generated_from_delta": true, + "delta": "Graphiti/Zep, GraphRAG, RAGFlow, LightRAG, graphify, LangGraph, and nanograph remain typed non-pass or reference-only.", + "next_action": "Produce contained same-corpus graph fact/page/citation artifacts while keeping ELF graph-lite as readback over source authority.", + "mapping": "docs/evidence/benchmarking/2026-06-19-graph-rag-citation-navigation-promotion-report.md" + }, + { + "key": "agentmemory_claude_mem_capture_viewer", + "priority": "P2", + "generated_from_delta": true, + "delta": "agentmemory and claude-mem still point to practical capture/viewer UX that ELF has not productized.", + "next_action": "Add a local operator viewer and capture audit flow across Source Library, Memory Authority, and recall debug traces.", + "mapping": "docs/evidence/benchmarking/2026-06-11-first-generation-oss-continuity-source-store-report.md" + }, + { + "key": "private_provider_production_refresh", + "priority": "P2", + "generated_from_delta": true, + "delta": "XY-930 proxy/public-corpus evidence cannot prove real private-corpus or provider-backed quality.", + "next_action": "Run operator-approved private/provider benchmark only when routed private corpus and provider setup exist.", + "mapping": "docs/evidence/benchmarking/2026-06-19-operator-approved-public-proxy-production-private-addendum.md" + } + ], + "claim_boundaries": { + "no_broad_superiority_claim": true, + "no_private_corpus_claim": true, + "no_hosted_provider_claim": 
true,
+    "reference_only_projects_do_not_count_as_pass": true,
+    "not_tested_is_not_pass": true,
+    "blocked_is_not_pass": true
+  },
+  "source_evidence": [
+    "docs/evidence/benchmarking/2026-06-20-recall-debug-panel-report.md",
+    "docs/evidence/benchmarking/2026-06-20-dreaming-review-queue-report.md",
+    "docs/evidence/benchmarking/2026-06-20-graph-topic-map-report.md",
+    "docs/evidence/benchmarking/2026-06-20-knowledge-workspace-version-diff-report.md",
+    "docs/evidence/benchmarking/2026-06-20-live-knowledge-page-rebuild-lint-report.md",
+    "docs/evidence/benchmarking/2026-06-19-qmd-debug-ergonomics-dreaming-retest-report.md",
+    "docs/evidence/benchmarking/2026-06-19-openviking-trajectory-materialization-report.md",
+    "docs/evidence/benchmarking/2026-06-19-letta-core-archive-export-readback-report.md",
+    "docs/evidence/benchmarking/2026-06-19-openmemory-ui-export-product-readback-report.md",
+    "docs/evidence/benchmarking/2026-06-19-graph-rag-citation-navigation-promotion-report.md",
+    "https://github.com/VectifyAI/PageIndex",
+    "https://github.com/VectifyAI/OpenKB",
+    "https://github.com/VectifyAI"
+  ],
+  "validation_expectations": [
+    "cargo make fmt",
+    "cargo make test",
+    "cargo make check-docs",
+    "decodex docs check",
+    "cargo test -p elf-eval --test real_world_job_benchmark",
+    "git diff --check"
+  ]
+}
diff --git a/apps/elf-eval/tests/real_world_job_benchmark.rs b/apps/elf-eval/tests/real_world_job_benchmark.rs
index e17c6bc3..d57f838d 100644
--- a/apps/elf-eval/tests/real_world_job_benchmark.rs
+++ b/apps/elf-eval/tests/real_world_job_benchmark.rs
@@ -258,6 +258,10 @@ fn recall_debug_panel_report_json_path() -> Result<PathBuf> {
     report_snapshot_path("2026-06-20-recall-debug-panel-report.json")
 }
 
+fn agent_knowledge_os_closeout_benchmark_report_json_path() -> Result<PathBuf> {
+    report_snapshot_path("2026-06-20-agent-knowledge-os-closeout-benchmark-report.json")
+}
+
 fn openmemory_ui_export_product_readback_report_json_path() -> Result<PathBuf> {
     report_snapshot_path("2026-06-19-openmemory-ui-export-product-readback-report.json")
 }
@@ -312,6 +316,14 @@ fn recall_debug_panel_report_markdown_path() -> Result<PathBuf> {
         .join("2026-06-20-recall-debug-panel-report.md"))
 }
 
+fn agent_knowledge_os_closeout_benchmark_report_markdown_path() -> Result<PathBuf> {
+    Ok(workspace_root()?
+        .join("docs")
+        .join("evidence")
+        .join("benchmarking")
+        .join("2026-06-20-agent-knowledge-os-closeout-benchmark-report.md"))
+}
+
 fn openmemory_ui_export_product_readback_report_markdown_path() -> Result<PathBuf> {
     Ok(workspace_root()?
         .join("docs")
@@ -3809,6 +3821,165 @@ fn recall_debug_panel_report_wires_cross_layer_debug_contract() -> Result<()> {
     Ok(())
 }
 
+#[test]
+fn agent_knowledge_os_closeout_benchmark_preserves_full_matrix_boundaries() -> Result<()> {
+    let report = serde_json::from_str::<Value>(&fs::read_to_string(
+        agent_knowledge_os_closeout_benchmark_report_json_path()?,
+    )?)?;
+
+    assert_eq!(
+        report.pointer("/schema").and_then(Value::as_str),
+        Some("elf.agent_knowledge_os_closeout_benchmark_report/v1")
+    );
+    assert_eq!(report.pointer("/authority").and_then(Value::as_str), Some("XY-1023"));
+    assert_eq!(
+        report.pointer("/summary/strongest_measured_integrated_product").and_then(Value::as_str),
+        Some("ELF integrated Agent Knowledge OS")
+    );
+    assert_eq!(
+        report.pointer("/all_project_fixture_rerun/status").and_then(Value::as_str),
+        Some("pass")
+    );
+    assert_eq!(
+        report.pointer("/all_project_fixture_rerun/job_count").and_then(Value::as_u64),
+        Some(62)
+    );
+    assert_eq!(report.pointer("/all_project_fixture_rerun/pass").and_then(Value::as_u64), Some(55));
+    assert_eq!(report.pointer("/summary/product_count").and_then(Value::as_u64), Some(19));
+    assert_eq!(report.pointer("/summary/scenario_count").and_then(Value::as_u64), Some(6));
+    assert_eq!(
+        report
+            .pointer("/summary/not_every_product_has_complete_live_coverage")
+            .and_then(Value::as_bool),
+        Some(true)
+    );
+    assert_eq!(
+        report.pointer("/summary/evidence_class_counts/pass").and_then(Value::as_u64),
+        Some(9)
+    );
+    assert_eq!(
+        report.pointer("/summary/evidence_class_counts/not_tested").and_then(Value::as_u64),
+        Some(78)
+    );
+
+    let scenarios = array_at(&report, "/supported_scenarios")?;
+    let matrix = array_at(&report, "/product_matrix")?;
+
+    for scenario in [
+        "source_library_ingest_hydration",
+        "memory_authority_history_read_profiles",
+        "knowledge_workspace_pages",
+        "temporal_topic_graph_lite",
+        "dreaming_review_queue",
+        "recall_debug_panel",
+    ] {
+        find_by_field(scenarios, "/id", scenario)?;
+    }
+
+    let elf = find_by_field(matrix, "/product", "ELF")?;
+
+    for scenario in [
+        "source_library_ingest_hydration",
+        "memory_authority_history_read_profiles",
+        "knowledge_workspace_pages",
+        "temporal_topic_graph_lite",
+        "dreaming_review_queue",
+        "recall_debug_panel",
+    ] {
+        assert_eq!(
+            elf.pointer(&format!("/statuses/{scenario}")).and_then(Value::as_str),
+            Some("pass")
+        );
+    }
+
+    let qmd = find_by_field(matrix, "/product", "qmd")?;
+
+    assert_eq!(
+        qmd.pointer("/statuses/recall_debug_panel").and_then(Value::as_str),
+        Some("wrong_result")
+    );
+    assert!(
+        qmd.pointer("/strongest_advantage")
+            .and_then(Value::as_str)
+            .is_some_and(|value| value.contains("weighted fusion"))
+    );
+
+    for product in ["VectifyAI PageIndex", "VectifyAI OpenKB"] {
+        let row = find_by_field(matrix, "/product", product)?;
+
+        assert_eq!(row.pointer("/coverage").and_then(Value::as_str), Some("reference_only"));
+        assert_eq!(
+            row.pointer("/statuses/knowledge_workspace_pages").and_then(Value::as_str),
+            Some("not_tested")
+        );
+    }
+
+    assert_eq!(
+        report.pointer("/claim_boundaries/no_broad_superiority_claim").and_then(Value::as_bool),
+        Some(true)
+    );
+    assert_eq!(
+        report
+            .pointer("/claim_boundaries/reference_only_projects_do_not_count_as_pass")
+            .and_then(Value::as_bool),
+        Some(true)
+    );
+    assert!(array_contains_str(
+        &report,
+        "/source_evidence",
+        "https://github.com/VectifyAI/PageIndex"
+    )?);
+    assert!(array_contains_str(
+        &report,
+        "/source_evidence",
+        "https://github.com/VectifyAI/OpenKB"
+    )?);
+
+    Ok(())
+}
+
+#[test]
+fn agent_knowledge_os_closeout_benchmark_wires_docs_and_optimization_queue() -> Result<()> {
+    let report = serde_json::from_str::<Value>(&fs::read_to_string(
+        agent_knowledge_os_closeout_benchmark_report_json_path()?,
+    )?)?;
+    let markdown =
+        fs::read_to_string(agent_knowledge_os_closeout_benchmark_report_markdown_path()?)?;
+    let benchmarking_index = fs::read_to_string(benchmarking_index_path()?)?;
+    let readme = fs::read_to_string(readme_path()?)?;
+    let queue = array_at(&report, "/optimization_queue")?;
+
+    for item in queue {
+        assert_eq!(item.pointer("/generated_from_delta").and_then(Value::as_bool), Some(true));
+    }
+    for key in [
+        "pageindex_openkb_source_library_adapter",
+        "qmd_retrieval_knobs_and_short_replay",
+        "operator_knowledge_library_ui",
+        "openviking_context_trajectory_artifacts",
+        "graph_rag_temporal_adapter_matrix",
+    ] {
+        let item = find_by_field(queue, "/key", key)?;
+
+        assert_eq!(item.pointer("/generated_from_delta").and_then(Value::as_bool), Some(true));
+    }
+
+    assert!(markdown.contains("ELF is the strongest measured integrated product"));
+    assert!(markdown.contains("complete live coverage"));
+    assert!(markdown.contains("VectifyAI PageIndex"));
+    assert!(markdown.contains("VectifyAI OpenKB"));
+    assert!(markdown.contains("Do not claim ELF broadly beats every competitor"));
+    assert!(
+        benchmarking_index.contains("2026-06-20-agent-knowledge-os-closeout-benchmark-report.md")
+    );
+    assert!(readme.contains("Agent Knowledge OS closeout after XY-1023"));
+    assert!(readme.contains("62 jobs, 55 pass"));
+    assert!(readme.contains("VectifyAI PageIndex/OpenKB"));
+    assert!(readme.contains("strongest measured integrated"));
+
+    Ok(())
+}
+
 #[test]
 fn operator_approved_public_proxy_private_addendum_preserves_boundary() -> Result<()> {
     let report =
serde_json::from_str::(&fs::read_to_string( diff --git a/docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md b/docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md new file mode 100644 index 00000000..68136d15 --- /dev/null +++ b/docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md @@ -0,0 +1,148 @@ +--- +type: Evidence +title: "Agent Knowledge OS Closeout Benchmark Report - June 20, 2026" +description: "Checked-in closeout evidence matrix for the Agent Knowledge OS program." +resource: docs/evidence/benchmarking/2026-06-20-agent-knowledge-os-closeout-benchmark-report.md +status: active +authority: current_state +owner: evidence +last_verified: 2026-06-20 +tags: + - docs + - evidence + - benchmarking + - agent-knowledge-os +--- +# Agent Knowledge OS Closeout Benchmark Report - June 20, 2026 + +Goal: Close XY-1023 by publishing a table-driven self-assessment after the staged +Agent Knowledge OS lanes landed. + +Inputs: `apps/elf-eval/fixtures/report_snapshots/2026-06-20-agent-knowledge-os-closeout-benchmark-report.json`, +the June 20 component reports, the June 19 competitor retests/materialization +reports, and the current VectifyAI PageIndex/OpenKB GitHub readback. + +## Command Evidence + +| Command | Status | Result | +| --- | --- | --- | +| `cargo make real-world-memory` | pass | Reran the checked-in all-project fixture suite: 62 jobs, 17 encoded suites, 55 pass, 0 wrong_result, 0 incomplete, 7 blocked, 1.000 evidence/source-ref/quote coverage, mean score 0.887. | +| `cargo test -p elf-eval --test real_world_job_benchmark agent_knowledge_os_closeout_benchmark -- --nocapture` | pass | Guards the XY-1023 summary counts, key matrix boundaries, VectifyAI reference-only rows, claim boundaries, README/index links, and optimization queue. | + +## Executive Judgment + +ELF is the strongest measured integrated product in the current checked-in Agent +Knowledge OS matrix. 
It is the only product with same-repo evidence across all six +layers: Source Library, Memory Authority, Knowledge Workspace, graph-lite facts, +Dreaming review queue, and recall/debug panel. + +That is not a broad "ELF beats everyone everywhere" claim. Not every product has +complete live coverage. qmd remains the retrieval/debug ergonomics reference; +OpenViking remains the context-trajectory reference; mem0/OpenMemory remains the +entity-history and ecosystem reference; Letta remains the core/archive memory +reference; Graphiti/Zep and graph/RAG projects remain graph-memory references; and +VectifyAI PageIndex/OpenKB are now explicit reference-only competitors for long +document tree retrieval and knowledge-base compilation. + +## Coverage Summary + +| Metric | Value | +| --- | --- | +| Products/projects in matrix | 19 | +| Agent Knowledge OS scenarios | 6 | +| Complete same-repo product coverage | ELF only | +| Matrix pass cells | 9 | +| Matrix wrong_result cells | 7 | +| Matrix incomplete cells | 6 | +| Matrix blocked cells | 14 | +| Matrix not_tested cells | 78 | + +Evidence classes keep their normal meaning: `pass` means checked-in evidence +supports the claim; `wrong_result` means the adapter ran but missed required +evidence; `incomplete` means partial behavior or artifact coverage; `blocked` means +the required setup/artifact is missing; `not_tested` means reference-only or no +same-corpus benchmark coverage. + +## Product Scenario Matrix + +| Product/project | Coverage | Source Library | Memory Authority | Knowledge Workspace | Graph-lite/Temporal | Dreaming Review | Recall Debug | Current strongest advantage | +| --- | --- | --- | --- | --- | --- | --- | --- | --- | +| ELF | complete_same_repo | pass | pass | pass | pass | pass | pass | Policy-gated, source-linked, replayable authority across all six layers. 
| +| qmd | partial_same_corpus | not_tested | wrong_result | wrong_result | not_tested | wrong_result | wrong_result | Query expansion, weighted fusion, rerank, compact replay, and local debug knobs. | +| agentmemory | partial_same_corpus | incomplete | incomplete | not_tested | not_tested | blocked | not_tested | Cross-agent hooks, MCP/REST packaging, local viewer, continuity workflow. | +| OpenViking | partial_same_corpus | wrong_result | blocked | not_tested | not_tested | not_tested | not_tested | Filesystem-like context URIs, hierarchy, staged retrieval trajectory. | +| mem0/OpenMemory | partial_same_corpus | not_tested | pass | not_tested | blocked | not_tested | blocked | Entity-scoped memory history, hosted ecosystem, OpenMemory UI/export direction. | +| claude-mem | partial_reference | blocked | incomplete | not_tested | not_tested | blocked | not_tested | Progressive disclosure UX, local viewer, automatic capture-loop reference. | +| memsearch | partial_same_corpus | pass | pass | not_tested | not_tested | not_tested | not_tested | Markdown-first canonical store, incremental reindex, hybrid retrieval. | +| Letta | blocked_materialization | not_tested | blocked | not_tested | not_tested | blocked | not_tested | Core/archive memory model and export/readback product concept. | +| Graphiti/Zep | blocked_reference | not_tested | not_tested | not_tested | blocked | not_tested | not_tested | Temporal graph validity vocabulary and graph-memory product reference. | +| GraphRAG | partial_reference | not_tested | not_tested | incomplete | blocked | not_tested | not_tested | Graph-oriented retrieval and citation/navigation reference direction. | +| RAGFlow | blocked_reference | not_tested | not_tested | blocked | blocked | not_tested | not_tested | RAG workflow and document processing product reference. | +| LightRAG | incomplete_reference | not_tested | not_tested | incomplete | incomplete | not_tested | not_tested | Lightweight graph/RAG architecture reference. 
| +| graphify | scored_smoke | not_tested | not_tested | wrong_result | wrong_result | not_tested | not_tested | Tiny scored graph/RAG smoke target for artifact-shape mismatches. | +| llm-wiki | reference_only | not_tested | not_tested | not_tested | not_tested | not_tested | not_tested | Compiled wiki and knowledge-page workflow reference. | +| gbrain | reference_only | not_tested | not_tested | not_tested | not_tested | not_tested | not_tested | Personal knowledge-base and query-save/lint loop reference. | +| LangGraph | blocked_reference | not_tested | not_tested | not_tested | blocked | not_tested | not_tested | Agent graph orchestration reference for stateful workflows. | +| nanograph | blocked_reference | not_tested | not_tested | not_tested | blocked | not_tested | not_tested | Typed relation and small-graph memory reference. | +| VectifyAI PageIndex | reference_only | not_tested | not_tested | not_tested | not_tested | not_tested | not_tested | Vectorless long-document tree retrieval and PageIndex MCP ecosystem direction. | +| VectifyAI OpenKB | reference_only | not_tested | not_tested | not_tested | not_tested | not_tested | not_tested | Document-to-wiki compilation, concept/entity pages, lint, watch, and recompile workflow. | + +## Competitor Strengths To Preserve + +| Competitor | Strength | Evidence class | ELF response | +| --- | --- | --- | --- | +| qmd | Transparent local retrieval pipeline and compact replay ergonomics. | wrong_result | Add retrieval expansion, fusion, rerank, top-k, and compact replay controls on top of recall/debug. | +| VectifyAI PageIndex | Long-document tree search without a vector database and PageIndex MCP ecosystem. | not_tested | Add a benchmark-owned long-document tree adapter and compare it with ELF source refs and page rebuilds. | +| VectifyAI OpenKB | Compiled Markdown wiki, concept/entity pages, saved explorations, lint, watch, and recompile workflows. 
| not_tested | Fold OpenKB-style library management into Knowledge Workspace without weakening source-of-truth boundaries. | +| OpenViking | Staged context trajectory, hierarchy selection, and recursive expansion. | blocked | Emit comparable recall-planning stage artifacts, rejected siblings, and recursive expansion evidence. | +| mem0/OpenMemory | Entity-scoped memory history, hosted ecosystem, UI/export, and optional graph memory direction. | blocked | Strengthen history/event APIs, export UX, and optional graph-context channel while preserving policy-gated writes. | +| Letta | Core/archive memory split and memory export/readback product model. | blocked | Keep ELF core/archive source refs, but add contained adapter output before win/tie/loss claims. | +| Graphiti/Zep and graph/RAG projects | Temporal graph validity, citation/navigation, and graph retrieval references. | blocked | Expand graph-lite reports into adapter-backed temporal fact comparison without replacing Postgres authority. | +| agentmemory and claude-mem | Capture hooks, local viewers, and practical continuity UX. | incomplete | Build the operator viewer around Source Library, Memory Authority, Dreaming queue, and recall debug surfaces. | + +## ELF Advantages + +ELF's current durable advantage is composition: the layers are independently typed, +source-linked, and replayable, but they now fit together as an Agent Knowledge OS. +Source Library records preserve captured material, Memory Authority controls what +becomes memory, Knowledge Workspace pages stay derived and linted, graph-lite facts +stay source-backed, Dreaming proposals stay reviewable, and the recall/debug panel +shows selected, dropped, available, reviewable, not_requested, and blocked context. + +This combination is stronger than any single measured competitor in the current +matrix. 
It is also more conservative: ELF refuses to call reference-only strengths a +pass, and it keeps private/provider production proof separate from simulated or +public-proxy evidence. + +## Optimization Queue + +| Priority | Queue item | Generated from benchmark delta | Next action | +| --- | --- | --- | --- | +| P0 | `pageindex_openkb_source_library_adapter` | PageIndex/OpenKB are reference-only but directly target long-document library management and knowledge compilation. | Create a contained adapter over benchmark-owned sources and compare tree/wiki artifacts against ELF source refs, knowledge pages, and recall debug rows. | +| P0 | `qmd_retrieval_knobs_and_short_replay` | qmd keeps the measured retrieval-debug ergonomics edge. | Expose retrieval expansion, fusion, rerank, top-k, and compact replay artifacts in ELF recall/debug surfaces. | +| P0 | `operator_knowledge_library_ui` | ELF has APIs but no unified library management surface. | Build a UI for saved articles/threads, source docs, derived pages, graph facts, proposal queue, and replayable recall traces. | +| P1 | `openviking_context_trajectory_artifacts` | OpenViking trajectory/hierarchy/recursive expansion remains blocked. | Emit same-corpus stage trajectory, hierarchy selection, rejected siblings, and recursive expansion artifacts. | +| P1 | `letta_core_archive_export_readback` | Letta core/archive comparison remains blocked. | Run contained Letta export/readback with core block JSON, archival search/readback JSON, and source ids. | +| P1 | `openmemory_ui_export_and_history_parity` | OpenMemory UI/export remains blocked and mem0 history remains a reference advantage. | Add product-container UI/export readback and strengthen ELF history/export APIs. | +| P1 | `graph_rag_temporal_adapter_matrix` | Graph/RAG projects remain typed non-pass or reference-only. | Produce contained same-corpus graph fact/page/citation artifacts while keeping ELF graph-lite source-backed. 
| +| P2 | `agentmemory_claude_mem_capture_viewer` | Capture/viewer UX remains incomplete or blocked. | Add a local operator viewer and capture audit flow across Source Library, Memory Authority, and recall debug traces. | +| P2 | `private_provider_production_refresh` | XY-930 proxy/public-corpus evidence cannot prove real private-corpus or provider-backed quality. | Run only when routed private corpus and provider setup exist. | + +## Claim Boundaries + +Allowed: + +- ELF is the strongest measured integrated Agent Knowledge OS product in this + checked-in matrix. +- ELF has complete same-repo evidence across the six Agent Knowledge OS layers. +- qmd, PageIndex/OpenKB, OpenViking, mem0/OpenMemory, Letta, graph/RAG systems, and + capture/viewer projects still provide important optimization direction. + +Not allowed: + +- Do not claim ELF broadly beats every competitor on every competitor-owned strength. +- Do not treat `not_tested`, `blocked`, `incomplete`, or `wrong_result` as pass. +- Do not count VectifyAI PageIndex or OpenKB as benchmark wins until a same-corpus + adapter emits checked-in artifacts. +- Do not claim private-corpus or hosted-provider production quality from public-proxy + or local fixture evidence. diff --git a/docs/evidence/benchmarking/index.md b/docs/evidence/benchmarking/index.md index 6500a2e9..4888ca21 100644 --- a/docs/evidence/benchmarking/index.md +++ b/docs/evidence/benchmarking/index.md @@ -48,3 +48,4 @@ Routes to: Benchmarking evidence concepts under `docs/evidence/benchmarking/`. - `2026-06-20-knowledge-workspace-version-diff-report.md`: Knowledge Workspace Version-Diff Report - June 20, 2026; proves ELF knowledge pages now expose previous-version diff metadata without perturbing page content hashes while preserving citation, lint, and source-of-truth boundaries. 
 - `2026-06-20-live-knowledge-page-rebuild-lint-report.md`: Live Knowledge-Page Rebuild/Lint Report - June 20, 2026; adds a Docker-contained ELF service-native knowledge-page materialization command while preserving llm-wiki, gbrain, GraphRAG, RAGFlow, LightRAG, and graphify as separate comparison targets until they emit comparable scored page artifacts.
 - `2026-06-20-recall-debug-panel-report.md`: Recall Debug Panel Report - June 20, 2026; adds `elf.recall_debug_panel/v1` as a typed cross-layer readback over memory traces, Source Library document candidates, Knowledge Workspace pages, graph facts, and Dreaming proposals while preserving not-requested and non-pass evidence classes.
+- `2026-06-20-agent-knowledge-os-closeout-benchmark-report.md`: Agent Knowledge OS Closeout Benchmark Report - June 20, 2026; publishes the XY-1023 full product/scenario matrix, names ELF as the strongest measured integrated product, preserves qmd/OpenViking/mem0/OpenMemory/Letta/graph-RAG/VectifyAI strengths, and turns material non-pass or reference-only deltas into optimization queue items.