Merged
README.md: 18 changes (13 additions, 5 deletions)
@@ -369,6 +369,11 @@ provider-backed ELF evidence was required.
graph/RAG strengths. Graph/RAG citation/navigation promotion after XY-985 refreshes
this state as 0 pass, 1 wrong_result, 1 incomplete, and 3 blocked, with graphify
evidence-linked output still scoring wrong_result.
- RAGFlow/GraphRAG/LightRAG adapter matrix after XY-1071: the June 23 matrix adds
manifest-backed rows for retrieval quality, citation quality, navigation quality,
stale-source behavior, answer faithfulness, and knowledge compilation quality. It
records 0 pass rows, preserves blocked/incomplete/not-encoded typed states, and
does not make a graph/RAG parity or generic RAG-platform claim.
- mem0/OpenMemory history follow-up after XY-924 and XY-931: the local OSS mem0
adapter now passes encoded preference correction history, entity-scoped
personalization, local `get_all` export-style readback, and deletion audit history.
@@ -435,6 +440,7 @@ Detailed evidence and interpretation:
- [PageIndex/OpenKB Same-Corpus Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-pageindex-openkb-same-corpus-adapter-report.md)
- [mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md)
- [Temporal and Trajectory Adapter Coverage Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-temporal-trajectory-adapter-coverage-report.md)
- [Graph/RAG Adapter Matrix Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md)
- [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md)
- [Single-User Production Runbook](docs/runbook/single_user_production.md)
- Benchmark contract:
@@ -529,6 +535,7 @@ Detailed comparison, mechanism-level analysis, and source map:
- [PageIndex/OpenKB Same-Corpus Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-pageindex-openkb-same-corpus-adapter-report.md)
- [mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md)
- [Temporal and Trajectory Adapter Coverage Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-temporal-trajectory-adapter-coverage-report.md)
- [Graph/RAG Adapter Matrix Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md)
- [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md)
- [Real-World Agent Memory Benchmark](docs/runbook/benchmarking/real_world_agent_memory_benchmark.md)
- [External Memory Improvement Plan](docs/evidence/external_memory/external_memory_improvement_plan.md)
@@ -547,11 +554,12 @@ Report - June 20, 2026, and the Live Knowledge-Page Rebuild/Lint Report - June 2
2026; June 22 adds the P1 Memory Authority Closeout Report, P2 Knowledge
Workspace PageIndex/OpenKB Closeout Report, PageIndex/OpenKB Same-Corpus Adapter
Report, and mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report;
June 23 adds the Temporal and Trajectory Adapter Coverage Report after the June 19
XY-930 operator-approved public-proxy production addendum and service-native Dreaming
readback, the qmd debug-ergonomics Dreaming retest, the June 17 competitor-strength
closeout, and the June 16 temporal reconciliation, live consolidation self-check,
proactive-brief, and scheduled-memory scoring evidence.
June 23 adds the Temporal and Trajectory Adapter Coverage Report and the Graph/RAG
Adapter Matrix Report after the June 19 XY-930 operator-approved public-proxy
production addendum and service-native Dreaming readback, the qmd debug-ergonomics
Dreaming retest, the June 17 competitor-strength closeout, and the June 16 temporal
reconciliation, live consolidation self-check, proactive-brief, and scheduled-memory
scoring evidence.
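The matrix bullet in the first README hunk says the June 23 matrix records 0 pass rows and preserves blocked, incomplete, and not-encoded typed states. A minimal Python sketch of that kind of check follows. It tallies only the statuses of the 15 rows this PR adds to the adapter manifest JSON further down (copied by hand from the diff), not the full matrix, and `ADDED_ROWS` and `summarize` are hypothetical names rather than elf-eval code.

```python
from collections import Counter

# Statuses of the manifest rows added in this PR, copied from the JSON diff.
# Pre-existing matrix rows are not included; names here are illustrative only.
ADDED_ROWS = {
    "ragflow": ["blocked", "blocked", "blocked", "not_encoded", "not_encoded"],
    "lightrag": ["incomplete"] * 4 + ["not_encoded"] * 2,
    "graphrag": ["not_encoded", "blocked", "blocked", "not_encoded"],
}

# Typed non-pass states the README says the matrix preserves.
TYPED_STATES = {"blocked", "incomplete", "not_encoded"}


def summarize(rows_by_adapter):
    """Count row statuses across adapters and reject anything outside the typed states."""
    totals = Counter()
    for statuses in rows_by_adapter.values():
        totals.update(statuses)
    unexpected = set(totals) - TYPED_STATES
    if unexpected:
        raise ValueError(f"unexpected statuses: {sorted(unexpected)}")
    return totals


totals = summarize(ADDED_ROWS)
# Counter returns 0 for a missing key, so a matrix with no pass rows reports 0 here.
print(dict(totals), "pass rows:", totals["pass"])
```

Raising on an unknown status, rather than silently counting it, keeps a mistyped status such as `passed` from slipping through as a new category.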

## Documentation

@@ -1823,6 +1823,54 @@
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json"
},
{
"scenario_id": "retrieval_quality_reference_recall",
"suite_id": "retrieval",
"status": "blocked",
"elf_position": "untested",
"comparison_outcome": "blocked",
"evidence": "XY-1071 keeps RAGFlow retrieval quality blocked until the same generated corpus returns answer text and selected reference chunks whose document ids, chunk ids, content, and metadata map to expected evidence ids; setup or API reachability alone is not retrieval quality evidence.",
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json"
},
{
"scenario_id": "navigation_quality_document_chunks",
"suite_id": "retrieval",
"status": "blocked",
"elf_position": "untested",
"comparison_outcome": "blocked",
"evidence": "RAGFlow document/chunk navigation remains blocked until returned references expose stable document metadata plus chunk identifiers that can be followed back to same-corpus source evidence.",
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json"
},
{
"scenario_id": "answer_faithfulness_reference_chunks",
"suite_id": "retrieval",
"status": "blocked",
"elf_position": "untested",
"comparison_outcome": "blocked",
"evidence": "RAGFlow answer faithfulness is blocked until generated answers can be checked against returned reference chunk content and decoy/stale chunks are absent from cited support.",
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json"
},
{
"scenario_id": "stale_source_behavior",
"suite_id": "retrieval",
"status": "not_encoded",
"elf_position": "untested",
"comparison_outcome": "not_tested",
"evidence": "RAGFlow stale-source replacement, invalidation, or lint behavior is not encoded by the current same-corpus reference-chunk blocker; no stale-source quality claim is made.",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json"
},
{
"scenario_id": "knowledge_compilation_quality",
"suite_id": "knowledge_compilation",
"status": "not_encoded",
"elf_position": "untested",
"comparison_outcome": "not_tested",
"evidence": "RAGFlow knowledge compilation quality is not scored because no checked-in same-corpus RAGFlow page, section, citation, or stale-source lint artifact exists.",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json"
},
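The RAGFlow rows above keep retrieval quality blocked until returned reference chunks can be joined to expected evidence ids, and answer faithfulness additionally requires decoy or stale chunks to be absent from cited support. A minimal sketch of that join follows. It assumes a hypothetical `ReferenceChunk` shape (the manifest names document ids, chunk ids, content, and metadata, but not RAGFlow's actual response fields), a hypothetical chunk-to-evidence mapping, and made-up evidence ids. It is not the elf-eval scorer run by `cargo make real-world-memory-graph-rag`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceChunk:
    # Hypothetical shape: the manifest asks for document ids, chunk ids, content,
    # and metadata, but these field names are assumptions, not RAGFlow's API.
    document_id: str
    chunk_id: str
    content: str


def score_references(chunks, chunk_to_evidence, expected, decoys):
    """Join returned chunks to evidence ids; report recall, cited decoys, and unmapped chunks."""
    cited = set()
    unmapped = 0
    for chunk in chunks:
        evidence_id = chunk_to_evidence.get((chunk.document_id, chunk.chunk_id))
        if evidence_id is None:
            # Cannot be followed back to same-corpus evidence, so it is not support.
            unmapped += 1
        else:
            cited.add(evidence_id)
    # An empty expectation set is scored as 0 recall rather than a vacuous pass.
    recall = len(cited & expected) / len(expected) if expected else 0.0
    return {
        "recall": recall,
        "missing": sorted(expected - cited),
        "cited_decoys": sorted(cited & decoys),
        "unmapped": unmapped,
    }


# Illustrative mapping and ids; real evidence ids would come from the generated corpus.
mapping = {
    ("doc-a", "c1"): "ev-1",
    ("doc-a", "c2"): "ev-2",
    ("doc-b", "c9"): "ev-decoy",
}
chunks = [
    ReferenceChunk("doc-a", "c1", "expected fact"),
    ReferenceChunk("doc-b", "c9", "decoy fact"),
    ReferenceChunk("doc-z", "c0", "unknown chunk"),
]
report = score_references(chunks, mapping, expected={"ev-1", "ev-2"}, decoys={"ev-decoy"})
print(report)
```

Unmapped chunks are counted separately instead of being dropped. Following the navigation row's wording, a reference that cannot be followed back to same-corpus evidence should not count as support.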
{
"scenario_id": "private_or_large_corpus_ragflow_quality",
"suite_id": "retrieval",
@@ -1967,6 +2015,64 @@
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json"
},
{
"scenario_id": "retrieval_quality_context_recall",
"suite_id": "retrieval",
"status": "incomplete",
"elf_position": "untested",
"comparison_outcome": "blocked",
"evidence": "XY-1071 keeps LightRAG retrieval quality incomplete until the opt-in Docker API exports same-corpus context or references that can be joined to expected evidence ids; service startup alone is not a retrieval-quality result.",
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json"
},
{
"scenario_id": "citation_quality_context_references",
"suite_id": "retrieval",
"status": "incomplete",
"elf_position": "untested",
"comparison_outcome": "blocked",
"evidence": "LightRAG citation quality is incomplete until returned context, references.file_path, references.content, or equivalent source snippets map to generated evidence ids.",
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json"
},
{
"scenario_id": "navigation_quality_graph_context",
"suite_id": "retrieval",
"status": "incomplete",
"elf_position": "untested",
"comparison_outcome": "blocked",
"evidence": "LightRAG graph/context navigation remains incomplete until exported context exposes source paths or graph-derived source snippets that can be followed back to same-corpus evidence.",
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json"
},
{
"scenario_id": "answer_faithfulness_context_refs",
"suite_id": "retrieval",
"status": "incomplete",
"elf_position": "untested",
"comparison_outcome": "blocked",
"evidence": "LightRAG answer faithfulness is incomplete until generated answers and only_need_context output can be checked for required evidence, decoy exclusion, and source-reference alignment.",
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json"
},
{
"scenario_id": "stale_source_behavior",
"suite_id": "retrieval",
"status": "not_encoded",
"elf_position": "untested",
"comparison_outcome": "not_tested",
"evidence": "LightRAG stale-source replacement, invalidation, or lint behavior is not encoded by the current context-source blocker; no stale-source quality claim is made.",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json"
},
{
"scenario_id": "knowledge_compilation_quality",
"suite_id": "knowledge_compilation",
"status": "not_encoded",
"elf_position": "untested",
"comparison_outcome": "not_tested",
"evidence": "LightRAG knowledge compilation quality is not scored because no checked-in same-corpus page, section, citation, or stale-source lint artifact exists.",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json"
},
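Across the rows added in this PR, each typed status pairs with a fixed `comparison_outcome`. `blocked` and `incomplete` rows, such as the LightRAG rows above, report `blocked` and name the `cargo make` command. `not_encoded` rows report `not_tested` and omit `command`. A minimal sketch of a row linter for that pattern follows. The rule is inferred from the rows in this diff rather than documented, and `EXPECTED_OUTCOME` and `check_row` are hypothetical names.

```python
# Expected comparison_outcome per typed status, as observed in the rows added by
# this PR (an inferred convention, not a documented manifest rule).
EXPECTED_OUTCOME = {
    "blocked": "blocked",
    "incomplete": "blocked",
    "not_encoded": "not_tested",
}


def check_row(row):
    """Return a list of problems for one manifest row; an empty list means it passes."""
    problems = []
    status = row["status"]
    expected = EXPECTED_OUTCOME.get(status)
    if expected is None:
        problems.append(f"unknown status {status!r}")
    elif row["comparison_outcome"] != expected:
        problems.append(f"{status} row should have comparison_outcome {expected!r}")
    # Rows backed by a smoke run name the command; not_encoded rows do not.
    has_command = "command" in row
    if status == "not_encoded" and has_command:
        problems.append("not_encoded row should not name a command")
    if status in ("blocked", "incomplete") and not has_command:
        problems.append(f"{status} row should name the command that produced it")
    return problems


rows = [
    {
        "scenario_id": "citation_quality_context_references",
        "status": "incomplete",
        "comparison_outcome": "blocked",
        "command": "cargo make real-world-memory-graph-rag",
    },
    {
        "scenario_id": "stale_source_behavior",
        "status": "not_encoded",
        "comparison_outcome": "not_tested",
    },
]
for row in rows:
    print(row["scenario_id"], check_row(row) or "ok")
```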
{
"scenario_id": "graph_rag_navigation_quality",
"suite_id": "retrieval",
@@ -2126,6 +2232,44 @@
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json"
},
{
"scenario_id": "retrieval_quality_local_search",
"suite_id": "retrieval",
"status": "not_encoded",
"elf_position": "untested",
"comparison_outcome": "not_tested",
"evidence": "XY-1071 keeps GraphRAG retrieval quality not tested because the current smoke records output-table and local-search reachability contracts but does not score same-corpus retrieval answers beyond mapped output prerequisites.",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json"
},
{
"scenario_id": "navigation_quality_community_graph",
"suite_id": "knowledge_compilation",
"status": "blocked",
"elf_position": "untested",
"comparison_outcome": "blocked",
"evidence": "GraphRAG community/entity/relationship navigation remains blocked until provider-backed output tables expose community, entity, relationship, text-unit, and document identifiers that map to generated evidence ids.",
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json"
},
{
"scenario_id": "answer_faithfulness_output_tables",
"suite_id": "knowledge_compilation",
"status": "blocked",
"elf_position": "untested",
"comparison_outcome": "blocked",
"evidence": "GraphRAG answer faithfulness is blocked until summaries or local-search answers can be checked against mapped documents, text units, and community report rows while excluding unsupported or stale claims.",
"command": "cargo make real-world-memory-graph-rag",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json"
},
{
"scenario_id": "stale_source_behavior",
"suite_id": "knowledge_compilation",
"status": "not_encoded",
"elf_position": "untested",
"comparison_outcome": "not_tested",
"evidence": "GraphRAG stale-source replacement, invalidation, or lint behavior is not encoded by the current output-table blocker; no stale-source quality claim is made.",
"artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json"
},
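The README bullet lists six quality dimensions for the matrix: retrieval quality, citation quality, navigation quality, stale-source behavior, answer faithfulness, and knowledge compilation quality. The sketch below matches the scenario ids added in this PR against those dimension names by prefix to show which dimensions each adapter gained rows for in this change. The prefix convention is inferred from the ids in the diff. A dimension reported as not added may still be covered by a pre-existing row outside these hunks, for example `graph_summary_synthesis_quality`.

```python
# The six matrix dimensions named in the README, spelled as scenario-id prefixes
# (the prefix convention is inferred from the ids in this diff).
DIMENSIONS = (
    "retrieval_quality",
    "citation_quality",
    "navigation_quality",
    "stale_source_behavior",
    "answer_faithfulness",
    "knowledge_compilation_quality",
)

# Scenario ids added per adapter in this PR, copied from the JSON hunks.
ADDED_SCENARIOS = {
    "ragflow": [
        "retrieval_quality_reference_recall",
        "navigation_quality_document_chunks",
        "answer_faithfulness_reference_chunks",
        "stale_source_behavior",
        "knowledge_compilation_quality",
    ],
    "lightrag": [
        "retrieval_quality_context_recall",
        "citation_quality_context_references",
        "navigation_quality_graph_context",
        "answer_faithfulness_context_refs",
        "stale_source_behavior",
        "knowledge_compilation_quality",
    ],
    "graphrag": [
        "retrieval_quality_local_search",
        "navigation_quality_community_graph",
        "answer_faithfulness_output_tables",
        "stale_source_behavior",
    ],
}


def dimension_of(scenario_id):
    """Return the dimension whose name prefixes the scenario id, or None."""
    for dim in DIMENSIONS:
        if scenario_id.startswith(dim):
            return dim
    return None


def dimensions_not_added(scenario_ids):
    """List dimensions with no added scenario, in README order."""
    covered = {dimension_of(s) for s in scenario_ids}
    return [dim for dim in DIMENSIONS if dim not in covered]


for adapter, scenario_ids in ADDED_SCENARIOS.items():
    print(adapter, dimensions_not_added(scenario_ids))
```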
{
"scenario_id": "graph_summary_synthesis_quality",
"suite_id": "knowledge_compilation",