diff --git a/README.md b/README.md index 39fd23b..954fb64 100644 --- a/README.md +++ b/README.md @@ -369,6 +369,11 @@ provider-backed ELF evidence was required. graph/RAG strengths. Graph/RAG citation/navigation promotion after XY-985 refreshes this state as 0 pass, 1 wrong_result, 1 incomplete, and 3 blocked, with graphify evidence-linked output still scoring wrong_result. +- RAGFlow/GraphRAG/LightRAG adapter matrix after XY-1071: the June 23 matrix adds + manifest-backed rows for retrieval quality, citation quality, navigation quality, + stale-source behavior, answer faithfulness, and knowledge compilation quality. It + records 0 pass rows, preserves blocked/incomplete/not-encoded typed states, and + does not make a graph/RAG parity or generic RAG-platform claim. - mem0/OpenMemory history follow-up after XY-924 and XY-931: the local OSS mem0 adapter now passes encoded preference correction history, entity-scoped personalization, local `get_all` export-style readback, and deletion audit history. 
@@ -435,6 +440,7 @@ Detailed evidence and interpretation: - [PageIndex/OpenKB Same-Corpus Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-pageindex-openkb-same-corpus-adapter-report.md) - [mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md) - [Temporal and Trajectory Adapter Coverage Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-temporal-trajectory-adapter-coverage-report.md) +- [Graph/RAG Adapter Matrix Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md) - [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md) - [Single-User Production Runbook](docs/runbook/single_user_production.md) - Benchmark contract: @@ -529,6 +535,7 @@ Detailed comparison, mechanism-level analysis, and source map: - [PageIndex/OpenKB Same-Corpus Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-pageindex-openkb-same-corpus-adapter-report.md) - [mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report - June 22, 2026](docs/evidence/benchmarking/2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md) - [Temporal and Trajectory Adapter Coverage Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-temporal-trajectory-adapter-coverage-report.md) +- [Graph/RAG Adapter Matrix Report - June 23, 2026](docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md) - [Live Baseline Benchmark Runbook](docs/runbook/benchmarking/live_baseline_benchmark.md) - [Real-World Agent Memory Benchmark](docs/runbook/benchmarking/real_world_agent_memory_benchmark.md) - [External Memory Improvement Plan](docs/evidence/external_memory/external_memory_improvement_plan.md) @@ -547,11 +554,12 @@ Report - June 20, 2026, and the Live Knowledge-Page Rebuild/Lint Report - June 2 2026; June 22 adds the P1 Memory 
Authority Closeout Report, P2 Knowledge Workspace PageIndex/OpenKB Closeout Report, PageIndex/OpenKB Same-Corpus Adapter Report, and mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report; -June 23 adds the Temporal and Trajectory Adapter Coverage Report after the June 19 -XY-930 operator-approved public-proxy production addendum and service-native Dreaming -readback, the qmd debug-ergonomics Dreaming retest, the June 17 competitor-strength -closeout, and the June 16 temporal reconciliation, live consolidation self-check, -proactive-brief, and scheduled-memory scoring evidence. +June 23 adds the Temporal and Trajectory Adapter Coverage Report and the Graph/RAG +Adapter Matrix Report after the June 19 XY-930 operator-approved public-proxy +production addendum and service-native Dreaming readback, the qmd debug-ergonomics +Dreaming retest, the June 17 competitor-strength closeout, and the June 16 temporal +reconciliation, live consolidation self-check, proactive-brief, and scheduled-memory +scoring evidence. 
## Documentation diff --git a/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json b/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json index 109cb8d..578fe7f 100644 --- a/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json +++ b/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json @@ -1823,6 +1823,54 @@ "command": "cargo make real-world-memory-graph-rag", "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json" }, + { + "scenario_id": "retrieval_quality_reference_recall", + "suite_id": "retrieval", + "status": "blocked", + "elf_position": "untested", + "comparison_outcome": "blocked", + "evidence": "XY-1071 keeps RAGFlow retrieval quality blocked until the same generated corpus returns answer text and selected reference chunks whose document ids, chunk ids, content, and metadata map to expected evidence ids; setup or API reachability alone is not retrieval quality evidence.", + "command": "cargo make real-world-memory-graph-rag", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json" + }, + { + "scenario_id": "navigation_quality_document_chunks", + "suite_id": "retrieval", + "status": "blocked", + "elf_position": "untested", + "comparison_outcome": "blocked", + "evidence": "RAGFlow document/chunk navigation remains blocked until returned references expose stable document metadata plus chunk identifiers that can be followed back to same-corpus source evidence.", + "command": "cargo make real-world-memory-graph-rag", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json" + }, + { + "scenario_id": "answer_faithfulness_reference_chunks", + "suite_id": "retrieval", + "status": "blocked", + "elf_position": "untested", + "comparison_outcome": "blocked", + "evidence": "RAGFlow 
answer faithfulness is blocked until generated answers can be checked against returned reference chunk content and decoy/stale chunks are absent from cited support.", + "command": "cargo make real-world-memory-graph-rag", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json" + }, + { + "scenario_id": "stale_source_behavior", + "suite_id": "retrieval", + "status": "not_encoded", + "elf_position": "untested", + "comparison_outcome": "not_tested", + "evidence": "RAGFlow stale-source replacement, invalidation, or lint behavior is not encoded by the current same-corpus reference-chunk blocker; no stale-source quality claim is made.", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json" + }, + { + "scenario_id": "knowledge_compilation_quality", + "suite_id": "knowledge_compilation", + "status": "not_encoded", + "elf_position": "untested", + "comparison_outcome": "not_tested", + "evidence": "RAGFlow knowledge compilation quality is not scored because no checked-in same-corpus RAGFlow page, section, citation, or stale-source lint artifact exists.", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json" + }, { "scenario_id": "private_or_large_corpus_ragflow_quality", "suite_id": "retrieval", @@ -1967,6 +2015,64 @@ "command": "cargo make real-world-memory-graph-rag", "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json" }, + { + "scenario_id": "retrieval_quality_context_recall", + "suite_id": "retrieval", + "status": "incomplete", + "elf_position": "untested", + "comparison_outcome": "blocked", + "evidence": "XY-1071 keeps LightRAG retrieval quality incomplete until the opt-in Docker API exports same-corpus context or references that can be joined to expected evidence ids; service startup alone is not a retrieval-quality result.", + "command": "cargo make 
real-world-memory-graph-rag", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json" + }, + { + "scenario_id": "citation_quality_context_references", + "suite_id": "retrieval", + "status": "incomplete", + "elf_position": "untested", + "comparison_outcome": "blocked", + "evidence": "LightRAG citation quality is incomplete until returned context, references.file_path, references.content, or equivalent source snippets map to generated evidence ids.", + "command": "cargo make real-world-memory-graph-rag", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json" + }, + { + "scenario_id": "navigation_quality_graph_context", + "suite_id": "retrieval", + "status": "incomplete", + "elf_position": "untested", + "comparison_outcome": "blocked", + "evidence": "LightRAG graph/context navigation remains incomplete until exported context exposes source paths or graph-derived source snippets that can be followed back to same-corpus evidence.", + "command": "cargo make real-world-memory-graph-rag", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json" + }, + { + "scenario_id": "answer_faithfulness_context_refs", + "suite_id": "retrieval", + "status": "incomplete", + "elf_position": "untested", + "comparison_outcome": "blocked", + "evidence": "LightRAG answer faithfulness is incomplete until generated answers and only_need_context output can be checked for required evidence, decoy exclusion, and source-reference alignment.", + "command": "cargo make real-world-memory-graph-rag", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json" + }, + { + "scenario_id": "stale_source_behavior", + "suite_id": "retrieval", + "status": "not_encoded", + "elf_position": "untested", + "comparison_outcome": "not_tested", + "evidence": "LightRAG 
stale-source replacement, invalidation, or lint behavior is not encoded by the current context-source blocker; no stale-source quality claim is made.", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json" + }, + { + "scenario_id": "knowledge_compilation_quality", + "suite_id": "knowledge_compilation", + "status": "not_encoded", + "elf_position": "untested", + "comparison_outcome": "not_tested", + "evidence": "LightRAG knowledge compilation quality is not scored because no checked-in same-corpus page, section, citation, or stale-source lint artifact exists.", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json" + }, { "scenario_id": "graph_rag_navigation_quality", "suite_id": "retrieval", @@ -2126,6 +2232,44 @@ "command": "cargo make real-world-memory-graph-rag", "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json" }, + { + "scenario_id": "retrieval_quality_local_search", + "suite_id": "retrieval", + "status": "not_encoded", + "elf_position": "untested", + "comparison_outcome": "not_tested", + "evidence": "XY-1071 keeps GraphRAG retrieval quality not tested because the current smoke records output-table and local-search reachability contracts but does not score same-corpus retrieval answers beyond mapped output prerequisites.", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json" + }, + { + "scenario_id": "navigation_quality_community_graph", + "suite_id": "knowledge_compilation", + "status": "blocked", + "elf_position": "untested", + "comparison_outcome": "blocked", + "evidence": "GraphRAG community/entity/relationship navigation remains blocked until provider-backed output tables expose community, entity, relationship, text-unit, and document identifiers that map to generated evidence ids.", + "command": "cargo make real-world-memory-graph-rag", + "artifact": 
"apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json" + }, + { + "scenario_id": "answer_faithfulness_output_tables", + "suite_id": "knowledge_compilation", + "status": "blocked", + "elf_position": "untested", + "comparison_outcome": "blocked", + "evidence": "GraphRAG answer faithfulness is blocked until summaries or local-search answers can be checked against mapped documents, text units, and community report rows while excluding unsupported or stale claims.", + "command": "cargo make real-world-memory-graph-rag", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json" + }, + { + "scenario_id": "stale_source_behavior", + "suite_id": "knowledge_compilation", + "status": "not_encoded", + "elf_position": "untested", + "comparison_outcome": "not_tested", + "evidence": "GraphRAG stale-source replacement, invalidation, or lint behavior is not encoded by the current output-table blocker; no stale-source quality claim is made.", + "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json" + }, { "scenario_id": "graph_summary_synthesis_quality", "suite_id": "knowledge_compilation", diff --git a/apps/elf-eval/fixtures/report_snapshots/2026-06-23-graph-rag-adapter-matrix-report.json b/apps/elf-eval/fixtures/report_snapshots/2026-06-23-graph-rag-adapter-matrix-report.json new file mode 100644 index 0000000..fd1a650 --- /dev/null +++ b/apps/elf-eval/fixtures/report_snapshots/2026-06-23-graph-rag-adapter-matrix-report.json @@ -0,0 +1,229 @@ +{ + "schema": "elf.graph_rag_adapter_matrix_report/v1", + "report_id": "xy-1071-graph-rag-adapter-matrix-2026-06-23", + "authority": "XY-1071", + "created_at": "2026-06-23T00:00:00Z", + "goal": "Record same-corpus citation, navigation, stale-source, faithfulness, retrieval, and knowledge-compilation adapter coverage for RAGFlow, GraphRAG, and LightRAG without claiming graph/RAG parity.", + 
"source_artifacts": { + "manifest": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json", + "fixture_dir": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag", + "representative_report": "tmp/real-world-memory/graph-rag/report.json", + "representative_markdown": "tmp/real-world-memory/graph-rag/report.md" + }, + "summary": { + "adapter_count": 3, + "matrix_row_count": 18, + "pass": 0, + "wrong_result": 0, + "incomplete": 4, + "blocked": 8, + "not_encoded": 6, + "retrieval_quality_rows": 3, + "citation_quality_rows": 3, + "navigation_quality_rows": 3, + "stale_source_behavior_rows": 3, + "answer_faithfulness_rows": 3, + "knowledge_compilation_quality_rows": 3, + "broad_graph_rag_parity": "not_proven", + "claim_boundary": "Matrix rows are coverage and blocker evidence only; no graph/RAG win, tie, loss, or parity claim is made without scored same-corpus outputs." + }, + "adapter_matrix": [ + { + "adapter": "RAGFlow", + "dimension": "retrieval_quality", + "scenario_id": "retrieval_quality_reference_recall", + "status": "blocked", + "suite_id": "retrieval", + "required_output": "Answer text and selected reference chunks whose document ids, chunk ids, content, and metadata map to expected same-corpus evidence ids.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json", + "claim_boundary": "No RAGFlow retrieval-quality result is claimed from setup reachability or a smoke without mapped reference chunks." 
+ }, + { + "adapter": "RAGFlow", + "dimension": "citation_quality", + "scenario_id": "reference_chunk_citation_mapping", + "status": "blocked", + "suite_id": "retrieval", + "required_output": "Returned reference chunks with generated document ids, chunk ids, content, and document metadata.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json", + "claim_boundary": "No RAGFlow citation-quality pass or ELF-over-RAGFlow claim is allowed." + }, + { + "adapter": "RAGFlow", + "dimension": "navigation_quality", + "scenario_id": "navigation_quality_document_chunks", + "status": "blocked", + "suite_id": "retrieval", + "required_output": "Stable document metadata plus chunk identifiers that can be followed back to same-corpus source evidence.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json", + "claim_boundary": "Document/chunk navigation remains blocked until reference handles are returned and mapped." + }, + { + "adapter": "RAGFlow", + "dimension": "stale_source_behavior", + "scenario_id": "stale_source_behavior", + "status": "not_encoded", + "suite_id": "retrieval", + "required_output": "A stale or superseded same-corpus source update plus returned citations/lint proving current-source selection.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json", + "claim_boundary": "No stale-source behavior claim is made." 
+ }, + { + "adapter": "RAGFlow", + "dimension": "answer_faithfulness", + "scenario_id": "answer_faithfulness_reference_chunks", + "status": "blocked", + "suite_id": "retrieval", + "required_output": "Generated answers checked against returned reference chunk content with decoy and stale chunks excluded.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json", + "claim_boundary": "No answer-faithfulness score exists until cited chunks are available." + }, + { + "adapter": "RAGFlow", + "dimension": "knowledge_compilation_quality", + "scenario_id": "knowledge_compilation_quality", + "status": "not_encoded", + "suite_id": "knowledge_compilation", + "required_output": "Same-corpus pages, sections, citations, and stale-source lint emitted by RAGFlow or a contained adapter.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json", + "claim_boundary": "RAGFlow knowledge compilation quality is not scored." + }, + { + "adapter": "LightRAG", + "dimension": "retrieval_quality", + "scenario_id": "retrieval_quality_context_recall", + "status": "incomplete", + "suite_id": "retrieval", + "required_output": "Opt-in Docker API query output with same-corpus context or references joined to expected evidence ids.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json", + "claim_boundary": "No LightRAG retrieval-quality result is claimed from service startup alone." 
+ }, + { + "adapter": "LightRAG", + "dimension": "citation_quality", + "scenario_id": "citation_quality_context_references", + "status": "incomplete", + "suite_id": "retrieval", + "required_output": "Returned context, references.file_path, references.content, or equivalent snippets mapped to generated evidence ids.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json", + "claim_boundary": "LightRAG citation quality remains incomplete until source references export." + }, + { + "adapter": "LightRAG", + "dimension": "navigation_quality", + "scenario_id": "navigation_quality_graph_context", + "status": "incomplete", + "suite_id": "retrieval", + "required_output": "Graph/context source paths or graph-derived snippets that can be followed back to same-corpus evidence.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json", + "claim_boundary": "No LightRAG navigation-quality claim is made without exported source paths or snippets." + }, + { + "adapter": "LightRAG", + "dimension": "stale_source_behavior", + "scenario_id": "stale_source_behavior", + "status": "not_encoded", + "suite_id": "retrieval", + "required_output": "A stale or superseded same-corpus source update plus context/citation output showing current-source selection or lint.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json", + "claim_boundary": "No stale-source behavior claim is made." 
+ }, + { + "adapter": "LightRAG", + "dimension": "answer_faithfulness", + "scenario_id": "answer_faithfulness_context_refs", + "status": "incomplete", + "suite_id": "retrieval", + "required_output": "Generated answers and only_need_context output checked for required evidence, decoy exclusion, and source-reference alignment.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json", + "claim_boundary": "No LightRAG answer-faithfulness score exists until context refs export." + }, + { + "adapter": "LightRAG", + "dimension": "knowledge_compilation_quality", + "scenario_id": "knowledge_compilation_quality", + "status": "not_encoded", + "suite_id": "knowledge_compilation", + "required_output": "Same-corpus pages, sections, citations, and stale-source lint emitted by LightRAG or a contained adapter.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json", + "claim_boundary": "LightRAG knowledge compilation quality is not scored." + }, + { + "adapter": "GraphRAG", + "dimension": "retrieval_quality", + "scenario_id": "retrieval_quality_local_search", + "status": "not_encoded", + "suite_id": "retrieval", + "required_output": "Provider-backed local-search answers over the generated corpus with mapped source rows and expected evidence recall.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json", + "claim_boundary": "GraphRAG retrieval quality is not tested by the current output-table blocker." 
+ }, + { + "adapter": "GraphRAG", + "dimension": "citation_quality", + "scenario_id": "output_table_citation_mapping", + "status": "blocked", + "suite_id": "knowledge_compilation", + "required_output": "Documents, text_units, communities, community_reports, entities, and relationships tables mapped to generated evidence ids.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json", + "claim_boundary": "No GraphRAG citation-quality score exists without mapped output tables." + }, + { + "adapter": "GraphRAG", + "dimension": "navigation_quality", + "scenario_id": "navigation_quality_community_graph", + "status": "blocked", + "suite_id": "knowledge_compilation", + "required_output": "Community, entity, relationship, text-unit, and document identifiers that support graph/community navigation back to evidence ids.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json", + "claim_boundary": "No GraphRAG graph/community navigation claim is made until output tables map." + }, + { + "adapter": "GraphRAG", + "dimension": "stale_source_behavior", + "scenario_id": "stale_source_behavior", + "status": "not_encoded", + "suite_id": "knowledge_compilation", + "required_output": "A stale or superseded source update plus output tables or report lint showing current-source selection.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json", + "claim_boundary": "No GraphRAG stale-source behavior claim is made." 
+ }, + { + "adapter": "GraphRAG", + "dimension": "answer_faithfulness", + "scenario_id": "answer_faithfulness_output_tables", + "status": "blocked", + "suite_id": "knowledge_compilation", + "required_output": "Summaries or local-search answers checked against mapped documents, text units, and community report rows while unsupported or stale claims are excluded.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json", + "claim_boundary": "No GraphRAG answer-faithfulness score exists until output tables map." + }, + { + "adapter": "GraphRAG", + "dimension": "knowledge_compilation_quality", + "scenario_id": "graph_summary_synthesis_quality", + "status": "not_encoded", + "suite_id": "knowledge_compilation", + "required_output": "Provider-backed graph-summary output with mapped tables, citations, unsupported-claim lint, and stale-source handling.", + "current_artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json", + "claim_boundary": "GraphRAG graph-summary synthesis quality remains not tested." + } + ], + "feed_back_to_elf": [ + "Knowledge Workspace should keep citation coverage separate from knowledge compilation quality; source refs alone do not prove answer faithfulness.", + "Recall Debug should expose missing adapter source handles as typed blocker evidence rather than hiding them behind aggregate retrieval scores.", + "Stale-source behavior needs explicit changed-source or validity-window artifacts before any graph/RAG comparison can move beyond not_encoded or blocked." 
+ ], + "claim_boundaries": { + "allowed": [ + "RAGFlow, GraphRAG, and LightRAG now have checked-in matrix rows for retrieval, citation, navigation, stale-source behavior, answer faithfulness, and knowledge compilation coverage.", + "The matrix records typed blockers and not-encoded rows for missing same-corpus outputs.", + "The representative graph/RAG command still reports typed non-pass outcomes." + ], + "not_allowed": [ + "Do not claim graph/RAG parity or broad graph-navigation quality.", + "Do not claim RAGFlow, GraphRAG, or LightRAG pass retrieval, citation, navigation, stale-source, faithfulness, or knowledge-compilation quality until scored artifacts exist.", + "Do not reposition ELF as a generic RAG platform from this adapter matrix." + ] + } +} diff --git a/apps/elf-eval/tests/real_world_job_benchmark.rs b/apps/elf-eval/tests/real_world_job_benchmark.rs index b96fc9c..25025e1 100644 --- a/apps/elf-eval/tests/real_world_job_benchmark.rs +++ b/apps/elf-eval/tests/real_world_job_benchmark.rs @@ -287,6 +287,10 @@ fn graph_rag_citation_navigation_promotion_report_json_path() -> Result report_snapshot_path("2026-06-19-graph-rag-citation-navigation-promotion-report.json") } +fn graph_rag_adapter_matrix_report_json_path() -> Result { + report_snapshot_path("2026-06-23-graph-rag-adapter-matrix-report.json") +} + fn operator_approved_public_proxy_private_addendum_report_json_path() -> Result { report_snapshot_path( "2026-06-19-operator-approved-public-proxy-production-private-addendum.json", @@ -365,6 +369,14 @@ fn graph_rag_citation_navigation_promotion_report_markdown_path() -> Result Result { + Ok(workspace_root()? + .join("docs") + .join("evidence") + .join("benchmarking") + .join("2026-06-23-graph-rag-adapter-matrix-report.md")) +} + fn graph_topic_map_report_markdown_path() -> Result { Ok(workspace_root()? 
.join("docs") @@ -765,7 +777,7 @@ fn external_adapter_run_summarizes_nonzero_scenario_losses() -> Result<()> { report .pointer("/external_adapters/summary/scenario_position_counts/untested") .and_then(Value::as_u64), - Some(34) + Some(49) ); assert_eq!( report @@ -777,7 +789,7 @@ fn external_adapter_run_summarizes_nonzero_scenario_losses() -> Result<()> { report .pointer("/external_adapters/summary/scenario_outcome_counts/not_tested") .and_then(Value::as_u64), - Some(12) + Some(18) ); let adapters = array_at(&report, "/external_adapters/adapters")?; @@ -941,13 +953,13 @@ fn assert_external_adapter_manifest_scenario_summary(report: &Value) { report .pointer("/external_adapters/summary/scenario_status_counts/blocked") .and_then(Value::as_u64), - Some(16) + Some(21) ); assert_eq!( report .pointer("/external_adapters/summary/scenario_status_counts/incomplete") .and_then(Value::as_u64), - Some(1) + Some(5) ); assert_eq!( report @@ -971,7 +983,7 @@ fn assert_external_adapter_manifest_scenario_summary(report: &Value) { report .pointer("/external_adapters/summary/scenario_status_counts/not_encoded") .and_then(Value::as_u64), - Some(7) + Some(13) ); assert_eq!( report @@ -995,7 +1007,7 @@ fn assert_external_adapter_manifest_scenario_summary(report: &Value) { report .pointer("/external_adapters/summary/scenario_position_counts/untested") .and_then(Value::as_u64), - Some(35) + Some(50) ); assert_eq!( report @@ -1019,13 +1031,13 @@ fn assert_external_adapter_manifest_scenario_summary(report: &Value) { report .pointer("/external_adapters/summary/scenario_outcome_counts/not_tested") .and_then(Value::as_u64), - Some(13) + Some(19) ); assert_eq!( report .pointer("/external_adapters/summary/scenario_outcome_counts/blocked") .and_then(Value::as_u64), - Some(17) + Some(26) ); assert_eq!( report @@ -1705,6 +1717,57 @@ fn assert_graph_rag_representative_scenarios( .is_some_and(|evidence| evidence.contains("not an ELF victory claim")) ); + assert_adapter_matrix_rows( + 
ragflow_scenarios, + &[ + ("reference_chunk_citation_mapping", "blocked", "blocked"), + ("retrieval_quality_reference_recall", "blocked", "blocked"), + ("navigation_quality_document_chunks", "blocked", "blocked"), + ("answer_faithfulness_reference_chunks", "blocked", "blocked"), + ("stale_source_behavior", "not_encoded", "not_tested"), + ("knowledge_compilation_quality", "not_encoded", "not_tested"), + ], + )?; + assert_adapter_matrix_rows( + lightrag_scenarios, + &[ + ("context_source_reference_mapping", "incomplete", "blocked"), + ("retrieval_quality_context_recall", "incomplete", "blocked"), + ("citation_quality_context_references", "incomplete", "blocked"), + ("navigation_quality_graph_context", "incomplete", "blocked"), + ("answer_faithfulness_context_refs", "incomplete", "blocked"), + ("stale_source_behavior", "not_encoded", "not_tested"), + ("knowledge_compilation_quality", "not_encoded", "not_tested"), + ], + )?; + assert_adapter_matrix_rows( + graphrag_scenarios, + &[ + ("output_table_citation_mapping", "blocked", "blocked"), + ("retrieval_quality_local_search", "not_encoded", "not_tested"), + ("navigation_quality_community_graph", "blocked", "blocked"), + ("answer_faithfulness_output_tables", "blocked", "blocked"), + ("stale_source_behavior", "not_encoded", "not_tested"), + ("graph_summary_synthesis_quality", "not_encoded", "not_tested"), + ], + )?; + + Ok(()) +} + +fn assert_adapter_matrix_rows(scenarios: &[Value], expected: &[(&str, &str, &str)]) -> Result<()> { + for (scenario_id, status, outcome) in expected { + let row = find_by_field(scenarios, "/scenario_id", scenario_id)?; + + assert_eq!(row.pointer("/status").and_then(Value::as_str), Some(*status)); + assert_eq!(row.pointer("/comparison_outcome").and_then(Value::as_str), Some(*outcome)); + assert!( + row.pointer("/evidence") + .and_then(Value::as_str) + .is_some_and(|evidence| !evidence.trim().is_empty()) + ); + } + Ok(()) } @@ -4439,6 +4502,63 @@ fn 
graph_rag_citation_navigation_promotion_preserves_typed_non_passes() -> Resul Ok(()) } +#[test] +fn graph_rag_adapter_matrix_report_preserves_no_parity_claims() -> Result<()> { + let report = serde_json::from_str::(&fs::read_to_string( + graph_rag_adapter_matrix_report_json_path()?, + )?)?; + let markdown = fs::read_to_string(graph_rag_adapter_matrix_report_markdown_path()?)?; + let benchmarking_index = fs::read_to_string(benchmarking_index_path()?)?; + let readme = fs::read_to_string(readme_path()?)?; + + assert_eq!( + report.pointer("/schema").and_then(Value::as_str), + Some("elf.graph_rag_adapter_matrix_report/v1") + ); + assert_eq!(report.pointer("/authority").and_then(Value::as_str), Some("XY-1071")); + assert_eq!(report.pointer("/summary/matrix_row_count").and_then(Value::as_u64), Some(18)); + assert_eq!(report.pointer("/summary/pass").and_then(Value::as_u64), Some(0)); + assert_eq!(report.pointer("/summary/blocked").and_then(Value::as_u64), Some(8)); + assert_eq!(report.pointer("/summary/incomplete").and_then(Value::as_u64), Some(4)); + assert_eq!(report.pointer("/summary/not_encoded").and_then(Value::as_u64), Some(6)); + assert_eq!( + report.pointer("/summary/broad_graph_rag_parity").and_then(Value::as_str), + Some("not_proven") + ); + + let rows = array_at(&report, "/adapter_matrix")?; + let ragflow_citation = find_matrix_row(rows, "RAGFlow", "citation_quality")?; + let lightrag_retrieval = find_matrix_row(rows, "LightRAG", "retrieval_quality")?; + let graphrag_navigation = find_matrix_row(rows, "GraphRAG", "navigation_quality")?; + let graphrag_retrieval = find_matrix_row(rows, "GraphRAG", "retrieval_quality")?; + + assert_eq!(ragflow_citation.pointer("/status").and_then(Value::as_str), Some("blocked")); + assert_eq!(lightrag_retrieval.pointer("/status").and_then(Value::as_str), Some("incomplete")); + assert_eq!(graphrag_navigation.pointer("/status").and_then(Value::as_str), Some("blocked")); + 
assert_eq!(graphrag_retrieval.pointer("/status").and_then(Value::as_str), Some("not_encoded")); + assert!(array_contains_str( + &report, + "/claim_boundaries/not_allowed", + "Do not reposition ELF as a generic RAG platform from this adapter matrix." + )?); + assert!(markdown.contains("The graph/RAG comparison remains typed non-pass")); + assert!(markdown.contains("| RAGFlow | `blocked`: answer text plus selected reference chunks")); + assert!(benchmarking_index.contains("2026-06-23-graph-rag-adapter-matrix-report.md")); + assert!(readme.contains("RAGFlow/GraphRAG/LightRAG adapter matrix after XY-1071")); + assert!(readme.contains("Graph/RAG Adapter Matrix Report - June 23, 2026")); + + Ok(()) +} + +fn find_matrix_row<'a>(rows: &'a [Value], adapter: &str, dimension: &str) -> Result<&'a Value> { + rows.iter() + .find(|row| { + row.pointer("/adapter").and_then(Value::as_str) == Some(adapter) + && row.pointer("/dimension").and_then(Value::as_str) == Some(dimension) + }) + .ok_or_else(|| eyre::eyre!("missing matrix row for {adapter} {dimension}")) +} + #[test] fn graph_topic_map_report_wires_source_backed_graph_lite_readback() -> Result<()> { let markdown = fs::read_to_string(graph_topic_map_report_markdown_path()?)?; @@ -5758,9 +5878,9 @@ fn generated_json_report_renders_markdown() -> Result<()> { assert!(markdown.contains("xy844-current-worktree")); assert!(markdown.contains("Existing live-baseline reports remain valid")); assert!(markdown.contains("### Adapter Scenario Judgments")); - assert!(markdown.contains("ELF scenario positions: `wins=10, ties=11, loses=1, untested=35`")); + assert!(markdown.contains("ELF scenario positions: `wins=10, ties=11, loses=1, untested=50`")); assert!(markdown.contains( - "Scenario comparison outcomes: `win=10, tie=11, loss=1, not_tested=13, blocked=17, non_goal=5`" + "Scenario comparison outcomes: `win=10, tie=11, loss=1, not_tested=19, blocked=26, non_goal=5`" )); assert!(markdown.contains("| `claude_mem_live_baseline` | 
`same_corpus_retrieval`")); assert!(markdown.contains("| `memsearch_live_baseline` | `ttl_expiry_lifecycle`")); diff --git a/docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md b/docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md new file mode 100644 index 0000000..984b39c --- /dev/null +++ b/docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md @@ -0,0 +1,94 @@ +--- +type: Evidence +title: "Graph/RAG Adapter Matrix Report - June 23, 2026" +description: "Checked-in benchmark evidence record for the RAGFlow, GraphRAG, and LightRAG citation/navigation adapter matrix." +resource: docs/evidence/benchmarking/2026-06-23-graph-rag-adapter-matrix-report.md +status: active +authority: current_state +owner: evidence +last_verified: 2026-06-23 +tags: + - docs + - evidence + - benchmarking +--- +# Graph/RAG Adapter Matrix Report - June 23, 2026 + +Goal: Add the XY-1071 adapter matrix for RAGFlow, GraphRAG, and LightRAG while +preserving graph/RAG typed blockers and avoiding any generic RAG-platform claim for +ELF. + +Read this when: You need the current RAGFlow, GraphRAG, and LightRAG coverage rows +for retrieval quality, citation quality, graph or document navigation, stale-source +behavior, answer faithfulness, and knowledge compilation quality. + +Inputs: +`apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json`, +`apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/`, and +`apps/elf-eval/fixtures/report_snapshots/2026-06-23-graph-rag-adapter-matrix-report.json`. + +Outputs: Manifest-backed scenario rows, a checked-in JSON companion, and bounded +claims plus learnings for ELF Knowledge Workspace and Recall Debug. + +## Executive Judgment + +The graph/RAG comparison remains typed non-pass. The matrix adds coverage clarity, +not quality wins. + +- Matrix rows: 18. +- Pass rows: 0. +- Blocked rows: 8. +- Incomplete rows: 4. +- Not-encoded rows: 6. 
+- Adapters covered: RAGFlow, LightRAG, and GraphRAG. +- Dimensions covered for each adapter: retrieval quality, citation quality, + navigation quality, stale-source behavior, answer faithfulness, and knowledge + compilation quality. + +No graph/RAG parity claim is made. No RAGFlow, GraphRAG, or LightRAG retrieval, +citation, navigation, stale-source, faithfulness, or knowledge-compilation pass is +claimed until scored same-corpus outputs exist. + +## Adapter Matrix + +| Adapter | Retrieval Quality | Citation Quality | Navigation Quality | Stale-Source Behavior | Answer Faithfulness | Knowledge Compilation | +| --- | --- | --- | --- | --- | --- | --- | +| RAGFlow | `blocked`: answer text plus selected reference chunks must map to evidence ids. | `blocked`: returned chunks need document ids, chunk ids, content, and metadata. | `blocked`: document/chunk handles must be followable to source evidence. | `not_encoded`: no stale-source replacement or lint artifact. | `blocked`: answers must be checked against cited chunks and decoys. | `not_encoded`: no page, section, citation, or lint artifact. | +| LightRAG | `incomplete`: Docker API context export is not available by default. | `incomplete`: context references or file paths must map to evidence ids. | `incomplete`: graph/context source paths or snippets must be exported. | `not_encoded`: no stale-source replacement or lint artifact. | `incomplete`: `only_need_context` output must support answer checking. | `not_encoded`: no page, section, citation, or lint artifact. | +| GraphRAG | `not_encoded`: local-search retrieval quality is not scored. | `blocked`: output tables must map documents, text units, communities, reports, entities, and relationships to evidence ids. | `blocked`: community/entity/relationship navigation requires mapped output tables. | `not_encoded`: no stale-source replacement or lint artifact. | `blocked`: summaries or local-search answers must be checked against mapped tables. 
| `not_encoded`: graph-summary synthesis quality remains not tested. | + +## Checked-In Evidence + +| Artifact | Role | +| --- | --- | +| `apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json` | Manifest-backed adapter scenario matrix. | +| `apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/ragflow_reference_chunks_blocked.json` | RAGFlow same-corpus reference-chunk typed blocker. | +| `apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/lightrag_context_sources_incomplete.json` | LightRAG same-corpus context/source typed incomplete state. | +| `apps/elf-eval/fixtures/real_world_external_adapters/graph_rag/graphrag_output_tables_blocked.json` | GraphRAG same-corpus output-table typed blocker. | +| `apps/elf-eval/fixtures/report_snapshots/2026-06-23-graph-rag-adapter-matrix-report.json` | XY-1071 durable matrix snapshot. | + +## ELF Feedback + +- Knowledge Workspace should keep citation coverage separate from knowledge + compilation quality. Source refs alone do not prove answer faithfulness. +- Recall Debug should surface missing adapter source handles as typed blocker evidence + rather than hiding them behind aggregate retrieval scores. +- Stale-source behavior needs explicit changed-source or validity-window artifacts + before any graph/RAG comparison can move beyond `not_encoded` or `blocked`. + +## Claim Boundaries + +Allowed: + +- The RAGFlow, GraphRAG, and LightRAG matrix rows are checked in. +- The current rows preserve typed blockers, incomplete setup states, and not-encoded + quality dimensions. +- The representative graph/RAG command remains the focused rerun path. + +Not allowed: + +- Do not claim graph/RAG parity or broad graph-navigation quality. +- Do not claim RAGFlow, GraphRAG, or LightRAG pass retrieval, citation, navigation, + stale-source, faithfulness, or knowledge-compilation quality until scored artifacts + exist. +- Do not reposition ELF as a generic RAG platform from this adapter matrix. 
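The Executive Judgment counters are per-status tallies over the adapter matrix rows, and the no-parity boundary follows from any row being non-pass. A minimal, standard-library-only Rust sketch of both checks is below. `RowStatus`, `MatrixRow`, `tally`, `parity_verdict`, `ragflow_rows`, and the dimension strings are hypothetical names for illustration; they are not the report's JSON schema or ELF's test helpers, and the rule that parity needs every row to pass is an assumption.

```rust
use std::collections::BTreeMap;

// Typed row states from the report. Hypothetical enum: the report itself
// stores these as strings such as "blocked" or "not_encoded".
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum RowStatus {
    Pass,
    Blocked,
    Incomplete,
    NotEncoded,
}

// One adapter x dimension cell of the matrix (illustrative shape only).
#[derive(Debug)]
struct MatrixRow {
    adapter: &'static str,
    dimension: &'static str,
    status: RowStatus,
}

// Count rows per status so summary counters can be cross-checked against rows.
fn tally(rows: &[MatrixRow]) -> BTreeMap<RowStatus, usize> {
    let mut counts = BTreeMap::new();
    for row in rows {
        *counts.entry(row.status).or_insert(0) += 1;
    }
    counts
}

// Assumption: parity is "proven" only when at least one row exists and every
// row passes; any blocked, incomplete, or not-encoded row keeps it "not_proven".
fn parity_verdict(rows: &[MatrixRow]) -> &'static str {
    if !rows.is_empty() && rows.iter().all(|row| row.status == RowStatus::Pass) {
        "proven"
    } else {
        "not_proven"
    }
}

// The RAGFlow row of the Adapter Matrix table, one entry per dimension.
// Dimension names are hypothetical snake_case labels for the table columns.
fn ragflow_rows() -> Vec<MatrixRow> {
    let cells = [
        ("retrieval_quality", RowStatus::Blocked),
        ("citation_quality", RowStatus::Blocked),
        ("navigation_quality", RowStatus::Blocked),
        ("stale_source_behavior", RowStatus::NotEncoded),
        ("answer_faithfulness", RowStatus::Blocked),
        ("knowledge_compilation_quality", RowStatus::NotEncoded),
    ];
    cells
        .iter()
        .map(|&(dimension, status)| MatrixRow { adapter: "RAGFlow", dimension, status })
        .collect()
}
```

The sample uses only the RAGFlow row of the table (four `blocked` and two `not_encoded` cells). Running the same tally over all 18 rows is one way to keep the summary counters consistent with the per-adapter cells.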
diff --git a/docs/evidence/benchmarking/index.md b/docs/evidence/benchmarking/index.md index 83d52cc..d80e5bc 100644 --- a/docs/evidence/benchmarking/index.md +++ b/docs/evidence/benchmarking/index.md @@ -54,3 +54,4 @@ Routes to: Benchmarking evidence concepts under `docs/evidence/benchmarking/`. - `2026-06-22-pageindex-openkb-same-corpus-adapter-report.md`: PageIndex/OpenKB Same-Corpus Adapter Report - June 22, 2026; adds `cargo make real-world-memory-pageindex-openkb`, emits checked-in same-corpus typed setup blockers for PageIndex and OpenKB, names source ids and required materialized outputs, and preserves no parity, win, tie, or loss claim. - `2026-06-22-mem0-openmemory-letta-memory-history-core-archive-report.md`: mem0/OpenMemory and Letta Memory-History/Core-Archive Adapter Report - June 22, 2026; adds `cargo make real-world-memory-mem0-openmemory-letta`, maps mem0 SDK history/export outputs to source ids, preserves OpenMemory UI/export as a product blocker, preserves Letta core/archive readback as typed blockers, and makes no hosted/product parity claim. - `2026-06-23-temporal-trajectory-adapter-coverage-report.md`: Temporal and Trajectory Adapter Coverage Report - June 23, 2026; refreshes Graphiti/Zep temporal-validity and OpenViking context-trajectory adapter evidence with trace-stage typed blockers, source ids, and explicit no-parity boundaries. +- `2026-06-23-graph-rag-adapter-matrix-report.md`: Graph/RAG Adapter Matrix Report - June 23, 2026; adds manifest-backed RAGFlow, GraphRAG, and LightRAG rows for retrieval, citation, navigation, stale-source behavior, answer faithfulness, and knowledge compilation while preserving 0 pass rows and no graph/RAG parity claim.