BigQuery Agent Analytics Roadmap #96

# BigQuery Agent Analytics — Roadmap

A tiered roadmap built from a survey of the 19 open issues on this repo, the
22 sections of the current `SDK.md` user manual, the 38 Python modules in
`src/bigquery_agent_analytics/`, and the ADK plugin
(`google/adk/plugins/bigquery_agent_analytics_plugin.py`, ~3,500 LOC, 14
lifecycle callback hooks, BigQuery Storage Write API path with GCS offload).

Filed for discussion. Estimates are calibrated to a single experienced
SDK / plugin engineer; multiply for parallel streams. Impact and effort
are best-effort; the implementer of each item will refine them.

> **Ground rules for this doc.** Impact = `H/M/L` on a "downstream user
> adoption + DevX uplift" axis, not revenue. Effort = engineer-weeks
> (full-time equivalent), confidence band in parens. Items marked
> *strategic decision pending* need a maintainer call before
> sequencing.

---

## TL;DR

- **Three workstreams already mature**: trace reconstruction, evaluation
  (deterministic, LLM-judge, and categorical), and CLI gating. Don't
  re-prioritize them; keep them stable.
- **One workstream is the next center of gravity**: the *self-improvement
  loop* — quality scorecard (#63), automated benchmarking (#95), agent
  improvement cycle (already shipped as a demo), ReasoningBank (#49). This
  is the user-facing differentiator over the next quarter.
- **One workstream is high-leverage but quietly under-invested**: plugin
  telemetry. The cache-hit-rate metric (#32) is a ~1.5-week win that
  unlocks the cost-optimization narrative and ties directly into the
  existing `CodeEvaluator` surface.
- **One workstream is over-extended**: ontology / context graph (eight open
  issues: #12, #30, #38, #57, #58, #75, #76, #93). Multiple design
  proposals are gating implementation. Consolidate to one epic per quarter.
- **One workstream needs a strategic decision before any code**: the
  ontology pipeline migration to upstream `bigquery_ontology` (#38) —
  runtime contract migration, not a module swap; needs a go/no-go.

---

## Method

How I built this:

1. Read each open issue's body + first 2-3 comments. Categorized by
   workstream.
2. Surveyed `src/bigquery_agent_analytics/` (38 modules) and `SDK.md`
   (22 sections, ~1700 lines) to understand what already ships.
3. Read the ADK plugin to understand the upstream telemetry contract
   the SDK consumes.
4. Tiered each item by impact × effort, then sequenced for parallel
   workstreams.
5. Flagged items that need a strategic call before scoping (marked
   *strategic decision pending*).

---

## Workstream map (where the open issues live)

| Workstream | What ships today | Open issues |
|---|---|---|
| **Plugin telemetry** | 14 lifecycle hooks → `agent_events`; GCS offload; PyArrow schema | #32 |
| **Trace consumption** | `Client.get_session_trace`, `list_traces`, tree render | (none open — mature) |
| **Evaluation** | Code, LLM-as-Judge, categorical, trajectory, multi-trial, grader pipeline, eval suite, eval validator | #84 |
| **Self-improvement loop** | `agent_improvement_cycle` example, `quality_report.py` | #63, #95, #49 |
| **Insights / drift / memory** | `client.insights()`, drift detection, `memory_service` | #74 |
| **Ontology / context graph** | V5 pipeline, `gm` CLI, OWL importer, DDL compiler, materializer, `compile_concept_index` | #12, #30, #38, #57, #58, #75, #76, #93 |
| **DevX / CLI / docs** | `bq-agent-sdk` CLI, `SDK.md`, `examples/`, blog series | #10 (wishlist), #51, #53, #77, #82 (series) |

---

## P0 — ship in ≤2 weeks (low-risk, asked-for)

| Item | Issue | Impact | Effort | Notes |
|---|---|---|---|---|
| **Quote-escape `feedback="..."` snippet** in `evaluate --exit-code` FAIL output | #84 | L | 0.25 wk (high) | One-line escape pass; add unit test with embedded `"` and `\`. Surfaced from blog #3 live capture. |
| **Context cache hit-rate metric** (plugin schema add + new `CodeEvaluator.context_cache_hit_rate()`) | #32 | H | 1.5 wk (med) | Plugin: extract `cached_content_token_count` from Gemini `usage_metadata`. SDK: add `cache_hit_rate` evaluator + threshold. Schema change requires a backward-compat plan (new column, default NULL). Strong tie-in to post #2's cost narrative. |
| **Close out blog series posts #1-#3** in #51 / #53 / #77 / #82 | #51 #53 #77 #82 | M | 0.5 wk | Update series-plan checkboxes; mark #82 closed once post #3 is live. Pure docs hygiene. |
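A minimal sketch of the #84 escape pass, assuming the FAIL line renders judge feedback as `feedback="..."` and that backslash escaping is the convention the CLI wants. The helper name `format_feedback_snippet` is hypothetical. Backslashes have to be escaped before quotes; otherwise the backslash inserted in front of each quote would itself get doubled.

```python
def format_feedback_snippet(feedback: str) -> str:
    """Render judge feedback as feedback="..." with embedded `"` and `\\` escaped.

    Hypothetical helper: escapes backslashes first, then double quotes, so the
    backslashes introduced for quotes are not escaped a second time.
    """
    escaped = feedback.replace("\\", "\\\\").replace('"', '\\"')
    return f'feedback="{escaped}"'


# Unit-test style checks with an embedded quote and an embedded backslash.
assert format_feedback_snippet('said "hi"') == 'feedback="said \\"hi\\""'
assert format_feedback_snippet("C:\\tmp") == 'feedback="C:\\\\tmp"'
```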

**P0 total**: ~2.25 eng-weeks.
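Here is a sketch of how the #32 metric could be computed once the plugin persists the token counts. `prompt_token_count` and `cached_content_token_count` are real fields on Gemini's `usage_metadata`. The row shape and the ratio definition (cached prompt tokens over total prompt tokens, aggregated across calls) are assumptions. Rows written before the schema change carry `cached_content_token_count` as NULL (`None` here). This sketch skips those rows rather than counting them as misses, which is one possible backward-compat choice. It also assumes the plugin writes 0, not NULL, when a post-migration response reports no cached tokens, so that NULL only ever means "pre-migration".

```python
from typing import Iterable, Mapping, Optional


def context_cache_hit_rate(rows: Iterable[Mapping[str, Optional[int]]]) -> Optional[float]:
    """Token-weighted cache hit rate over LLM-response rows (hypothetical row shape).

    Rows whose cached_content_token_count is None (pre-migration, default NULL)
    are excluded, so old data does not drag the rate toward zero.
    """
    cached = 0
    prompt = 0
    for row in rows:
        cached_tokens = row.get("cached_content_token_count")
        prompt_tokens = row.get("prompt_token_count")
        if cached_tokens is None or not prompt_tokens:
            continue  # pre-migration row, or no prompt tokens recorded
        cached += cached_tokens
        prompt += prompt_tokens
    return cached / prompt if prompt else None


rows = [
    {"prompt_token_count": 1000, "cached_content_token_count": 800},
    {"prompt_token_count": 500, "cached_content_token_count": 0},
    {"prompt_token_count": 700, "cached_content_token_count": None},  # old row
]
rate = context_cache_hit_rate(rows)
print(f"{rate:.3f}")  # 0.533 (800 / 1500)
meets_threshold = rate is not None and rate >= 0.5  # hypothetical gate value
```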

---

## P1 — ship in 1 month (high leverage)

| Item | Issue | Impact | Effort | Notes |
|---|---|---|---|---|
| **Quality Scorecard Phase 1**: pre-built rubric factories (`evaluation_rubrics.py`) over existing `CategoricalEvaluator` | #63 | H | 2 wk (med) | Already in flight — Gayathri uploaded `evaluation_rubrics.py` for review. Locks in three pillars (`response_usefulness`, `task_grounding`, `policy_compliance`) using existing categorical vocabulary. Additive, reuses existing dashboard views. |
| **Quality Scorecard Phase 2**: persist `root_agent_name` + `region` on `categorical_results` + new `categorical_fleet_leaderboard` view | #63 | H | 2 wk (med) | Schema-affecting; needs `ALTER TABLE` migration plan. Bridges the eval results → fleet ranking gap that the SDK doesn't have today. |
| **Quality Scorecard Phase 3**: `Client.triage_low_score_sessions(...)` + `hitl_triage_queue` table | #63 | M | 2 wk (med) | The genuinely net-new piece. Decide upfront: idempotent `MERGE` vs append-only? `resolved_at` lifecycle column? Worth a small design note before implementation. |
| **Inheritance (`extends`) compilation** in ontology DDL compiler | #30 | M | 2 wk (low) | Three candidate strategies named in the issue (fan-out / union view / label-referenced edges). Decide one, implement, ship as `gm compile --emit-extends-as=…` flag. |
| **Compile-time extractor prerequisite**: ontology-aware `validate_extracted_graph(spec, graph)` | #76 | M | 2 wk (med) | Independent shipping value; gates #75 epic. Adds field/node/edge/event-level fallback classification. |
| **`evaluate --suggest-thresholds` baseline helper** (deferred from blog #2 polish) | (no issue yet — file one) | M | 1 wk (high) | Reads last N days of prod, prints suggested per-metric thresholds with a buffer. Halves the prose burden of blog #2's "how do I pick thresholds" sidebar. |
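A minimal sketch of what the proposed `evaluate --suggest-thresholds` logic could do. It assumes per-session scores where higher is better, uses a nearest-rank 10th percentile of the last N days as the baseline, and sets the threshold a fixed absolute buffer below that baseline. The function name, the percentile, and the buffer are all hypothetical choices for illustration, not a committed design.

```python
import math
from typing import Dict, List


def suggest_thresholds(
    scores_by_metric: Dict[str, List[float]],
    percentile: float = 10.0,
    buffer: float = 0.05,
) -> Dict[str, float]:
    """Suggest a per-metric minimum threshold from recent production scores.

    Uses the nearest-rank percentile (higher-is-better metrics assumed),
    then subtracts a buffer so normal variance does not trip the gate.
    """
    suggestions = {}
    for metric, scores in scores_by_metric.items():
        if not scores:
            continue  # no baseline data for this metric
        ordered = sorted(scores)
        # Nearest rank: the ceil(P/100 * N)-th smallest value (1-based).
        rank = max(1, math.ceil(percentile / 100 * len(ordered)))
        suggestions[metric] = round(ordered[rank - 1] - buffer, 4)
    return suggestions


baseline = {
    "response_usefulness": [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.88, 0.92, 0.6, 0.83],
    "task_grounding": [1.0, 1.0, 0.9, 1.0, 0.95],
}
print(suggest_thresholds(baseline))
# {'response_usefulness': 0.55, 'task_grounding': 0.85}
```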

**P1 total**: ~11 eng-weeks. With one engineer, that is roughly 11 calendar weeks (about 2.5 months). With two engineers parallelizing the scorecard track (6 wk) and the ontology + helper track (5 wk), it is ~6 weeks.
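The Phase 3 design question (idempotent `MERGE` vs. append-only) can be illustrated with an in-memory model. This sketch assumes a queue keyed on `session_id`, a fixed score threshold, and the `resolved_at` lifecycle column the issue mentions. `TriageRow` and this standalone `triage_low_score_sessions` are hypothetical stand-ins for the proposed `Client.triage_low_score_sessions(...)` and `hitl_triage_queue`. The sketch mirrors `MERGE ... WHEN NOT MATCHED THEN INSERT` semantics, so re-running triage neither duplicates rows nor reopens sessions a reviewer has already resolved. An append-only table would instead need a dedup step at read time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class TriageRow:
    session_id: str
    score: float
    enqueued_at: datetime
    resolved_at: Optional[datetime] = None  # set when a human closes the item


def triage_low_score_sessions(
    queue: Dict[str, TriageRow],
    scored_sessions: Iterable[Tuple[str, float]],
    threshold: float,
) -> int:
    """Upsert low-scoring sessions into the queue; return how many were newly enqueued.

    Existing rows (resolved or not) are left untouched, like a MERGE that only
    inserts when the session_id is not matched.
    """
    now = datetime.now(timezone.utc)
    inserted = 0
    for session_id, score in scored_sessions:
        if score >= threshold or session_id in queue:
            continue
        queue[session_id] = TriageRow(session_id, score, enqueued_at=now)
        inserted += 1
    return inserted


queue: Dict[str, TriageRow] = {}
batch = [("s1", 0.2), ("s2", 0.9), ("s3", 0.4)]
print(triage_low_score_sessions(queue, batch, threshold=0.5))  # 2
queue["s1"].resolved_at = datetime.now(timezone.utc)  # reviewer closes s1
print(triage_low_score_sessions(queue, batch, threshold=0.5))  # 0: rerun is a no-op
```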

---

## P2 — ship in 1 quarter (strategic, larger effort)

| Item | Issue | Impact | Effort | Notes |
|---|---|---|---|---|
| **ReasoningBank**: per-user/per-session memory of past distilled outcomes, loaded as initial agent context | #49 | H | 4 wk (low) | Storage layer (BQ table for memories), distillation pipeline (LLM-as-Judge + summarization), retrieval API (`MemoryService.load_relevant_memories(...)`), agent integration shape (callable from plugin or app). Needs a small design RFC first because memory shape affects every downstream consumer. |
| **Compile-time code generation for structured extractors (Phase 1)** — only `extract_bka_decision_event` and the structured-event registry | #75 | M | 3 wk (med) | Gated on #76 landing. Phase 1 scope is deliberate: known structured event schemas only, no free-text. Server-side `AI.GENERATE` stays as semantic fallback until precision/recall is measured. |
| **Ontology pipeline migration to `bigquery_ontology` upstream package** *(strategic decision pending)* | #38 | H | 4 wk (low) | Runtime contract migration, not a module swap. Maintainer needs to decide: full migration vs. keep SDK pipeline as a thin wrapper. Risk is high (consumed across 5+ modules); upside is dropping ~5K LOC of duplicate code. |
| **SKOS import support** alongside OWL | #57 | M | 2 wk (med) | Design proposal phase; needs its feedback round resolved. Follows #38 because it should land in `bigquery_ontology`, not in this repo, if the migration goes ahead. |
| **Runtime entity resolution primitives**: `OntologyRuntime`, concept index lookups, `EntityResolver` protocol | #58 | M | 4 wk (low) | Currently at the design-proposal stage. Quoted user feedback: ~85% of brief-validation value sits at runtime, not schema time. Big payoff for production-agentic users; the design surface needs to land first. |
| **Auto-benchmark from traces** — extract high-signal success/failure pairs to seed eval suites | #95 (Pillar 1) | H | 3 wk (med) | Builds on existing `quality_report.py` + `agent_improvement_cycle`. Generalize the cycle into a reusable extractor. Cross-links to Vertex AI Prompt Optimizer integration (post #4 in the blog series). |
| **Streaming evaluation** — Pub/Sub + continuous query path that scores sessions as events arrive | #10 (item 2) | M | 3 wk (low) | Partial scaffolding exists at `_streaming_evaluation.py`. Productization needs an architectural call: on-arrival vs. micro-batch, latency budgets, schema for "in-flight" partial sessions. |
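For the on-arrival vs. micro-batch call on #10 (item 2), a micro-batch loop could look roughly like this. It assumes each event carries `session_id` and `event_type`, and that a session counts as complete once an event with a hypothetical terminal type arrives. Incomplete ("in-flight") sessions stay buffered across batches instead of being scored on partial data. All names and event-type values here are placeholders and are not taken from `_streaming_evaluation.py`.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Mapping

Event = Mapping[str, str]
TERMINAL_EVENT = "session_end"  # hypothetical terminal event type


class MicroBatchScorer:
    """Buffers events per session and scores only sessions that have finished."""

    def __init__(self, score_fn: Callable[[List[Event]], float]):
        self.score_fn = score_fn
        self.in_flight: Dict[str, List[Event]] = defaultdict(list)

    def process_batch(self, events: List[Event]) -> Dict[str, float]:
        finished = set()
        for event in events:
            self.in_flight[event["session_id"]].append(event)
            if event["event_type"] == TERMINAL_EVENT:
                finished.add(event["session_id"])
        # Score completed sessions and drop them from the buffer;
        # in-flight sessions wait for a later batch.
        return {sid: self.score_fn(self.in_flight.pop(sid)) for sid in finished}


scorer = MicroBatchScorer(score_fn=lambda evts: float(len(evts)))
print(scorer.process_batch([
    {"session_id": "a", "event_type": "llm_call"},
    {"session_id": "b", "event_type": "llm_call"},
    {"session_id": "a", "event_type": "session_end"},
]))  # {'a': 2.0}
print(scorer.process_batch([{"session_id": "b", "event_type": "session_end"}]))  # {'b': 2.0}
```

Micro-batching trades latency (a session is scored at most one batch interval after it ends) for never having to define a score for a partial session.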

**P2 total**: ~23 eng-weeks. With two engineers parallel-streaming, ~12 weeks (one quarter).
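One way to sanity-check the calendar figures in these totals is greedy longest-processing-time-first (LPT) assignment over each tier's effort column. Items are handed out largest first to whichever engineer has the least work so far. The sketch ignores dependencies (the Scorecard phases are sequential, #75 waits on #76) and assumes perfect splitting across people. Treat its output as a rough, likely optimistic estimate, not a schedule.

```python
import heapq
from typing import List


def lpt_makespan(efforts_weeks: List[float], engineers: int) -> float:
    """Calendar weeks when items are greedily assigned, largest first,
    to whichever engineer currently has the least work (LPT heuristic)."""
    loads = [(0.0, i) for i in range(engineers)]  # (assigned weeks, engineer id)
    heapq.heapify(loads)
    for effort in sorted(efforts_weeks, reverse=True):
        load, eng = heapq.heappop(loads)
        heapq.heappush(loads, (load + effort, eng))
    return max(load for load, _ in loads)


p1 = [2, 2, 2, 2, 2, 1]     # Scorecard 1-3, #30, #76, threshold helper
p2 = [4, 3, 4, 2, 4, 3, 3]  # #49, #75, #38, #57, #58, #95, #10
for name, tier in (("P1", p1), ("P2", p2)):
    print(name, sum(tier), "eng-weeks;", lpt_makespan(tier, 2), "weeks with 2 engineers")
# P1 11 eng-weeks; 6.0 weeks with 2 engineers
# P2 23 eng-weeks; 12.0 weeks with 2 engineers
```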

---

## P3 — research / future (defer until P0-P2 ships)

| Item | Issue | Impact | Effort | Notes |
|---|---|---|---|---|
| **Auto-skills loop** based on the AutoSkill paper (arxiv 2603.01145): agents learn reusable skills from interaction history | #95 (Pillar 2) | H | 6+ wk (low) | Use traces as the skill-formation source; abstract recurring patterns into reusable skill objects. Big bet, mostly research. |
| **Behavioral diff (SxS) analysis** between two agent runs on identical tasks, with "divergence point" detection | #10 (item 1) | M | 4 wk (low) | Needs a trajectory representation that supports semantic diff (beyond character-level). Cross-cuts trajectory matching + insights. |
| **Live agent resolution** — turn-time `EntityResolver` for live agents (extends #58) | #93 (Gap 1) | M | 6 wk (low) | Out-of-scope for #58; needs an agent-facing package extension of ADK. Latency budget: <50ms target. Big architecture decision. |
| **Advanced LLM + embedding-based resolvers** | #93 (Gap 2) | M | 4 wk (low) | Builds on #58's `EntityResolver` protocol. Embedding store + cosine-similarity layer + LLM-disambiguation tier. |
| **SHACL constraint validation** | #93 (Gap 3) | L | 3 wk (low) | Requires SHACL parser; integrates with `validate_extracted_graph` from #76. Niche but needed for governance-heavy verticals. |
| **Spanner / MAKO backends** for ontology storage | #93 (Gap 4) | L | 8+ wk (low) | Alternative storage to BigQuery for ontology data. Significant — multiple concurrent backends multiply maintenance. Defer unless an ADK-team partner needs it. |
| **V5 Context Graph** — TTL import + mixed extraction + temporal lineage (BigQuery-only V5 demo) | #12 | M | 6 wk (low) | Existing design doc. Sequence after `bigquery_ontology` migration decision (#38). |
| **Insights from latest research papers** — three arxiv refs, currently unscoped | #74 | ? | 1 wk just to triage | Needs a triage pass to convert into concrete proposals before estimating real effort. |
| **Blog series posts #4-#10** — analyst views, agent quality scorecard publication, real-time dashboards, ontology, HITL safety | #51 (slots 4-10) | H | 2 wk per post (high) | Cadence: ~one per 2-3 weeks. Critical for SDK adoption. Each post drafted by SDK lead + reviewed against live demo. |
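Divergence-point detection for the behavioral-diff item (#10, item 1) can start from step-level rather than character-level alignment. Here is a minimal sketch, assuming each run has been reduced to a list of step labels (for example, tool names). It uses `difflib.SequenceMatcher`, which aligns any sequences of hashable items, and reports the first index pair where the two runs stop matching. A true semantic diff would replace exact label equality with something richer; that part is not shown, and `divergence_point` is a hypothetical name.

```python
import difflib
from typing import Optional, Sequence, Tuple


def divergence_point(run_a: Sequence[str], run_b: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (index_in_a, index_in_b) of the first non-matching step, or None if identical."""
    matcher = difflib.SequenceMatcher(a=run_a, b=run_b, autojunk=False)
    for tag, a_start, _a_end, b_start, _b_end in matcher.get_opcodes():
        if tag != "equal":
            return a_start, b_start
    return None


baseline = ["plan", "search_docs", "summarize", "respond"]
candidate = ["plan", "query_db", "summarize", "respond"]
print(divergence_point(baseline, candidate))  # (1, 1)
```

Because the alignment is sequence-based, a run that skips a step (e.g. drops `summarize`) is reported at the skipped position rather than as a mismatch on every later step.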

---

## Items to deprecate / explicitly close

| Item | Status |
|---|---|
| `_LEGACY_LLM_JUDGE_BATCH_QUERY` (`ML.GENERATE_TEXT` path inside the LLM-judge cascade) | Mark deprecated. AI.GENERATE works without a connection now (per post #3 finding). Keep as fallback for one more release, then remove. |
| `--strict` for API-fallback judge errors | Already documented as a no-op in this case. Consider auto-disabling it with an explicit warning instead of silently ignoring it. |
| Combined `--spec-path` flag (`gm compile --spec-path ...`) | Already deprecated in favor of `--ontology PATH --binding PATH`. Schedule removal for 0.4.x. |
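A common way to keep a deprecated flag working until its removal release is to accept it, warn, and refuse ambiguous mixes with its replacements. The sketch below shows that pattern with `argparse`. It is illustrative only: it assumes nothing about how `gm` actually parses its arguments or how a combined spec file splits into ontology and binding files.

```python
import argparse
import warnings
from typing import List, Optional


def parse_compile_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="gm compile")  # illustrative parser, not gm's own
    parser.add_argument("--ontology", metavar="PATH")
    parser.add_argument("--binding", metavar="PATH")
    # Deprecated combined flag: still accepted, hidden from --help.
    parser.add_argument("--spec-path", metavar="PATH", help=argparse.SUPPRESS)
    args = parser.parse_args(argv)

    if args.spec_path is not None:
        if args.ontology or args.binding:
            parser.error("--spec-path cannot be combined with --ontology/--binding")
        warnings.warn(
            "--spec-path is deprecated; use --ontology PATH --binding PATH",
            DeprecationWarning,
            stacklevel=2,
        )
    elif not (args.ontology and args.binding):
        parser.error("--ontology and --binding are both required")
    return args


args = parse_compile_args(["--ontology", "ontology.yaml", "--binding", "binding.yaml"])
print(args.ontology, args.binding)  # ontology.yaml binding.yaml
```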

---

## Strategic decisions pending (need maintainer call)

These three are gating P2 work. Recommend a single decision-doc PR that resolves all three:

1. **Ontology migration to `bigquery_ontology` (#38).** Full migration vs. thin wrapper. Resolution affects #57, #75, #76, #93 sequencing.
2. **Runtime entity resolution surface boundary (#58).** The SDK ships the primitives; what is the agent-facing layer that consumes them, and does it live in this repo, in the ADK plugin, or in a new package? Resolution affects #93 Gap 1.
3. **ReasoningBank shape (#49).** Storage table + distillation pipeline + retrieval API. The shape this lands in determines whether it can be reused by `agent_improvement_cycle` (existing demo) or whether they're parallel systems.
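To make the ReasoningBank shape question concrete, here is a deliberately small sketch of one possible row shape plus a retrieval call. Everything in it is hypothetical. The `MemoryRecord` fields are illustrative. The keyword-overlap ranking stands in for whatever retrieval the RFC picks (embeddings, for instance). The standalone `load_relevant_memories` only echoes the `MemoryService.load_relevant_memories(...)` name proposed in #49. If `agent_improvement_cycle` wrote its distilled outcomes in the same shape, the two could share one table rather than run as parallel systems.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryRecord:
    user_id: str
    session_id: str
    outcome: str  # e.g. "success" / "failure" from an LLM-as-Judge pass
    lesson: str   # distilled, reusable takeaway


def load_relevant_memories(
    memories: List[MemoryRecord], user_id: str, query: str, limit: int = 3
) -> List[MemoryRecord]:
    """Return up to `limit` of the user's memories, ranked by word overlap with `query`."""
    query_words = set(query.lower().split())

    def overlap(m: MemoryRecord) -> int:
        return len(query_words & set(m.lesson.lower().split()))

    candidates = [m for m in memories if m.user_id == user_id and overlap(m) > 0]
    return sorted(candidates, key=overlap, reverse=True)[:limit]


bank = [
    MemoryRecord("u1", "s1", "failure", "refund requests need the order id before calling the tool"),
    MemoryRecord("u1", "s2", "success", "summarize policy text before answering"),
    MemoryRecord("u2", "s3", "success", "refund requests go to billing"),
]
for m in load_relevant_memories(bank, "u1", "customer asks about a refund order"):
    print(m.session_id, m.lesson)
# s1 refund requests need the order id before calling the tool
```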

---

## Resource budget summary

| Budget (calendar time, engineers) | What ships |
|---|---|
| **2 weeks (1 eng)** | All of P0. Quote-escape, cache hit-rate metric, blog series cleanup. |
| **1 month (1 eng)** | P0 + Quality Scorecard Phase 1 + inheritance compilation + extractor prerequisite. |
| **1 month (2 eng parallel)** | P0 + all of P1 (Scorecard Phase 1-3, inheritance, extractor prereq, baseline helper). |
| **1 quarter (1 eng)** | P0 + P1 + ~half of P2 (pick: ReasoningBank OR ontology migration OR streaming eval). |
| **1 quarter (2 eng parallel)** | P0 + P1 + all of P2 except the strategic-decision-pending items. |
| **2 quarters (3 eng)** | Everything except deep-research P3 items (auto-skills, V5 context graph, Spanner/MAKO). |

---

## Sequencing rationale (short version)

- **P0 first** because it's small, asked-for, and unblocks the cost-narrative thread of post #2 readers (cache hit rate is exactly the "tune your token budget" follow-up).
- **Quality Scorecard before everything else in P1** because it's already in flight (Gayathri's PR coming) and it converts the existing categorical evaluator surface from "I have to design my own metrics" → "here's a known-good rubric." Adoption boost.
- **Ontology consolidation in P2 not P1** because it's the most expensive single decision and we don't want to block the scorecard / extractor work behind it. The strategic-decision PR for #38 can run in parallel with P1 implementation.
- **ReasoningBank in P2 not P3** because it ties the `agent_improvement_cycle` demo to a real product surface. Without ReasoningBank, the demo is "look, agents can self-improve in this contained example"; with ReasoningBank, it's "your agent's memory, in a queryable BQ table."
- **Auto-skills, V5 context graph, Spanner backends to P3** because they're large bets that need a research arm or a partner team to justify the investment. Don't block production work on them.

---

## What this roadmap deliberately doesn't do

- No prioritization of `bigquery_ontology` ↔ `BigQuery-Agent-Analytics-SDK` repo splits beyond the existing #38 decision. That's an organizational call.
- No commitment to specific calendar dates. Sequencing is relative; absolute dates depend on engineer count + DevRel review cycles + GA gates.
- No mention of internal-only Google Cloud product integrations (Vertex AI Agent Engine, etc.) beyond what the public surface already covers. Those would be a parallel internal roadmap.

---

## How to use this issue

- **Maintainer**: react to the items whose priority you agree with; comment with re-rankings on the ones you disagree with; resolve the three strategic decisions above so P2 can be sequenced.
- **Contributors**: pick a P0 or P1 item that matches your interest, leave an "I can take this" comment, and the maintainer can hand the linked issue over.
- **Periodic review**: re-check this roadmap every ~6 weeks. The shape of evaluation work (scorecard, auto-benchmarking) is moving fastest right now and may push items between tiers.

---

*Generated 2026-04-28 from a survey of the 19 open issues, the SDK surface
at `target/main`, and the ADK plugin at `google/adk/plugins/bigquery_agent_analytics_plugin.py`.
Open to revision; this is a starting point, not a contract.*

