GitHub Issue: BigQuery Plugin parent_span_id Points to Unlogged OTel Spans
🔴 Required Information
Describe the Bug:
When OpenTelemetry instrumentation is active (e.g., opentelemetry-instrumentation-google-genai), the BigQueryAgentAnalyticsPlugin writes parent_span_id values that reference framework-internal OTel spans (call_llm, execute_tool, send_data) which are never logged to the BigQuery table. This makes the parent_span_id column unusable for building parent-child relationships within BigQuery — it points to spans that don't exist in the table.
The consequence is that any SQL JOIN like LEFT JOIN agent_events_view A ON T.parent_span_id = A.span_id returns NULL for the agent context, making it impossible to correlate LLM/tool events with their enclosing agent within BigQuery alone.
In our dataset, 92.8% of LLM_RESPONSE events and 10.3% of TOOL events have phantom parent_span_id values.
Steps to Reproduce:
- Create an ADK agent with sub-agents and tools (see minimal reproduction below)
- Install opentelemetry-instrumentation-google-genai (or any package that activates OTel tracing)
- Configure the BigQueryAgentAnalyticsPlugin and run the agent
- Query the BQ table to find parent_span_ids that don't exist as span_ids:
-- Find "phantom" parent_span_ids (referenced but never logged)
WITH logged_spans AS (
SELECT DISTINCT span_id FROM `project.dataset.table`
),
parent_refs AS (
SELECT DISTINCT parent_span_id
FROM `project.dataset.table`
WHERE parent_span_id IS NOT NULL
)
SELECT p.parent_span_id,
CASE WHEN s.span_id IS NOT NULL THEN 'EXISTS' ELSE 'PHANTOM' END as status
FROM parent_refs p
LEFT JOIN logged_spans s ON p.parent_span_id = s.span_id
WHERE s.span_id IS NULL;
-- Returns dozens/hundreds of phantom span_ids
- Attempt to JOIN tool or LLM events to their parent agent:
SELECT T.tool_name, T.parent_span_id, A.agent_name, A.duration_ms
FROM tool_events_view T
LEFT JOIN agent_events_view A ON T.parent_span_id = A.span_id
-- A.agent_name and A.duration_ms are NULL for all phantom cases
Expected Behavior:
Every parent_span_id written to BigQuery should reference a span_id that also exists in the same BigQuery table, forming a self-consistent span tree. This would allow users to traverse the parent-child relationship to build execution trees, correlate tool/LLM latency with agent context, and construct dashboards that show the full execution hierarchy.
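The same invariant can also be checked outside BigQuery. The following is a minimal sketch in plain Python, assuming the rows have been exported as dictionaries with span_id and parent_span_id keys. The function name find_phantom_parents is hypothetical and is not part of the plugin.

```python
from typing import Iterable, Mapping, Optional


def find_phantom_parents(rows: Iterable[Mapping[str, Optional[str]]]) -> set[str]:
    """Return parent_span_id values that never appear as a span_id."""
    rows = list(rows)
    logged = {r["span_id"] for r in rows if r.get("span_id")}
    referenced = {r["parent_span_id"] for r in rows if r.get("parent_span_id")}
    return referenced - logged


# Three rows copied from the example trace in this issue (subset of columns).
rows = [
    {"event_type": "INVOCATION_START", "span_id": "351611af4c2a9ca5", "parent_span_id": None},
    {"event_type": "AGENT_STARTING", "span_id": "40fad0f4b5d8203c", "parent_span_id": "351611af4c2a9ca5"},
    {"event_type": "TOOL_STARTING", "span_id": "fb9761a38d9251ba", "parent_span_id": "84a19a6ed0f23b06"},
]
phantoms = find_phantom_parents(rows)
```

A self-consistent table would yield an empty set; here the only entry is the parent of the TOOL_STARTING row.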
Observed Behavior:
parent_span_id frequently points to internal ADK framework spans that are only visible in Cloud Trace but never written to BigQuery. The BQ span graph has "dangling pointers" that break any parent-child JOIN.
Concrete example from a single trace (c7f3ed279e620d0e26df69f9165406e8):
Trace raw events in BigQuery:
timestamp event_type agent span_id parent_span_id
──────────────────────────────────────────────────────────────────────────────────────────────────────────
2026-04-09 21:32:40.762 USER_MSG_RECEIVED knowledge_supervisor 351611af4c2a9ca5 (null)
2026-04-09 21:32:40.764 INVOCATION_START knowledge_supervisor 351611af4c2a9ca5 (null)
2026-04-09 21:32:40.765 AGENT_STARTING knowledge_supervisor 40fad0f4b5d8203c 351611af4c2a9ca5 ← OK
2026-04-09 21:32:40.766 LLM_REQUEST knowledge_supervisor 5179bd626c6d4fb9 40fad0f4b5d8203c ← OK
2026-04-09 21:32:45.825 LLM_RESPONSE knowledge_supervisor 5179bd626c6d4fb9 40fad0f4b5d8203c ← OK
2026-04-09 21:32:45.826 TOOL_STARTING knowledge_supervisor fb9761a38d9251ba 84a19a6ed0f23b06 ← PHANTOM!
2026-04-09 21:32:45.826 TOOL_COMPLETED knowledge_supervisor fb9761a38d9251ba 84a19a6ed0f23b06 ← PHANTOM!
2026-04-09 21:32:45.827 AGENT_STARTING internal_docs_agent b6000853c649fd03 84a19a6ed0f23b06 ← PHANTOM!
2026-04-09 21:32:45.829 LLM_REQUEST internal_docs_agent 70f56dd9f82afc16 b6000853c649fd03 ← OK
2026-04-09 21:32:49.502 LLM_RESPONSE internal_docs_agent 70f56dd9f82afc16 b6000853c649fd03 ← OK
2026-04-09 21:32:49.503 TOOL_STARTING internal_docs_agent 3dab867f3003b630 9b7b334a7c4d42ad ← PHANTOM!
2026-04-09 21:32:50.288 TOOL_COMPLETED internal_docs_agent 3dab867f3003b630 9b7b334a7c4d42ad ← PHANTOM!
2026-04-09 21:32:53.837 AGENT_COMPLETED internal_docs_agent b6000853c649fd03 84a19a6ed0f23b06 ← PHANTOM!
2026-04-09 21:32:53.838 AGENT_COMPLETED knowledge_supervisor 40fad0f4b5d8203c 351611af4c2a9ca5 ← OK
2026-04-09 21:32:53.838 INVOCATION_COMPL knowledge_supervisor 351611af4c2a9ca5 (null)
Parent span cross-reference:
parent_span_id=351611af4c2a9ca5 → EXISTS in BQ (invocation span)
parent_span_id=40fad0f4b5d8203c → EXISTS in BQ (agent span)
parent_span_id=b6000853c649fd03 → EXISTS in BQ (sub-agent span)
parent_span_id=84a19a6ed0f23b06 → PHANTOM (framework 'execute_tool' span — not in BQ)
parent_span_id=9b7b334a7c4d42ad → PHANTOM (framework 'execute_tool' span — not in BQ)
The phantom span 84a19a6ed0f23b06 is the ADK framework's execute_tool transfer_to_agent OTel span created at flows/llm_flows/functions.py:588. The phantom span 9b7b334a7c4d42ad is the framework's execute_tool search_internal_docs OTel span. Both exist in Cloud Trace but are never written to BQ.
Breakdown by event type across our dataset:
event_type parent_status count
──────────────────────────────────────────────────────────
AGENT_COMPLETED phantom 131
AGENT_COMPLETED resolved 3446
AGENT_STARTING phantom 131
AGENT_STARTING resolved 3446
LLM_REQUEST resolved 3554
LLM_RESPONSE phantom 3299 ← 92.8% phantom!
LLM_RESPONSE resolved 255
TOOL_COMPLETED phantom 175 ← 10.3% phantom
TOOL_COMPLETED resolved 1520
TOOL_STARTING phantom 175
TOOL_STARTING resolved 1520
Environment Details:
- ADK Library Version: google-adk 1.28.1
- Desktop OS: Linux (Ubuntu 24.04)
- Python Version: 3.11
Model Information:
- Are you using LiteLLM: No
- Which model is being used: gemini-2.5-pro (via Vertex AI)
🟡 Optional Information
Regression:
Unknown — this may have existed since OTel support was added. The plugin comments reference issues #4561 and #4645 which drove the current _resolve_ids() layered approach.
Logs:
N/A — this is not a runtime error. The plugin runs without errors; it simply writes parent_span_id values that reference OTel spans not present in BQ.
Root Cause Analysis:
The issue is in _resolve_ids() (bigquery_agent_analytics_plugin.py, ~line 2520):
# --- Layer 2: ambient OTel span ---
ambient = trace.get_current_span()
ambient_ctx = ambient.get_span_context()
if ambient_ctx.is_valid:
    trace_id = format(ambient_ctx.trace_id, "032x")
    span_id = format(ambient_ctx.span_id, "016x")  # ← takes ambient span_id
    parent_span_id = None
    parent_ctx = getattr(ambient, "parent", None)
    if parent_ctx is not None and parent_ctx.span_id:
        parent_span_id = format(parent_ctx.span_id, "016x")  # ← takes ambient parent
When OTel instrumentation is active, the ADK framework wraps callbacks in internal spans:
| Framework Location | start_as_current_span() Name | Logged to BQ? |
| --- | --- | --- |
| runners.py:546 | invocation | No |
| base_agent.py:288 | invoke_agent {name} | No |
| base_llm_flow.py:1126 | call_llm | No |
| functions.py:588 | execute_tool {name} | No |
| base_llm_flow.py:505 | send_data | No |
When before_tool_callback fires, the ambient OTel span is execute_tool search_internal_docs. Layer 2 picks up this span's span_id and its parent's span_id. Since these framework spans are never written to BQ by the plugin, the resulting parent_span_id in the BQ row is a dangling reference.
The plugin's own internal span stack (Layer 3, via TraceManager) would produce correct, self-consistent parent_span_ids. But Layer 2 overrides Layer 3 whenever an ambient OTel span is present.
In after_tool_callback, the plugin explicitly decides NOT to override when ambient OTel is present:
has_ambient = trace.get_current_span().get_span_context().is_valid
event_data = EventData(
    span_id_override=None if has_ambient else span_id,  # ← None when OTel active
    parent_span_id_override=None if has_ambient else parent_span_id,  # ← None when OTel active
)
This means Layer 1 (explicit overrides) is intentionally skipped, and Layer 2 (ambient OTel) takes full control of span_id/parent_span_id.
Consequences:
- Parent-child JOINs in BQ are broken. Any query like LEFT JOIN agent_events_view A ON T.parent_span_id = A.span_id returns NULL for the agent columns on every row whose parent is a phantom span. This makes it impossible to answer basic questions like "which agent was this tool running inside?" or "what was the agent's total latency for this tool call?" using only BigQuery data.
- Execution tree reconstruction is impossible. Building a tree of agent → sub-agent → tool → LLM from the BQ data requires a connected parent-child graph. Phantom spans create disconnected subtrees.
- BQ analytics and dashboards are degraded. Any dashboard that computes "tool latency as % of parent agent latency" or "agent error status for failed tool calls" gets NULL values for every affected record (in our dataset, 92.8% of LLM_RESPONSE events and 10.3% of TOOL events).
- Inconsistency between BQ and Cloud Trace. The same trace looks correct in Cloud Trace (all spans present) but broken in BQ (dangling parent references). Users expect the BQ data to be self-consistent.
Proposed Fix (in the BQ plugin):
The fix would be to always use the plugin's internal TraceManager stack for parent_span_id, even when ambient OTel is present. The ambient span should only influence trace_id (to maintain correlation with Cloud Trace) and optionally span_id, but parent_span_id should always come from the plugin's stack (Layer 3) since that's the only layer guaranteed to produce span_ids that are actually logged to BQ.
Conceptually:
# In _resolve_ids():
if ambient_ctx.is_valid:
    trace_id = format(ambient_ctx.trace_id, "032x")
    span_id = format(ambient_ctx.span_id, "016x")
    # DON'T override parent_span_id from ambient — keep plugin stack value
    # parent_span_id stays as plugin_parent_span_id from Layer 3
And in the after_*_callback methods, always pass the popped span's parent as an override:
# In after_tool_callback:
event_data = EventData(
    span_id_override=None if has_ambient else span_id,
    parent_span_id_override=parent_span_id,  # ← ALWAYS override parent, not just when no ambient
)
This would keep trace_id and span_id aligned with Cloud Trace (for cross-referencing) while ensuring parent_span_id always forms a valid, self-consistent graph within BQ.
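To make the difference concrete, here is a simplified model of the two resolution strategies. It is a sketch under stated assumptions, not the plugin's code: SpanIds, resolve_current, and resolve_proposed are hypothetical names, and the real _resolve_ids() has more inputs and layers than shown. The plugin-stack span_id below is a made-up placeholder; the other IDs come from the example trace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanIds:
    span_id: str
    parent_span_id: Optional[str]


def resolve_current(ambient: Optional[SpanIds], plugin: SpanIds) -> SpanIds:
    """Model of the reported behaviour: a valid ambient span replaces both IDs."""
    return ambient if ambient is not None else plugin


def resolve_proposed(ambient: Optional[SpanIds], plugin: SpanIds) -> SpanIds:
    """Model of the proposal: ambient may set span_id, parent stays on the plugin stack."""
    if ambient is None:
        return plugin
    return SpanIds(span_id=ambient.span_id, parent_span_id=plugin.parent_span_id)


# Ambient IDs as logged for TOOL_STARTING in the example trace.
ambient = SpanIds(span_id="fb9761a38d9251ba", parent_span_id="84a19a6ed0f23b06")
# Assumed plugin-stack view: placeholder span_id, parent is the logged agent span.
plugin = SpanIds(span_id="1111111111111111", parent_span_id="40fad0f4b5d8203c")

before = resolve_current(ambient, plugin)
after = resolve_proposed(ambient, plugin)
```

In this model, `before.parent_span_id` is the phantom execute_tool span, while `after.parent_span_id` is the agent span that exists in BQ. Note that this only produces a connected graph if the IDs held on the plugin stack are the same IDs written to the span_id column of the parent rows; that is an assumption of the sketch and would need to be verified against the plugin.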
Workaround (SQL-level):
Until a plugin fix is available, consumers can JOIN on trace_id + agent_name instead of parent_span_id:
-- Instead of:
LEFT JOIN agent_events_view A ON T.parent_span_id = A.span_id
-- Use:
LEFT JOIN agent_events_view A ON T.trace_id = A.trace_id AND T.agent_name = A.agent_name
This works because both tool_events_view and llm_events_view carry an agent_name column (from the raw agent field) that identifies which agent the event belongs to.
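For consumers who post-process exported rows rather than querying the views, the same fallback can be sketched in plain Python. The helper name join_on_trace_and_agent is hypothetical, and the row dictionaries are assumed to carry trace_id and agent_name keys mirroring the view columns. One caveat worth checking on real data: if the same agent runs more than once within a trace, this key may match several agent rows per tool event.

```python
from collections import defaultdict


def join_on_trace_and_agent(tool_rows, agent_rows):
    """Emulate a LEFT JOIN on (trace_id, agent_name); unmatched tools pair with None."""
    index = defaultdict(list)
    for agent in agent_rows:
        index[(agent["trace_id"], agent["agent_name"])].append(agent)
    joined = []
    for tool in tool_rows:
        matches = index.get((tool["trace_id"], tool["agent_name"])) or [None]
        for agent in matches:
            joined.append((tool, agent))
    return joined


TRACE = "c7f3ed279e620d0e26df69f9165406e8"  # trace ID from the example above
agent_rows = [
    {"trace_id": TRACE, "agent_name": "knowledge_supervisor", "span_id": "40fad0f4b5d8203c"},
    {"trace_id": TRACE, "agent_name": "internal_docs_agent", "span_id": "b6000853c649fd03"},
]
tool_rows = [
    {"trace_id": TRACE, "agent_name": "internal_docs_agent",
     "span_id": "3dab867f3003b630", "parent_span_id": "9b7b334a7c4d42ad"},
]
joined = join_on_trace_and_agent(tool_rows, agent_rows)
```

The tool row whose parent_span_id is a phantom still resolves to the internal_docs_agent row, because the match no longer depends on parent_span_id.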
Minimal Reproduction Code:
from google.adk.agents import Agent, LlmAgent
from google.adk.apps import App
from google.adk.models import Gemini
from google.adk.plugins.bigquery_agent_analytics_plugin import (
    BigQueryLoggerConfig,
    BigQueryAgentAnalyticsPlugin,
)
# A tool that the sub-agent will call
def search_docs(query: str) -> str:
    """Searches documents."""
    return "No results found."
# Sub-agent with a tool
sub_agent = LlmAgent(
    name="docs_agent",
    model="gemini-2.5-flash",
    description="Searches documents",
    instruction="Use search_docs to answer questions.",
    tools=[search_docs],
)
# Root supervisor agent
root = Agent(
    name="supervisor",
    model=Gemini(model="gemini-2.5-pro"),
    description="Routes queries",
    instruction="Route document questions to docs_agent.",
    sub_agents=[sub_agent],
)
bq_plugin = BigQueryAgentAnalyticsPlugin(
    project_id="your-project",
    dataset_id="your-dataset",
    table_id="your-table",
    config=BigQueryLoggerConfig(enabled=True, batch_size=1),
    location="us-central1",
)
app = App(root_agent=root, name="test_app", plugins=[bq_plugin])
# Run a query that triggers: supervisor → LLM → transfer → docs_agent → LLM → search_docs → LLM
# Then query BQ and check parent_span_id references
After running, execute the diagnostic SQL from "Steps to Reproduce" step 4. You will find TOOL_STARTING/COMPLETED events for search_docs with parent_span_id values that don't exist as any row's span_id in the table.
How often has this issue occurred?:
- Always (100%) — every trace with OTel instrumentation active exhibits phantom parent_span_ids. The exact percentage varies by event type (92.8% for LLM_RESPONSE, 10.3% for TOOL events, 3.7% for AGENT events in our dataset).