BigQueryAgentAnalyticsPlugin: parent_span_id Points to Unlogged OTel Spans #5310

@evekhm

GitHub Issue: BigQuery Plugin parent_span_id Points to Unlogged OTel Spans

🔴 Required Information

Describe the Bug:

When OpenTelemetry instrumentation is active (e.g., opentelemetry-instrumentation-google-genai), the BigQueryAgentAnalyticsPlugin writes parent_span_id values that reference framework-internal OTel spans (call_llm, execute_tool, send_data) which are never logged to the BigQuery table. This makes the parent_span_id column unusable for building parent-child relationships within BigQuery — it points to spans that don't exist in the table.

The consequence is that any SQL JOIN like LEFT JOIN agent_events_view A ON T.parent_span_id = A.span_id returns NULL for the agent context, making it impossible to correlate LLM/tool events with their enclosing agent within BigQuery alone.

In our dataset, 92.8% of LLM_RESPONSE events and 10.3% of TOOL events have phantom parent_span_id values.

Steps to Reproduce:

  1. Create an ADK agent with sub-agents and tools (see minimal reproduction below)
  2. Install opentelemetry-instrumentation-google-genai (or any package that activates OTel tracing)
  3. Configure the BigQueryAgentAnalyticsPlugin and run the agent
  4. Query the BQ table to find parent_span_ids that don't exist as span_ids:
-- Find "phantom" parent_span_ids (referenced but never logged)
WITH logged_spans AS (
  SELECT DISTINCT span_id FROM `project.dataset.table`
),
parent_refs AS (
  SELECT DISTINCT parent_span_id 
  FROM `project.dataset.table`
  WHERE parent_span_id IS NOT NULL
)
SELECT p.parent_span_id, 
  CASE WHEN s.span_id IS NOT NULL THEN 'EXISTS' ELSE 'PHANTOM' END as status
FROM parent_refs p
LEFT JOIN logged_spans s ON p.parent_span_id = s.span_id
WHERE s.span_id IS NULL;
-- Returns dozens/hundreds of phantom span_ids
  5. Attempt to JOIN tool or LLM events to their parent agent:
SELECT T.tool_name, T.parent_span_id, A.agent_name, A.duration_ms
FROM tool_events_view T
LEFT JOIN agent_events_view A ON T.parent_span_id = A.span_id
-- A.agent_name and A.duration_ms are NULL for all phantom cases

Expected Behavior:

Every parent_span_id written to BigQuery should reference a span_id that also exists in the same BigQuery table, forming a self-consistent span tree. This would allow users to traverse the parent-child relationship to build execution trees, correlate tool/LLM latency with agent context, and construct dashboards that show the full execution hierarchy.

Observed Behavior:

parent_span_id frequently points to internal ADK framework spans that are only visible in Cloud Trace but never written to BigQuery. The BQ span graph has "dangling pointers" that break any parent-child JOIN.

Concrete example from a single trace (c7f3ed279e620d0e26df69f9165406e8):

Trace raw events in BigQuery:
timestamp                  event_type         agent                  span_id              parent_span_id
──────────────────────────────────────────────────────────────────────────────────────────────────────────
2026-04-09 21:32:40.762    USER_MSG_RECEIVED  knowledge_supervisor   351611af4c2a9ca5     (null)
2026-04-09 21:32:40.764    INVOCATION_START   knowledge_supervisor   351611af4c2a9ca5     (null)
2026-04-09 21:32:40.765    AGENT_STARTING     knowledge_supervisor   40fad0f4b5d8203c     351611af4c2a9ca5  ← OK
2026-04-09 21:32:40.766    LLM_REQUEST        knowledge_supervisor   5179bd626c6d4fb9     40fad0f4b5d8203c  ← OK
2026-04-09 21:32:45.825    LLM_RESPONSE       knowledge_supervisor   5179bd626c6d4fb9     40fad0f4b5d8203c  ← OK
2026-04-09 21:32:45.826    TOOL_STARTING      knowledge_supervisor   fb9761a38d9251ba     84a19a6ed0f23b06  ← PHANTOM!
2026-04-09 21:32:45.826    TOOL_COMPLETED     knowledge_supervisor   fb9761a38d9251ba     84a19a6ed0f23b06  ← PHANTOM!
2026-04-09 21:32:45.827    AGENT_STARTING     internal_docs_agent    b6000853c649fd03     84a19a6ed0f23b06  ← PHANTOM!
2026-04-09 21:32:45.829    LLM_REQUEST        internal_docs_agent    70f56dd9f82afc16     b6000853c649fd03  ← OK
2026-04-09 21:32:49.502    LLM_RESPONSE       internal_docs_agent    70f56dd9f82afc16     b6000853c649fd03  ← OK
2026-04-09 21:32:49.503    TOOL_STARTING      internal_docs_agent    3dab867f3003b630     9b7b334a7c4d42ad  ← PHANTOM!
2026-04-09 21:32:50.288    TOOL_COMPLETED     internal_docs_agent    3dab867f3003b630     9b7b334a7c4d42ad  ← PHANTOM!
2026-04-09 21:32:53.837    AGENT_COMPLETED    internal_docs_agent    b6000853c649fd03     84a19a6ed0f23b06  ← PHANTOM!
2026-04-09 21:32:53.838    AGENT_COMPLETED    knowledge_supervisor   40fad0f4b5d8203c     351611af4c2a9ca5  ← OK
2026-04-09 21:32:53.838    INVOCATION_COMPL   knowledge_supervisor   351611af4c2a9ca5     (null)

Parent span cross-reference:
  parent_span_id=351611af4c2a9ca5  →  EXISTS in BQ  (invocation span)
  parent_span_id=40fad0f4b5d8203c  →  EXISTS in BQ  (agent span)
  parent_span_id=b6000853c649fd03  →  EXISTS in BQ  (sub-agent span)
  parent_span_id=84a19a6ed0f23b06  →  PHANTOM       (framework 'execute_tool' span — not in BQ)
  parent_span_id=9b7b334a7c4d42ad  →  PHANTOM       (framework 'execute_tool' span — not in BQ)

The phantom span 84a19a6ed0f23b06 is the ADK framework's execute_tool transfer_to_agent OTel span created at flows/llm_flows/functions.py:588. The phantom span 9b7b334a7c4d42ad is the framework's execute_tool search_internal_docs OTel span. Both exist in Cloud Trace but are never written to BQ.
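
The cross-reference above can be reproduced offline with a small set-difference check. This is a minimal sketch that hardcodes the rows from the example trace; the function name `find_phantom_parents` is illustrative and not part of the plugin.

```python
# Rows copied from the example trace: (event_type, span_id, parent_span_id).
ROWS = [
    ("USER_MSG_RECEIVED", "351611af4c2a9ca5", None),
    ("INVOCATION_START", "351611af4c2a9ca5", None),
    ("AGENT_STARTING", "40fad0f4b5d8203c", "351611af4c2a9ca5"),
    ("LLM_REQUEST", "5179bd626c6d4fb9", "40fad0f4b5d8203c"),
    ("LLM_RESPONSE", "5179bd626c6d4fb9", "40fad0f4b5d8203c"),
    ("TOOL_STARTING", "fb9761a38d9251ba", "84a19a6ed0f23b06"),
    ("TOOL_COMPLETED", "fb9761a38d9251ba", "84a19a6ed0f23b06"),
    ("AGENT_STARTING", "b6000853c649fd03", "84a19a6ed0f23b06"),
    ("LLM_REQUEST", "70f56dd9f82afc16", "b6000853c649fd03"),
    ("LLM_RESPONSE", "70f56dd9f82afc16", "b6000853c649fd03"),
    ("TOOL_STARTING", "3dab867f3003b630", "9b7b334a7c4d42ad"),
    ("TOOL_COMPLETED", "3dab867f3003b630", "9b7b334a7c4d42ad"),
    ("AGENT_COMPLETED", "b6000853c649fd03", "84a19a6ed0f23b06"),
    ("AGENT_COMPLETED", "40fad0f4b5d8203c", "351611af4c2a9ca5"),
    ("INVOCATION_COMPL", "351611af4c2a9ca5", None),
]


def find_phantom_parents(rows):
    """Return parent_span_ids that never appear as a span_id in any row."""
    logged = {span_id for _, span_id, _ in rows}
    referenced = {parent for _, _, parent in rows if parent is not None}
    return referenced - logged


print(sorted(find_phantom_parents(ROWS)))
# ['84a19a6ed0f23b06', '9b7b334a7c4d42ad']
```

This is the same logic as the diagnostic SQL in "Steps to Reproduce", applied to one trace in memory.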

Aggregate parent status by event type across our dataset:

event_type                     parent_status      count
──────────────────────────────────────────────────────────
AGENT_COMPLETED                phantom              131
AGENT_COMPLETED                resolved            3446
AGENT_STARTING                 phantom              131
AGENT_STARTING                 resolved            3446
LLM_REQUEST                    resolved            3554
LLM_RESPONSE                   phantom             3299   ← 92.8% phantom!
LLM_RESPONSE                   resolved             255
TOOL_COMPLETED                 phantom              175   ← 10.3% phantom
TOOL_COMPLETED                 resolved            1520
TOOL_STARTING                  phantom              175
TOOL_STARTING                  resolved            1520
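
For reference, the quoted percentages follow directly from the counts in this table. A short sketch of the arithmetic, assuming that an event type with no "phantom" row (LLM_REQUEST) has zero phantom parents:

```python
# Counts copied from the table; LLM_REQUEST has no phantom row, so 0 is assumed.
COUNTS = {
    "AGENT_COMPLETED": {"phantom": 131, "resolved": 3446},
    "AGENT_STARTING": {"phantom": 131, "resolved": 3446},
    "LLM_REQUEST": {"phantom": 0, "resolved": 3554},
    "LLM_RESPONSE": {"phantom": 3299, "resolved": 255},
    "TOOL_COMPLETED": {"phantom": 175, "resolved": 1520},
    "TOOL_STARTING": {"phantom": 175, "resolved": 1520},
}


def phantom_rate(counts):
    """Percentage of rows whose parent_span_id is not logged in the table."""
    total = counts["phantom"] + counts["resolved"]
    return 100.0 * counts["phantom"] / total


for event_type, counts in COUNTS.items():
    print(f"{event_type:<16} {phantom_rate(counts):5.1f}%")
```

LLM_RESPONSE works out to 3299 / 3554, TOOL events to 175 / 1695, and AGENT events to 131 / 3577.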

Environment Details:

  • ADK Library Version: google-adk 1.28.1
  • Desktop OS: Linux (Ubuntu 24.04)
  • Python Version: 3.11

Model Information:

  • Are you using LiteLLM: No
  • Which model is being used: gemini-2.5-pro (via Vertex AI)

🟡 Optional Information

Regression:
Unknown — this may have existed since OTel support was added. The plugin comments reference issues #4561 and #4645 which drove the current _resolve_ids() layered approach.

Logs:
N/A — this is not a runtime error. The plugin runs without errors; it simply writes parent_span_id values that reference OTel spans not present in BQ.

Root Cause Analysis:

The issue is in _resolve_ids() (bigquery_agent_analytics_plugin.py, ~line 2520):

# --- Layer 2: ambient OTel span ---
ambient = trace.get_current_span()
ambient_ctx = ambient.get_span_context()
if ambient_ctx.is_valid:
    trace_id = format(ambient_ctx.trace_id, "032x")
    span_id = format(ambient_ctx.span_id, "016x")        # ← takes ambient span_id
    parent_span_id = None
    parent_ctx = getattr(ambient, "parent", None)
    if parent_ctx is not None and parent_ctx.span_id:
        parent_span_id = format(parent_ctx.span_id, "016x")  # ← takes ambient parent

When OTel instrumentation is active, the ADK framework wraps callbacks in internal spans:

Framework location       start_as_current_span() name   Logged to BQ?
─────────────────────────────────────────────────────────────────────
runners.py:546           invocation                     No
base_agent.py:288        invoke_agent {name}            No
base_llm_flow.py:1126    call_llm                       No
functions.py:588         execute_tool {name}            No
base_llm_flow.py:505     send_data                      No

When before_tool_callback fires, the ambient OTel span is execute_tool search_internal_docs. Layer 2 picks up this span's span_id and its parent's span_id. Since these framework spans are never written to BQ by the plugin, the resulting parent_span_id in the BQ row is a dangling reference.

The plugin's own internal span stack (Layer 3, via TraceManager) would produce correct, self-consistent parent_span_ids. But Layer 2 overrides Layer 3 whenever an ambient OTel span is present.
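
The precedence problem can be illustrated with a toy model. This is not the plugin's code: `SpanIds`, `resolve_ids_toy`, and all span IDs below are hypothetical, Layer 1 is left out for brevity, and no OpenTelemetry import is used. It only shows how letting Layer 2 replace Layer 3 yields a parent that was never logged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanIds:
    span_id: str
    parent_span_id: Optional[str]


def resolve_ids_toy(ambient: Optional[SpanIds], plugin_stack: SpanIds) -> SpanIds:
    """Toy model of the Layer 2 over Layer 3 precedence (hypothetical)."""
    # Layer 3: start from the plugin's own span stack.
    resolved = plugin_stack
    # Layer 2: an ambient OTel span, when present, replaces both IDs.
    if ambient is not None:
        resolved = ambient
    return resolved


# Hypothetical IDs. Only plugin-stack spans are assumed to be written as BQ rows.
LOGGED_TO_BQ = {"plugin-agent", "plugin-tool"}
plugin_stack = SpanIds(span_id="plugin-tool", parent_span_id="plugin-agent")
ambient = SpanIds(span_id="otel-execute-tool", parent_span_id="otel-call-llm")

without_otel = resolve_ids_toy(None, plugin_stack)
with_otel = resolve_ids_toy(ambient, plugin_stack)

print(without_otel.parent_span_id in LOGGED_TO_BQ)  # True
print(with_otel.parent_span_id in LOGGED_TO_BQ)     # False
```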

In after_tool_callback, the plugin explicitly decides NOT to override when ambient OTel is present:

has_ambient = trace.get_current_span().get_span_context().is_valid
event_data = EventData(
    span_id_override=None if has_ambient else span_id,           # ← None when OTel active
    parent_span_id_override=None if has_ambient else parent_span_id,  # ← None when OTel active
)

This means Layer 1 (explicit overrides) is intentionally skipped, and Layer 2 (ambient OTel) takes full control of span_id/parent_span_id.

Consequences:

  1. Parent-child JOINs in BQ are broken. Any query like LEFT JOIN agent_events_view ON tool.parent_span_id = agent.span_id returns NULL for the agent columns. This makes it impossible to answer basic questions like "which agent was this tool running inside?" or "what was the agent's total latency for this tool call?" using only BigQuery data.

  2. Execution tree reconstruction is impossible. Building a tree of agent → sub-agent → tool → LLM from the BQ data requires a connected parent-child graph. Phantom spans create disconnected subtrees.

  3. BQ analytics and dashboards are degraded. Any dashboard that computes "tool latency as % of parent agent latency" or "agent error status for failed tool calls" gets NULL values for the vast majority of records.

  4. Inconsistency between BQ and Cloud Trace. The same trace looks correct in Cloud Trace (all spans present) but broken in BQ (dangling parent references). Users expect the BQ data to be self-consistent.
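
To make consequence 2 concrete, the sketch below groups the spans of the example trace (c7f3ed279e620d0e26df69f9165406e8) by the highest ancestor reachable through logged rows. The helper names are illustrative. A self-consistent table would give one group for the trace.

```python
# span_id -> parent_span_id, copied from the example trace in "Observed Behavior".
PARENT_OF = {
    "351611af4c2a9ca5": None,                # invocation
    "40fad0f4b5d8203c": "351611af4c2a9ca5",  # knowledge_supervisor agent
    "5179bd626c6d4fb9": "40fad0f4b5d8203c",  # supervisor LLM call
    "fb9761a38d9251ba": "84a19a6ed0f23b06",  # supervisor tool (phantom parent)
    "b6000853c649fd03": "84a19a6ed0f23b06",  # internal_docs_agent (phantom parent)
    "70f56dd9f82afc16": "b6000853c649fd03",  # sub-agent LLM call
    "3dab867f3003b630": "9b7b334a7c4d42ad",  # sub-agent tool (phantom parent)
}


def find_root(span_id, parent_of):
    """Walk up until the parent is missing (NULL) or not logged (phantom)."""
    while True:
        parent = parent_of[span_id]
        if parent is None or parent not in parent_of:
            return span_id
        span_id = parent


def group_by_root(parent_of):
    """Map each reachable root to the spans that sit under it."""
    groups = {}
    for span_id in parent_of:
        groups.setdefault(find_root(span_id, parent_of), []).append(span_id)
    return groups


groups = group_by_root(PARENT_OF)
print(len(groups))  # 4 disconnected subtrees instead of 1
```

Only three of the seven spans remain attached to the invocation root; the rest fall into three orphaned subtrees.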

Proposed Fix (in the BQ plugin):

The fix would be to always use the plugin's internal TraceManager stack for parent_span_id, even when ambient OTel is present. The ambient span should only influence trace_id (to maintain correlation with Cloud Trace) and optionally span_id, but parent_span_id should always come from the plugin's stack (Layer 3) since that's the only layer guaranteed to produce span_ids that are actually logged to BQ.

Conceptually:

# In _resolve_ids():
if ambient_ctx.is_valid:
    trace_id = format(ambient_ctx.trace_id, "032x")
    span_id = format(ambient_ctx.span_id, "016x")
    # DON'T override parent_span_id from ambient — keep plugin stack value
    # parent_span_id stays as plugin_parent_span_id from Layer 3

And in the after_*_callback methods, always pass the popped span's parent as an override:

# In after_tool_callback:
event_data = EventData(
    span_id_override=None if has_ambient else span_id,
    parent_span_id_override=parent_span_id,  # ← ALWAYS override parent, not just when no ambient
)

This would keep trace_id and span_id aligned with Cloud Trace (for cross-referencing) while ensuring parent_span_id always forms a valid, self-consistent graph within BQ.
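
A self-contained sketch of the intended outcome, under stated assumptions: `ResolvedIds` and `resolve_ids_proposed` are hypothetical names, the IDs are made up, and the real `_resolve_ids()` signature is not reproduced here. The point is only that trace_id and span_id follow the ambient span while parent_span_id stays with the plugin stack.

```python
from typing import NamedTuple, Optional


class ResolvedIds(NamedTuple):
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]


def resolve_ids_proposed(
    plugin: ResolvedIds,
    ambient_trace_id: Optional[str] = None,
    ambient_span_id: Optional[str] = None,
) -> ResolvedIds:
    """Hypothetical model of the proposed precedence, not the plugin's code."""
    if ambient_trace_id is None or ambient_span_id is None:
        # No ambient OTel span: the plugin stack supplies everything.
        return plugin
    # Ambient span present: take trace_id and span_id from it,
    # but keep parent_span_id from the plugin stack.
    return ResolvedIds(ambient_trace_id, ambient_span_id, plugin.parent_span_id)


plugin_ids = ResolvedIds("plugin-trace", "plugin-tool", "plugin-agent")
resolved = resolve_ids_proposed(plugin_ids, "otel-trace", "otel-execute-tool")
print(resolved)
```

The sketch keeps the ambient span_id, as in the conceptual snippet. Whether parent rows must then also be logged under plugin-stack IDs for the graph to close is left open here.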

Workaround (SQL-level):

Until a plugin fix is available, consumers can JOIN on trace_id + agent_name instead of parent_span_id:

-- Instead of:
LEFT JOIN agent_events_view A ON T.parent_span_id = A.span_id

-- Use:
LEFT JOIN agent_events_view A ON T.trace_id = A.trace_id AND T.agent_name = A.agent_name

This works because both tool_events_view and llm_events_view carry an agent_name column (from the raw agent field) that identifies which agent the event belongs to.
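
The difference between the two join keys can be checked in memory on the example trace. This is a sketch with hand-built rows; the dictionary keys mirror the view columns named above, and the helper names are illustrative.

```python
TRACE = "c7f3ed279e620d0e26df69f9165406e8"

# Agent rows from the example trace (span_id of each AGENT_STARTING event).
agent_events = [
    {"trace_id": TRACE, "agent_name": "knowledge_supervisor", "span_id": "40fad0f4b5d8203c"},
    {"trace_id": TRACE, "agent_name": "internal_docs_agent", "span_id": "b6000853c649fd03"},
]

# The search_internal_docs tool event, whose parent_span_id is a phantom.
tool_event = {
    "trace_id": TRACE,
    "agent_name": "internal_docs_agent",
    "tool_name": "search_internal_docs",
    "parent_span_id": "9b7b334a7c4d42ad",
}


def join_on_parent(tool, agents):
    """Equivalent of ON T.parent_span_id = A.span_id."""
    return [a for a in agents if a["span_id"] == tool["parent_span_id"]]


def join_on_trace_and_agent(tool, agents):
    """Equivalent of ON T.trace_id = A.trace_id AND T.agent_name = A.agent_name."""
    key = (tool["trace_id"], tool["agent_name"])
    return [a for a in agents if (a["trace_id"], a["agent_name"]) == key]


print(join_on_parent(tool_event, agent_events))           # []
print(join_on_trace_and_agent(tool_event, agent_events))  # the internal_docs_agent row
```

One caveat with this key: if the same agent runs more than once within a trace, the join matches every one of those agent rows, so results may need deduplication or a timestamp condition.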

Minimal Reproduction Code:

from google.adk.agents import Agent, LlmAgent
from google.adk.apps import App
from google.adk.models import Gemini
from google.adk.plugins.bigquery_agent_analytics_plugin import (
    BigQueryLoggerConfig,
    BigQueryAgentAnalyticsPlugin,
)

# A tool that the sub-agent will call
def search_docs(query: str) -> str:
    """Searches documents."""
    return "No results found."

# Sub-agent with a tool
sub_agent = LlmAgent(
    name="docs_agent",
    model="gemini-2.5-flash",
    description="Searches documents",
    instruction="Use search_docs to answer questions.",
    tools=[search_docs],
)

# Root supervisor agent
root = Agent(
    name="supervisor",
    model=Gemini(model="gemini-2.5-pro"),
    description="Routes queries",
    instruction="Route document questions to docs_agent.",
    sub_agents=[sub_agent],
)

bq_plugin = BigQueryAgentAnalyticsPlugin(
    project_id="your-project",
    dataset_id="your-dataset",
    table_id="your-table",
    config=BigQueryLoggerConfig(enabled=True, batch_size=1),
    location="us-central1",
)

app = App(root_agent=root, name="test_app", plugins=[bq_plugin])

# Run a query that triggers: supervisor → LLM → transfer → docs_agent → LLM → search_docs → LLM
# Then query BQ and check parent_span_id references

After running, execute the diagnostic SQL from "Steps to Reproduce" step 4. You will find TOOL_STARTING/COMPLETED events for search_docs with parent_span_id values that don't exist as any row's span_id in the table.

How often has this issue occurred?:

  • Always (100%) — every trace with OTel instrumentation active exhibits phantom parent_span_ids. The exact percentage varies by event type (92.8% for LLM_RESPONSE, 10.3% for TOOL events, 3.7% for AGENT events in our dataset).
