fix(python-client): harden tracing, uploads, and execution response handling by wllbo · Pull Request #501 · kubernetes-sigs/agent-sandbox

wllbo · 2026-04-01T23:21:28Z

Addresses issues surfaced during the Go SDK review (#424) that also apply to the Python SDK.

Replace initialize_tracer() with a create_tracer_provider() factory; the SDK no longer calls trace.set_tracer_provider().
Raise ValueError on upload paths with directory separators instead of silently stripping to basename.
Reject execution responses larger than 16 MB with a clear error.
Use generateName for claim creation instead of a manual uuid.uuid4() suffix.

netlify · 2026-04-01T23:21:35Z

✅ Deploy Preview for agent-sandbox canceled.

Name	Link
🔨 Latest commit	`eddf245`
🔍 Latest deploy log	https://app.netlify.com/projects/agent-sandbox/deploys/69d9740cd02fc400087368be

k8s-ci-robot · 2026-04-01T23:21:38Z

Hi @wllbo. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

aditya-shantanu · 2026-04-02T15:56:28Z

/assign @SHRUTI6991

SHRUTI6991 · 2026-04-02T16:17:30Z

 from k8s_agent_sandbox.trace_manager import trace_span, trace

+# Maximum response size for command execution (16 MB).
+MAX_EXECUTION_RESPONSE_SIZE = 16 * 1024 * 1024


Why are we enforcing this limit?

It guards against the client parsing and propagating unexpectedly large command output. 16 MB matches the Go SDK limit from #424. For larger outputs, callers should use file I/O instead.

SHRUTI6991 · 2026-04-02T16:19:44Z


    @trace_span("run")
    def run(self, command: str, timeout: int = 60) -> ExecutionResult:
+        """Executes a command. Rejects responses larger than 16 MB."""


Nit: You may want to parametrize this based on the MAX_EXECUTION_RESPONSE_SIZE value.

Updated; the docstring now references the constant by name.

SHRUTI6991 · 2026-04-02T16:21:26Z


 class SandboxTracerConfig(BaseModel):
    """Configuration for tracer level information"""
+    model_config = {"arbitrary_types_allowed": True}


Nit: add a comment about this field.

Added a comment explaining that it's needed for the tracer_provider field.

SHRUTI6991 · 2026-04-02T16:22:02Z

-
        try:
-            self._create_claim(claim_name, template, namespace)
+            claim_name = self._create_claim(template, namespace)


Nice change!
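For context, a minimal sketch of generateName-based creation, assuming the claim is created as a namespaced custom object and that the API server's response carries the final name under metadata.name. The helper names, the "sandbox-claim-" prefix, and the create_fn stand-in are hypothetical; the apiVersion and spec are left to the caller because the SDK's actual values are not shown here.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: (namespace, body) -> created object as a dict.
CreateFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def build_claim_body(
    api_version: str, spec: Dict[str, Any], prefix: str = "sandbox-claim-"
) -> Dict[str, Any]:
    """Builds a SandboxClaim manifest whose final name the API server picks."""
    return {
        "apiVersion": api_version,  # supplied by the caller; not assumed here
        "kind": "SandboxClaim",
        # generateName: the server appends a random suffix, so the client no
        # longer needs its own uuid.uuid4() suffix to avoid collisions.
        "metadata": {"generateName": prefix},
        "spec": spec,
    }


def create_claim(create_fn: CreateFn, namespace: str, body: Dict[str, Any]) -> str:
    """Creates the claim and returns the name the server generated."""
    created = create_fn(namespace, body)
    # The final name only exists in the server's response, not in the request.
    return created["metadata"]["name"]
```

With the official Kubernetes client, create_fn could wrap CustomObjectsApi().create_namespaced_custom_object(group, version, namespace, plural, body), which returns the created object as a dict. This is also why _create_claim now returns the name instead of receiving it.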

SHRUTI6991 · 2026-04-02T16:22:55Z

-    Initializes the global OpenTelemetry TracerProvider using the singleton pattern.
-
-    This function uses double-checked locking to ensure thread-safe, one-time initialization.
+def create_tracer_provider(service_name: str) -> "TracerProvider | None":


@igooch can you review this change as well?

SHRUTI6991 · 2026-04-02T16:28:26Z

+    provider.add_span_processor(
+        BatchSpanProcessor(OTLPSpanExporter())
+    )
+    atexit.register(provider.shutdown)


The tracer provider is shut down when the client goes out of scope, right?

Not on client GC; atexit fires at process exit. Python's __del__ is unreliable, so atexit ensures BatchSpanProcessor flushes the remaining spans before the process terminates.

If the caller passes their own provider via SandboxTracerConfig.tracer_provider, create_tracer_provider is never called, so we don't register anything. They own the lifecycle.
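A minimal sketch of the ownership rule described in this reply, assuming only that a provider exposes shutdown() (as OpenTelemetry's SDK TracerProvider does). resolve_tracer_provider and its register parameter are hypothetical names; register defaults to atexit.register and is injectable only so the behavior can be tested.

```python
import atexit
from typing import Callable, Optional, Protocol


class ShutdownCapable(Protocol):
    """Anything with a shutdown() method, e.g. an OpenTelemetry SDK TracerProvider."""

    def shutdown(self) -> object: ...


def resolve_tracer_provider(
    user_provider: Optional[ShutdownCapable],
    factory: Callable[[], ShutdownCapable],
    register: Callable[[Callable[[], object]], object] = atexit.register,
) -> ShutdownCapable:
    """Returns the provider to use; registers shutdown only for SDK-created ones."""
    if user_provider is not None:
        # Caller-supplied provider: the caller owns its lifecycle, so no hook.
        return user_provider
    provider = factory()
    # SDK-created provider: shut it down (flushing batched spans) at interpreter exit.
    register(provider.shutdown)
    return provider
```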

aditya-shantanu · 2026-04-02T22:56:13Z

/ok-to-test

k8s-ci-robot · 2026-04-10T21:36:10Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aditya-shantanu, wllbo
Once this PR has been reviewed and has the lgtm label, please ask for approval from shruti6991. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

clients/python/agentic-sandbox-client/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2026-04-14T21:46:20Z

PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

SHRUTI6991 · 2026-04-15T16:13:34Z

/lgtm

/assign @igooch

Copilot

Pull request overview

Hardens the Python SDK’s tracing setup, claim creation, upload path validation, and execution response handling to align with issues found during the Go SDK review.

Changes:

Replaces global initialize_tracer() with a caller-owned create_tracer_provider() and threads an optional provider into tracer creation.
Switches SandboxClaim creation to Kubernetes generateName and updates sync/async clients/helpers accordingly.
Adds stricter upload filename validation and rejects oversized execution responses.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.


File	Description
clients/python/agentic-sandbox-client/k8s_agent_sandbox/trace_manager.py	Introduces `create_tracer_provider()` and updates tracer manager creation to accept an optional provider.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/sandbox_client.py	Updates claim creation flow to use generated claim names returned from K8s.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py	Async equivalent of claim-name generation and tracing provider wiring.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/k8s_helper.py	Uses `generateName` for claim creation and returns the generated claim name.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_k8s_helper.py	Async equivalent of `generateName` claim creation and returning generated names.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/sandbox.py	Passes the configured tracer provider into tracer manager creation.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/models.py	Allows arbitrary tracer provider types in `SandboxTracerConfig`.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/files/filesystem.py	Rejects upload paths containing directories (sync filesystem).
clients/python/agentic-sandbox-client/k8s_agent_sandbox/commands/command_executor.py	Adds a max-size guard for execution responses (sync executor).
clients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_sandboxclient.py	Updates tests for generated claim names and updated `_create_claim` signature/return.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_async_sandboxclient.py	Updates async client tests for generated claim names.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_k8s_helper.py	Updates helper tests for new `create_sandbox_claim` signature.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_async_k8s_helper.py	Updates async helper tests for new `create_sandbox_claim` signature/return.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_sandbox.py	Updates expectations for `create_tracer_manager(..., None)` invocation.

Copilot · 2026-04-21T17:06:38Z

-
        try:
-            await self._create_claim(claim_name, template, namespace, labels=labels, lifecycle=lifecycle)
+            claim_name = await self._create_claim(template, namespace, labels=labels, lifecycle=lifecycle)


claim_name is assigned inside this try; if _create_claim(...) fails before returning a name, the except cleanup path later will reference an unassigned claim_name (leading to UnboundLocalError, and potentially skipping cleanup logic). Initialize claim_name before the try and guard the cleanup call so it only runs when a claim was actually created.
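A minimal sketch of the suggested fix, assuming the surrounding method creates a claim, waits for it, and deletes it on failure. The callables create_claim, wait_until_ready, and delete_claim are hypothetical stand-ins for the client's own helpers; the synchronous client takes the same shape without async/await.

```python
from typing import Awaitable, Callable, Optional


async def create_sandbox(
    create_claim: Callable[[str, str], Awaitable[str]],
    wait_until_ready: Callable[[str], Awaitable[None]],
    delete_claim: Callable[[str], Awaitable[None]],
    template: str,
    namespace: str,
) -> str:
    # Bind the name before the try so the except block can always read it.
    claim_name: Optional[str] = None
    try:
        claim_name = await create_claim(template, namespace)
        await wait_until_ready(claim_name)
        return claim_name
    except Exception:
        # Only clean up a claim that the API server actually created.
        if claim_name is not None:
            await delete_claim(claim_name)
        raise  # re-raise the original error instead of an UnboundLocalError
```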

Copilot · 2026-04-21T17:06:38Z

        return None, None

-    manager = TracerManager(service_name=config.trace_service_name)
+    manager = TracerManager(service_name=config.trace_service_name, provider=provider)


create_tracer_manager ignores config.tracer_provider unless callers thread it through explicitly. This already results in at least one call site (AsyncSandbox) not using the provider even when set on the config. Consider defaulting to config.tracer_provider when provider is None to avoid silent no-op tracing and to keep older call patterns working.

Suggested change

-    manager = TracerManager(service_name=config.trace_service_name, provider=provider)
+    effective_provider = provider
+    if effective_provider is None:
+        effective_provider = getattr(config, "tracer_provider", None)
+    manager = TracerManager(
+        service_name=config.trace_service_name,
+        provider=effective_provider,
+    )
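The fallback can be isolated into a small helper and unit-tested on its own. This is a sketch under assumptions: TracerConfig is a hypothetical stand-in for SandboxTracerConfig, and resolve_effective_provider is a hypothetical name, not the SDK's code.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TracerConfig:
    """Hypothetical stand-in for SandboxTracerConfig."""

    trace_service_name: str = "sandbox-client"
    tracer_provider: Optional[Any] = None


def resolve_effective_provider(config: Any, provider: Optional[Any] = None) -> Optional[Any]:
    """Prefers an explicit provider, then falls back to the one on the config."""
    if provider is not None:
        return provider
    # getattr keeps config objects that lack the field working.
    return getattr(config, "tracer_provider", None)
```

An explicit argument still wins, so call sites that already thread a provider through keep their behavior, while call sites such as AsyncSandbox pick up the configured provider without changes.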

Copilot · 2026-04-21T17:06:39Z

        filename = os.path.basename(path)
+        if filename != path:
+            raise ValueError(
+                f"path must be a plain filename without directories, got {path!r}"
+            )


This validation only rejects separators recognized by os.path.basename on the current OS. On POSIX, a path like "dir\\file" will pass even though it contains a directory separator for Windows-style paths. If the goal is to reject any directory separators, explicitly check for both '/' and '\\' (and consider updating AsyncFilesystem.write to apply the same rule for parity).
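A minimal sketch of the explicit check, assuming the rule is "no '/' or '\\' anywhere in the path". validate_upload_filename is a hypothetical helper name; sharing one helper between Filesystem.write and AsyncFilesystem.write would give the parity mentioned above.

```python
def validate_upload_filename(path: str) -> str:
    """Returns path unchanged if it is a plain filename, else raises ValueError."""
    # Check both separators explicitly so "dir\\file" is rejected on POSIX and
    # "dir/file" on Windows, independent of what os.path.basename recognizes.
    if "/" in path or "\\" in path:
        raise ValueError(
            f"path must be a plain filename without directories, got {path!r}"
        )
    return path
```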

Copilot · 2026-04-21T17:06:39Z

+        body = response.content
+        if len(body) > MAX_EXECUTION_RESPONSE_SIZE:
+            raise RuntimeError(
+                f"Execution response exceeds {MAX_EXECUTION_RESPONSE_SIZE} byte limit"
+            )


This size check happens after requests has already buffered the full response body (since send_request doesn’t use streaming), so it won’t protect the client from large memory/network usage—only from parsing/processing. Consider enforcing the limit via Content-Length pre-check and/or streaming reads with an explicit cap (and mirror the same limit in AsyncCommandExecutor, which currently has no equivalent guard).
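A minimal sketch of a capped read that combines both suggestions: a Content-Length pre-check and a running byte count over streamed chunks. read_capped is a hypothetical helper; it takes any iterable of bytes so it does not depend on how send_request is wired.

```python
from typing import Iterable, Optional

# Mirrors the 16 MB limit from the PR.
MAX_EXECUTION_RESPONSE_SIZE = 16 * 1024 * 1024


def read_capped(
    chunks: Iterable[bytes],
    content_length: Optional[str] = None,
    limit: int = MAX_EXECUTION_RESPONSE_SIZE,
) -> bytes:
    """Reads a response body, failing fast once it would exceed limit bytes."""
    # Cheap pre-check: reject before reading if the server declares a large body.
    if content_length is not None and content_length.isdigit() and int(content_length) > limit:
        raise RuntimeError(f"Execution response exceeds {limit} byte limit")
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        # Content-Length may be absent or wrong, so also cap the bytes actually read.
        if len(buf) > limit:
            raise RuntimeError(f"Execution response exceeds {limit} byte limit")
    return bytes(buf)
```

With requests, chunks could come from response.iter_content(chunk_size=65536) on a request made with stream=True, and content_length from response.headers.get("Content-Length"). Memory then stays bounded at roughly the limit plus one chunk, instead of whatever the server sends.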

Copilot · 2026-04-21T17:06:39Z

-
        try:
-            self._create_claim(claim_name, template, namespace, labels=labels, lifecycle=lifecycle)
+            claim_name = self._create_claim(template, namespace, labels=labels, lifecycle=lifecycle)


Because claim_name is now assigned inside the try, if _create_claim(...) raises, the except block later will reference an unassigned claim_name (raising UnboundLocalError and masking the original exception). Initialize claim_name before the try (e.g., None) and only attempt cleanup when it was successfully created.

k8s-ci-robot requested review from justinsb and vicentefb April 1, 2026 23:21

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 1, 2026

k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 1, 2026

aditya-shantanu approved these changes Apr 2, 2026


k8s-ci-robot assigned SHRUTI6991 Apr 2, 2026

SHRUTI6991 reviewed Apr 2, 2026


justinsb added area:rich-client labels Apr 2, 2026

fix(python-client): address issues surfaced during Go SDK review

63e1182

wllbo force-pushed the python-sdk-fixes branch from 6296f2e to f8f171b April 10, 2026 21:36

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2026

fix(python-client): improve docstring and config clarity

eddf245

wllbo force-pushed the python-sdk-fixes branch from f8f171b to eddf245 April 10, 2026 22:04

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2026

k8s-ci-robot assigned igooch Apr 15, 2026

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 15, 2026

janetkuo requested a review from Copilot April 21, 2026 17:00

Copilot started reviewing on behalf of janetkuo April 21, 2026 17:00

Copilot AI reviewed Apr 21, 2026


janetkuo added the action-required: resolve-copilot-comments label Apr 21, 2026



8 participants
