
Shrink oversized images client-side before submission #435

Merged
timmarkhuff merged 17 commits into main from tim/client-side-image-shrinking on May 18, 2026

@timmarkhuff (Contributor) commented on May 18, 2026

Summary

The Groundlight cloud service downscales and re-encodes oversized images on
ingest. This PR applies the same step in the client, before the bytes leave the machine.

Two benefits:

  • Edge Endpoint quality: edge ML models were trained on cloud-processed
    images. Submitting large images directly to an Edge Endpoint skips that
    step, causing a distribution shift that can hurt confidence.
  • Bandwidth: oversized images are shrunk before transmission to either
    the cloud or an Edge Endpoint.

The cloud service's own shrink path remains in place as a safety net for
direct API calls and clients written in other languages.
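For orientation, here is a minimal sketch of a shrink step like the one described above, assuming Pillow. Its three branches correspond to the three cases named under Tests (below threshold, resize, re-encode only). The name shrink_jpeg and the values of MAX_BYTES_IMAGE_SIZE, MAX_DIMENSION, and JPEG_QUALITY are placeholders invented for this sketch. The real defaults are owned by the cloud service and are not stated in this PR.

```python
from io import BytesIO

from PIL import Image

# Hypothetical limits for illustration only; the real defaults live in the
# cloud service and are not given in this PR.
MAX_BYTES_IMAGE_SIZE = 1_000_000  # bytes
MAX_DIMENSION = 2048  # max pixels on the longest side
JPEG_QUALITY = 85


def shrink_jpeg(jpeg: bytes) -> bytes:
    """Hypothetical shrink step: downscale and/or re-encode oversized JPEGs."""
    # Case 1 (fast path): small payloads are returned untouched, without decoding.
    if len(jpeg) <= MAX_BYTES_IMAGE_SIZE:
        return jpeg
    img = Image.open(BytesIO(jpeg)).convert("RGB")
    # Case 2 (resize): shrink in place so the longest side fits, keeping aspect ratio.
    if max(img.size) > MAX_DIMENSION:
        img.thumbnail((MAX_DIMENSION, MAX_DIMENSION))
    # Case 3 (re-encode only) skips the resize; both oversized cases re-encode here.
    out = BytesIO()
    img.save(out, format="JPEG", quality=JPEG_QUALITY)
    return out.getvalue()
```

This sketch resizes at most once and does not loop until the output is under the byte limit. Whether the real implementation does is not stated here.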

Trade-offs

  • Code duplication: the algorithm now exists in both this SDK and the
    cloud service. We accept this: a shared library would be premature for
    one small function. A comment in the source identifies the cloud service
    as the canonical owner of the defaults.
  • Per-detector overrides are invisible to the SDK: the cloud service allows
    per-detector limits looser than the defaults. For users with such
    overrides, the SDK will shrink images more aggressively than necessary.
    This is an acceptable limitation for v1; an opt-out can be added if a
    real need surfaces.
  • Small CPU cost on the client: negligible for images that are already
    small, since the fast path returns immediately after a byte-length check.
    For large images, the time saved in transmission outweighs it.

Tests

  • Algorithm unit tests: lock the shrink logic itself against accidental
    changes, covering all three cases (below threshold, resize, re-encode only).
  • Wiring unit test: mocks the urllib3 transport layer and inspects the
    raw request body, asserting the image is already shrunk before it goes on
    the wire. Guards against the shrink step being accidentally removed from
    the submission path (the cloud service's own fallback means the integration
    test below would still pass without this guard).
  • Integration test: submits a known oversized image to a real detector,
    fetches it back, and asserts the stored dimensions match what the SDK
    produces locally. Catches the cloud service diverging from the SDK's
    algorithm.
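The following is a minimal sketch of the wiring-test technique described above. It uses unittest.mock in place of the real urllib3 transport. The names submit_image, MAX_DIMENSION, the example.invalid URL, and the resize-only shrink_jpeg stand-in are all hypothetical. The SDK's actual test patches its own transport layer, which this PR does not show. The test decodes the body handed to the transport and checks its dimensions. If the shrink call is deleted from submit_image, the test fails, even though a server-side fallback would still accept the upload.

```python
from io import BytesIO
from unittest import mock

from PIL import Image

MAX_DIMENSION = 2048  # hypothetical limit for this sketch


def shrink_jpeg(jpeg: bytes) -> bytes:
    # Resize-only stand-in for the SDK's shrink step, kept short on purpose.
    img = Image.open(BytesIO(jpeg)).convert("RGB")
    img.thumbnail((MAX_DIMENSION, MAX_DIMENSION))
    out = BytesIO()
    img.save(out, format="JPEG")
    return out.getvalue()


def submit_image(jpeg: bytes, http) -> None:
    """Hypothetical submission path; `http` mimics a urllib3 PoolManager."""
    http.request("POST", "https://example.invalid/image-queries", body=shrink_jpeg(jpeg))


def test_body_is_shrunk_before_it_goes_on_the_wire() -> None:
    buf = BytesIO()
    Image.new("RGB", (4000, 3000), "gray").save(buf, format="JPEG")
    transport = mock.Mock()  # records calls instead of sending anything
    submit_image(buf.getvalue(), transport)
    body = transport.request.call_args.kwargs["body"]
    # Decode exactly what would have gone on the wire and check its size.
    assert max(Image.open(BytesIO(body)).size) <= MAX_DIMENSION
```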

@timmarkhuff requested a review from brandon-wada on May 18, 2026 at 05:12
Review comment on src/groundlight/images.py:

    """
    if len(jpeg) <= MAX_BYTES_IMAGE_SIZE:
        return jpeg
    img = Image.open(BytesIO(jpeg)).convert("RGB")

Collaborator

We're going to break our optional pillow dependency with this.

Collaborator

Also, I'd need to check the rest of the code, but shouldn't we be able to scale the image before it becomes a bytestream?

Contributor Author

I don't see that PIL has ever been optional. If it does become optional, we could simply turn off client-side image preprocessing when PIL is not installed.
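If Pillow does become optional, the fallback proposed here could look like the sketch below. PIL_AVAILABLE and preprocess_jpeg are hypothetical names. The shrink argument stands in for a Pillow-based shrink function. The sketch detects the PIL package with importlib.util.find_spec, without importing it. When the package is missing, the bytes are sent unchanged, which is the same behavior the SDK had before this PR.

```python
import importlib.util
from typing import Callable

# Hypothetical flag: True only if the PIL package (Pillow) can be imported.
PIL_AVAILABLE = importlib.util.find_spec("PIL") is not None


def preprocess_jpeg(jpeg: bytes, shrink: Callable[[bytes], bytes]) -> bytes:
    """Apply client-side shrinking only when Pillow is installed."""
    if not PIL_AVAILABLE:
        # No Pillow: skip preprocessing and send the original bytes.
        return jpeg
    return shrink(jpeg)
```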

Contributor Author

As for scaling the image before it becomes a bytestream: that seems possible, but only for inputs that are already PIL objects. Claude points out that for str/bytes inputs (e.g. a file read off disk), there is no PIL object in flight, so we'd still need the post-conversion step as a fallback anyway. Two code paths for marginal gain didn't seem worth it.
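The single-path design argued for here can be sketched as follows, assuming Pillow. Every input type is first normalized to JPEG bytes, and one shrink step then runs on the result. to_jpeg_bytes and prepare_for_upload are hypothetical names, not the SDK's API. For brevity, the sketch assumes that str inputs are paths to files that are already JPEG-encoded and that bytes inputs are already JPEG.

```python
from io import BytesIO
from pathlib import Path
from typing import Callable, Union

from PIL import Image

ImageInput = Union[str, bytes, Image.Image]


def to_jpeg_bytes(image: ImageInput) -> bytes:
    """Hypothetical normalizer: turn each supported input into JPEG bytes."""
    if isinstance(image, str):
        # File path: read the bytes off disk; no PIL object is ever created.
        return Path(image).read_bytes()
    if isinstance(image, bytes):
        return image
    if isinstance(image, Image.Image):
        buf = BytesIO()
        image.convert("RGB").save(buf, format="JPEG")
        return buf.getvalue()
    raise TypeError(f"unsupported image type: {type(image)!r}")


def prepare_for_upload(image: ImageInput, shrink: Callable[[bytes], bytes]) -> bytes:
    # One post-conversion shrink covers str, bytes, and PIL inputs alike,
    # so no separate pre-conversion path for PIL images is needed.
    return shrink(to_jpeg_bytes(image))
```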

@timmarkhuff merged commit 995e6da into main on May 18, 2026
8 checks passed