Shrink oversized images client-side before submission #435
…ght/python-sdk into tim/client-side-image-shrinking
```python
    """
    if len(jpeg) <= MAX_BYTES_IMAGE_SIZE:
        return jpeg
    img = Image.open(BytesIO(jpeg)).convert("RGB")
```
We're going to break our optional Pillow dependency with this.
Also, I'd need to check the rest of the code, but I think we should be able to scale the image before it becomes a byte stream?
I don't see that PIL has ever been optional. If it does become optional, we could simply turn off client-side image preprocessing when PIL is not installed.
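A minimal sketch of that fallback, assuming Pillow is imported in a single place and that skipping the client-side step is acceptable because the cloud service still shrinks on ingest. The name `preprocess_image`, the threshold value, and the 1920-pixel limit are hypothetical, not the SDK's actual names or defaults.

```python
from io import BytesIO

try:
    from PIL import Image
except ImportError:  # Pillow missing: client-side shrinking is switched off
    Image = None

MAX_BYTES_IMAGE_SIZE = 500_000  # hypothetical threshold for this sketch


def preprocess_image(jpeg: bytes) -> bytes:
    """Shrink an oversized JPEG only when Pillow is importable (sketch)."""
    if len(jpeg) <= MAX_BYTES_IMAGE_SIZE:
        return jpeg  # fast path needs no Pillow at all
    if Image is None:
        return jpeg  # send as-is; the cloud service still shrinks on ingest
    img = Image.open(BytesIO(jpeg)).convert("RGB")
    img.thumbnail((1920, 1920))  # hypothetical maximum side length
    out = BytesIO()
    img.save(out, format="JPEG")
    return out.getvalue()
```

Because the byte-length check comes first, small images never touch Pillow, so only the oversized path is affected by the dependency being absent.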
As for scaling the image before it becomes a bytestream, it seems that would be possible, but only for inputs that are already PIL objects. Claude points out that for string/bytes inputs (e.g. reading a file off disk), there's no PIL object in flight, so we'd still need the post-conversion step as a fallback anyway. Two code paths for marginal gain didn't seem worth it.
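To illustrate the single-path design: every input type is first normalized to bytes, and one post-conversion shrink step then runs on the result. `to_jpeg_bytes` and `prepare_for_upload` are hypothetical helpers, and treating a `str` as a file path is an assumption about how string inputs are interpreted.

```python
from io import BytesIO
from pathlib import Path
from typing import Any, Callable


def to_jpeg_bytes(image: Any) -> bytes:
    """Normalize supported inputs into one byte stream (hypothetical helper)."""
    if isinstance(image, bytes):
        return image  # already encoded
    if isinstance(image, (str, Path)):
        return Path(image).read_bytes()  # assumption: strings are file paths
    # Otherwise assume a PIL.Image.Image; duck-typed so Pillow is not imported here.
    buf = BytesIO()
    image.convert("RGB").save(buf, format="JPEG")
    return buf.getvalue()


def prepare_for_upload(image: Any, shrink: Callable[[bytes], bytes]) -> bytes:
    # One code path: convert first, then run the same post-conversion shrink
    # step, whether the caller passed bytes, a path, or a PIL image.
    return shrink(to_jpeg_bytes(image))
```

A pre-conversion shortcut for PIL inputs would add a second branch inside `to_jpeg_bytes`, while the bytes and path branches would still need `shrink` afterwards.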
Summary
The Groundlight cloud service downscales and re-encodes oversized images on
ingest. This PR copies that same step to the client, running it before bytes leave the machine.
Two benefits:
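1. Consistency. Models are trained on images the cloud service has already downscaled. Submitting large images directly to an Edge Endpoint skips that step, causing a distribution shift that can hurt confidence.
2. Smaller uploads. Less data goes over the network, whether the image is sent to the cloud or edge.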
The cloud service's own shrink path remains in place as a safety net for
direct API calls and other-language clients.
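A sketch of the three cases the shrink step has to handle (below threshold, resize, re-encode only), written with Pillow. The constant values and the name `shrink_image` are assumptions for illustration; only `MAX_BYTES_IMAGE_SIZE` appears in the diff excerpt, and the real defaults belong to the cloud service.

```python
from io import BytesIO

from PIL import Image

# Hypothetical defaults for illustration only; the cloud service owns the real values.
MAX_BYTES_IMAGE_SIZE = 500_000  # payloads at or below this many bytes are sent as-is
MAX_DIMENSION = 1920            # longest side, in pixels, after downscaling
JPEG_QUALITY = 85               # quality used when re-encoding


def shrink_image(jpeg: bytes) -> bytes:
    """Downscale and/or re-encode an oversized JPEG before upload (sketch)."""
    # Case 1 (below threshold): cheap byte-length check, no decoding at all.
    if len(jpeg) <= MAX_BYTES_IMAGE_SIZE:
        return jpeg
    img = Image.open(BytesIO(jpeg)).convert("RGB")
    # Case 2 (resize): thumbnail() shrinks in place, keeps the aspect ratio,
    # and never enlarges, so images already within MAX_DIMENSION keep their size.
    img.thumbnail((MAX_DIMENSION, MAX_DIMENSION))
    # Every oversized image is re-encoded; when thumbnail() was a no-op this is
    # case 3 (re-encode only).
    out = BytesIO()
    img.save(out, format="JPEG", quality=JPEG_QUALITY)
    return out.getvalue()
```

Using `thumbnail()` rather than `resize()` leaves the aspect-ratio arithmetic to Pillow. A production version might also step the quality down until the output fits the byte budget; this sketch does not.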
Trade-offs
- Duplicated defaults. The shrink parameters now live in both the SDK and the cloud service. We accept this; the alternative (a shared library) is premature for one small function. A note in the source points at the cloud service as the canonical owner of the defaults.
- Per-detector overrides. The SDK only knows the default limits, but some detectors may be configured with per-detector limits looser than the defaults. Users with such overrides will have the SDK shrink more aggressively than necessary. This is an acceptable limitation for v1; an opt-out can be added if a real need surfaces.
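- Client-side CPU cost. Shrinking adds work on the client. For images below the threshold the overhead is small (the fast path exits immediately on the byte-length check). For large images, the savings in transmission time outweigh it.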
Tests
- Unit tests for the shrink function, updated for these changes, covering all three cases (below threshold, resize, re-encode only).
- A submission-path test that captures the raw request body, asserting the image is already shrunk before it goes on the wire. This guards against the shrink step being accidentally removed from the submission path (because of the cloud service's own fallback, the integration test below would still pass without this guard).
- An integration test that submits an oversized image to the cloud service, fetches it back, and asserts the stored dimensions match what the SDK produces locally. This catches the cloud service diverging from the SDK's algorithm.
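A minimal sketch of the wire-level guard, assuming the transport can be swapped for a recording fake. `submit_image`, the `send` parameter, and the truncating `shrink_image` stand-in are hypothetical; a real test would more likely intercept the SDK's HTTP client.

```python
from typing import Callable, List

MAX_BYTES_IMAGE_SIZE = 1_000  # hypothetical, tiny threshold so the sketch stays fast


def shrink_image(data: bytes) -> bytes:
    # Placeholder for the real Pillow-based shrink step; it only has to return
    # fewer bytes than it received for the test to be meaningful.
    return data if len(data) <= MAX_BYTES_IMAGE_SIZE else data[:MAX_BYTES_IMAGE_SIZE]


def submit_image(image: bytes, send: Callable[[bytes], None]) -> None:
    # Hypothetical submission path: the shrink step must run before `send`.
    send(shrink_image(image))


def test_image_is_shrunk_before_it_goes_on_the_wire() -> None:
    captured: List[bytes] = []  # raw request bodies seen by the fake transport
    submit_image(b"\xff" * 10_000, send=captured.append)
    assert len(captured) == 1
    assert len(captured[0]) <= MAX_BYTES_IMAGE_SIZE
```

If the `shrink_image` call is dropped from `submit_image`, the recorded body is the full 10,000 bytes and the assertion fails, even though a cloud-side fallback would hide the regression from an end-to-end test.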