fix: retry transient GCS delete_many failures#90
Conversation
📝 WalkthroughWalkthrough
ChangesDelete Many Retry Support
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
multi-storage-client/src/multistorageclient/providers/gcs.py (1)
527-536: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winDon't let a transient response hide a permanent failure in the same batch.
This loop raises on the first failing response. If a batch comes back as
503then403, Line 531 turns the whole operation into a retryable failure and the permanent403is never surfaced. That can waste retries and return the wrong final error. Collect all non-2xx/404 responses first, raiseRuntimeErrorif any non-retryable status is present, and only raiseRetryableErrorwhen every failure in the batch is transient.Suggested fix
- for response in batch._responses: - status_code = response.status_code - if 200 <= status_code < 300 or status_code == 404: - continue - if status_code in {408, 429} or 500 <= status_code < 600: - raise RetryableError( - f"GCS batch delete failed with status_code: {status_code}, response: {response.text}" - ) - raise RuntimeError( - f"GCS batch delete failed with status_code: {status_code}, response: {response.text}" - ) + retryable_failures: list[str] = [] + non_retryable_failures: list[str] = [] + for response in batch._responses: + status_code = response.status_code + if 200 <= status_code < 300 or status_code == 404: + continue + + message = ( + f"GCS batch delete failed with status_code: {status_code}, " + f"response: {response.text}" + ) + if status_code in {408, 429} or 500 <= status_code < 600: + retryable_failures.append(message) + else: + non_retryable_failures.append(message) + + if non_retryable_failures: + raise RuntimeError("; ".join(non_retryable_failures)) + if retryable_failures: + raise RetryableError("; ".join(retryable_failures))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multi-storage-client/src/multistorageclient/providers/gcs.py` around lines 527 - 536, The batch delete handling in the GCS provider currently exits on the first failing response inside the batch loop, which can hide a permanent failure behind a transient one. Update the response processing in the batch delete logic to collect all non-2xx/non-404 statuses from batch._responses first, then have the GCS batch delete path raise RuntimeError if any non-retryable status is present, and only raise RetryableError when every failure is transient. Keep the change localized to the batch response handling in the GCS provider’s delete flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@multi-storage-client/src/multistorageclient/providers/gcs.py`:
- Around line 527-536: The batch delete handling in the GCS provider currently
exits on the first failing response inside the batch loop, which can hide a
permanent failure behind a transient one. Update the response processing in the
batch delete logic to collect all non-2xx/non-404 statuses from batch._responses
first, then have the GCS batch delete path raise RuntimeError if any
non-retryable status is present, and only raise RetryableError when every
failure is transient. Keep the change localized to the batch response handling
in the GCS provider’s delete flow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0d1d9bfd-3c1a-41c0-b2f8-1b48b4c088f7
📒 Files selected for processing (3)
multi-storage-client/src/multistorageclient/client/single.pymulti-storage-client/src/multistorageclient/providers/gcs.pymulti-storage-client/tests/test_multistorageclient/unit/test_retry.py
|
/ok to test 6b50798 |
Description
delete_manyfailures during batch deletes.408,429, and5xxresponse statuses.delete_manyretry behavior.Changes
@retrytoSingleStorageClient.delete_many.RetryableErrorfor transient response statuses.delete_manyretry and GCS batch delete503handling.Closes NGCDP-8087
Checklist
.release_notes/.unreleased.mdmulti-storage-client/pyproject.toml.release_notes/.unreleased.md.release_notes/{bumped package version}.mdfile.Summary by CodeRabbit
Bug Fixes
Tests