fix: align datacenter enum with valid real-world targets#336
Conversation
|
Promptless prepared a documentation update related to this change. Triggered by runpod/flash PR #336 Updated datacenter reference tables and code examples to reflect the removal of four datacenters that lack storage + S3 support ( |
There was a problem hiding this comment.
Pull request overview
This PR narrows the DataCenter enum to supported storage/S3 locations, moves it into a dedicated resource module, and updates related imports/tests.
Changes:
- Adds
core/resources/datacenter.pyforDataCenterandCPU_DATACENTERS. - Updates resource imports to use the new datacenter module.
- Refreshes tests to use supported datacenter values and validate new endpoint defaults.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/runpod_flash/core/resources/datacenter.py |
Defines the moved/pruned datacenter enum and CPU datacenter set. |
src/runpod_flash/core/resources/network_volume.py |
Imports DataCenter from the new module. |
src/runpod_flash/core/resources/serverless.py |
Imports datacenter symbols from the new module. |
src/runpod_flash/core/resources/__init__.py |
Re-exports datacenter symbols from the new module. |
src/runpod_flash/endpoint.py |
Adds default datacenter selection for non-CPU endpoint configs. |
tests/unit/test_endpoint.py |
Updates datacenter imports/values and endpoint default tests. |
tests/unit/test_p2_gaps.py |
Updates datacenter import path. |
tests/unit/runtime/test_resource_provisioner.py |
Updates expected supported datacenter values. |
tests/unit/resources/test_serverless.py |
Updates serverless datacenter tests to supported values. |
tests/unit/resources/test_network_volume.py |
Updates datacenter import path. |
tests/unit/cli/commands/build_utils/test_manifest.py |
Updates manifest fixture imports and datacenter values. |
Comments suppressed due to low confidence (1)
src/runpod_flash/endpoint.py:420
- Using a truthiness check here treats explicit falsy values (for example
datacenter=""ordatacenter=[]) the same as an omitted datacenter and silently replaces them with all datacenters. Those inputs would otherwise be rejected or preserved by downstream validation, so this can mask bad configuration instead of surfacing an error; check specifically forNonewhen applying the default.
if not self._is_cpu and not self.is_client and not self.datacenter:
self.datacenter = DataCenter.all()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if not self._is_cpu and not self.is_client and not self.datacenter: | ||
| self.datacenter = DataCenter.all() |
| class DataCenter(str, Enum): | ||
| """Enum representing available RunPod data centers. | ||
|
|
||
| NOTE: these are only datacenters with storage support, and s3 API support. |
| field_serializer, | ||
| model_validator, | ||
| ) | ||
| from .datacenter import DataCenter |
deanq
left a comment
There was a problem hiding this comment.
Coverage gap — suggest new test file
DataCenter.from_string and DataCenter.all() aren't directly tested. Suggest adding tests/unit/resources/test_datacenter.py:
from_stringhappy path (canonical, lowercase, underscore-separated)from_stringerror path (verify message includes valid list)all()returns expected 11 membersCPU_DATACENTERS == {DataCenter.EU_RO_1}(catches silent expansion)
Stale fixtures — tests/unit/resources/test_worker_availability_diagnostic.py
Still has literal "US-GA-2" string fixtures at lines 35, 51, 59, 76, 84, 100. They pass because they're strings, not enum refs, but they advertise a DC that no longer exists. Worth migrating to "US-CA-2" for consistency.
serverless.py:580 — silent fallback for unknown DC
With the enum shrunk, the except ValueError branch at src/runpod_flash/core/resources/serverless.py:582-583 now also triggers for legitimate platform DCs that the SDK simply doesn't know about yet — indistinguishable from a typo'd env var. Suggest:
except ValueError:
log.warning(
"RUNPOD_DEFAULT_DATACENTER=%r is not in the known DataCenter enum; "
"passing through as raw string. SDK may be stale.",
env_datacenter,
)
self.locations = env_datacenter| if not self._is_cpu and not self.is_client and not self.datacenter: | ||
| self.datacenter = DataCenter.all() |
There was a problem hiding this comment.
The trigger condition doesn't match the bug being fixed. The "no primary storage mount" error only occurs when a NetworkVolume is attached to a provisioning endpoint — but this currently fires for every GPU non-client endpoint regardless of volume, and (per Copilot's adjacent comment) skips image-mode endpoints that do provision. Two unintended consequences of the over-broad case:
- Endpoints that previously scheduled freely (no
locationsfilter) now get pinned to the 11 DCs in this enum. Any new DC the platform adds is silently excluded until SDK update. model_post_initwill synclocationson every endpoint that omitteddatacenter, changing the config hash and forcing redeploys on SDK upgrade.
Suggest gating on self.volume is not None (and addressing Copilot's image-mode point by checking id is None rather than is_client) so the default tracks the actual error condition.
|
|
||
| # if not in pure client mode, make sure default datacenters are set | ||
| # not CPU though, that gets pinned to specific datacenters | ||
| if not self._is_cpu and not self.is_client and not self.datacenter: |
There was a problem hiding this comment.
not self.datacenter treats None and [] identically, but they're semantically distinct: [] is a user explicitly opting out of DC restriction. Suggest if self.datacenter is None: so explicit empty list is preserved.
| class DataCenter(str, Enum): | ||
| """Enum representing available RunPod data centers. | ||
|
|
||
| NOTE: these are only datacenters with storage support, and s3 API support. | ||
| """ | ||
|
|
||
| # north america | ||
| US_CA_2 = "US-CA-2" | ||
| US_IL_1 = "US-IL-1" | ||
| US_KS_2 = "US-KS-2" | ||
| US_MO_1 = "US-MO-1" | ||
| US_MO_2 = "US-MO-2" | ||
| US_NC_2 = "US-NC-2" | ||
| US_NE_1 = "US-NE-1" | ||
| US_WA_1 = "US-WA-1" | ||
|
|
||
| # europe | ||
| EU_CZ_1 = "EU-CZ-1" | ||
| EU_RO_1 = "EU-RO-1" | ||
| EUR_NO_1 = "EUR-NO-1" |
There was a problem hiding this comment.
Removing US_GA_2, US_MD_1, US_NC_1, EUR_IS_1 from a public __all__ symbol is a breaking change — downstream DataCenter.US_GA_2 will AttributeError at import. Consider keeping them as deprecated enum members (filtered from all() / CPU_DATACENTERS) with DeprecationWarning on access, removed next major.
Either way, please grep flash-examples, flash-worker, and runpod-python for references before merge (Copilot already caught one re-export gap in network_volume.py).
| except ValueError: | ||
| valid = ", ".join(dc.value for dc in cls) | ||
| raise ValueError( | ||
| f"Unknown datacenter '{value}'. Valid datacenters: {valid}" | ||
| ) |
There was a problem hiding this comment.
Per project conventions, chain exceptions: raise ValueError(...) from e to preserve the original cause, or from None if intentional suppression. Currently neither, so the implicit __context__ leaks an unrelated frame.
| @classmethod | ||
| def all(cls) -> list["DataCenter"]: | ||
| """Return all datacenters.""" | ||
| return list(cls) |
There was a problem hiding this comment.
list(cls) allocates a fresh mutable list per call, and downstream code stores it on Endpoint instances where a caller could mutate it. Config-hash determinism also depends on enum iteration order — fine today, fragile. Consider:
_ALL: tuple["DataCenter", ...] = tuple(DataCenter) # at module bottom
@classmethod
def all(cls) -> tuple["DataCenter", ...]:
return cls._ALL|
|
||
| def test_datacenter_none_default(self): | ||
| ep = Endpoint(name="test") | ||
| assert ep.datacenter == DataCenter.all() |
There was a problem hiding this comment.
assert ep.datacenter == DataCenter.all() couples this test to whatever all() returns. If someone accidentally removes another DC, this still passes silently. Suggest asserting against a literal expected list so enum drift trips at least one test.
removes data centers from the
DataCenterenum that dont have both storage and s3 support. This should prevent the uncaught "no primary storage mount" error during worker initialization.Also moves the
DataCenterenum out ofnetwork_volume.pyintodatacenter.py.