Skip to content

fix: align datacenter enum with valid real-world targets#336

Open
KAJdev wants to merge 2 commits into
mainfrom
zeke/ae-3084-prevent-no-primary-mount-point-error-when-workers-are
Open

fix: align datacenter enum with valid real-world targets#336
KAJdev wants to merge 2 commits into
mainfrom
zeke/ae-3084-prevent-no-primary-mount-point-error-when-workers-are

Conversation

@KAJdev
Copy link
Copy Markdown
Contributor

@KAJdev KAJdev commented May 12, 2026

removes data centers from the DataCenter enum that dont have both storage and s3 support. This should prevent the uncaught "no primary storage mount" error during worker initialization.

Also moves the DataCenter enum out of network_volume.py into datacenter.py.

@KAJdev KAJdev requested a review from deanq May 12, 2026 18:17
@promptless
Copy link
Copy Markdown

promptless Bot commented May 12, 2026

Promptless prepared a documentation update related to this change.

Triggered by runpod/flash PR #336

Updated datacenter reference tables and code examples to reflect the removal of four datacenters that lack storage + S3 support (US-GA-2, US-MD-1, US-NC-1, EUR-IS-1).

Review: Update Flash datacenter documentation

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR narrows the DataCenter enum to supported storage/S3 locations, moves it into a dedicated resource module, and updates related imports/tests.

Changes:

  • Adds core/resources/datacenter.py for DataCenter and CPU_DATACENTERS.
  • Updates resource imports to use the new datacenter module.
  • Refreshes tests to use supported datacenter values and validate new endpoint defaults.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/runpod_flash/core/resources/datacenter.py Defines the moved/pruned datacenter enum and CPU datacenter set.
src/runpod_flash/core/resources/network_volume.py Imports DataCenter from the new module.
src/runpod_flash/core/resources/serverless.py Imports datacenter symbols from the new module.
src/runpod_flash/core/resources/__init__.py Re-exports datacenter symbols from the new module.
src/runpod_flash/endpoint.py Adds default datacenter selection for non-CPU endpoint configs.
tests/unit/test_endpoint.py Updates datacenter imports/values and endpoint default tests.
tests/unit/test_p2_gaps.py Updates datacenter import path.
tests/unit/runtime/test_resource_provisioner.py Updates expected supported datacenter values.
tests/unit/resources/test_serverless.py Updates serverless datacenter tests to supported values.
tests/unit/resources/test_network_volume.py Updates datacenter import path.
tests/unit/cli/commands/build_utils/test_manifest.py Updates manifest fixture imports and datacenter values.
Comments suppressed due to low confidence (1)

src/runpod_flash/endpoint.py:420

  • Using a truthiness check here treats explicit falsy values (for example datacenter="" or datacenter=[]) the same as an omitted datacenter and silently replaces them with all datacenters. Those inputs would otherwise be rejected or preserved by downstream validation, so this can mask bad configuration instead of surfacing an error; check specifically for None when applying the default.
        if not self._is_cpu and not self.is_client and not self.datacenter:
            self.datacenter = DataCenter.all()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +419 to +420
if not self._is_cpu and not self.is_client and not self.datacenter:
self.datacenter = DataCenter.all()
class DataCenter(str, Enum):
"""Enum representing available RunPod data centers.

NOTE: these are only datacenters with storage support, and s3 API support.
field_serializer,
model_validator,
)
from .datacenter import DataCenter
Copy link
Copy Markdown
Member

@deanq deanq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage gap — suggest new test file

DataCenter.from_string and DataCenter.all() aren't directly tested. Suggest adding tests/unit/resources/test_datacenter.py:

  • from_string happy path (canonical, lowercase, underscore-separated)
  • from_string error path (verify message includes valid list)
  • all() returns expected 11 members
  • CPU_DATACENTERS == {DataCenter.EU_RO_1} (catches silent expansion)

Stale fixtures — tests/unit/resources/test_worker_availability_diagnostic.py

Still has literal "US-GA-2" string fixtures at lines 35, 51, 59, 76, 84, 100. They pass because they're strings, not enum refs, but they advertise a DC that no longer exists. Worth migrating to "US-CA-2" for consistency.

serverless.py:580 — silent fallback for unknown DC

With the enum shrunk, the except ValueError branch at src/runpod_flash/core/resources/serverless.py:582-583 now also triggers for legitimate platform DCs that the SDK simply doesn't know about yet — indistinguishable from a typo'd env var. Suggest:

except ValueError:
    log.warning(
        "RUNPOD_DEFAULT_DATACENTER=%r is not in the known DataCenter enum; "
        "passing through as raw string. SDK may be stale.",
        env_datacenter,
    )
    self.locations = env_datacenter

Comment on lines +419 to +420
if not self._is_cpu and not self.is_client and not self.datacenter:
self.datacenter = DataCenter.all()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trigger condition doesn't match the bug being fixed. The "no primary storage mount" error only occurs when a NetworkVolume is attached to a provisioning endpoint — but this currently fires for every GPU non-client endpoint regardless of volume, and (per Copilot's adjacent comment) skips image-mode endpoints that do provision. Two unintended consequences of the over-broad case:

  1. Endpoints that previously scheduled freely (no locations filter) now get pinned to the 11 DCs in this enum. Any new DC the platform adds is silently excluded until SDK update.
  2. model_post_init will sync locations on every endpoint that omitted datacenter, changing the config hash and forcing redeploys on SDK upgrade.

Suggest gating on self.volume is not None (and addressing Copilot's image-mode point by checking id is None rather than is_client) so the default tracks the actual error condition.


# if not in pure client mode, make sure default datacenters are set
# not CPU though, that gets pinned to specific datacenters
if not self._is_cpu and not self.is_client and not self.datacenter:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not self.datacenter treats None and [] identically, but they're semantically distinct: [] is a user explicitly opting out of DC restriction. Suggest if self.datacenter is None: so explicit empty list is preserved.

Comment on lines +4 to +23
class DataCenter(str, Enum):
"""Enum representing available RunPod data centers.

NOTE: these are only datacenters with storage support, and s3 API support.
"""

# north america
US_CA_2 = "US-CA-2"
US_IL_1 = "US-IL-1"
US_KS_2 = "US-KS-2"
US_MO_1 = "US-MO-1"
US_MO_2 = "US-MO-2"
US_NC_2 = "US-NC-2"
US_NE_1 = "US-NE-1"
US_WA_1 = "US-WA-1"

# europe
EU_CZ_1 = "EU-CZ-1"
EU_RO_1 = "EU-RO-1"
EUR_NO_1 = "EUR-NO-1"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing US_GA_2, US_MD_1, US_NC_1, EUR_IS_1 from a public __all__ symbol is a breaking change — downstream DataCenter.US_GA_2 will AttributeError at import. Consider keeping them as deprecated enum members (filtered from all() / CPU_DATACENTERS) with DeprecationWarning on access, removed next major.

Either way, please grep flash-examples, flash-worker, and runpod-python for references before merge (Copilot already caught one re-export gap in network_volume.py).

Comment on lines +35 to +39
except ValueError:
valid = ", ".join(dc.value for dc in cls)
raise ValueError(
f"Unknown datacenter '{value}'. Valid datacenters: {valid}"
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per project conventions, chain exceptions: raise ValueError(...) from e to preserve the original cause, or from None if intentional suppression. Currently neither, so the implicit __context__ leaks an unrelated frame.

Comment on lines +41 to +44
@classmethod
def all(cls) -> list["DataCenter"]:
"""Return all datacenters."""
return list(cls)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list(cls) allocates a fresh mutable list per call, and downstream code stores it on Endpoint instances where a caller could mutate it. Config-hash determinism also depends on enum iteration order — fine today, fragile. Consider:

_ALL: tuple["DataCenter", ...] = tuple(DataCenter)  # at module bottom

@classmethod
def all(cls) -> tuple["DataCenter", ...]:
    return cls._ALL


def test_datacenter_none_default(self):
ep = Endpoint(name="test")
assert ep.datacenter == DataCenter.all()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert ep.datacenter == DataCenter.all() couples this test to whatever all() returns. If someone accidentally removes another DC, this still passes silently. Suggest asserting against a literal expected list so enum drift trips at least one test.

@deanq deanq requested review from jhcipar and runpod-Henrik May 21, 2026 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants