detectors/azure/azurevm: clarify and tighten IMDS non-200 status handling #8790

@Aminkbi

The Azure VM detector currently treats any HTTP 4xx response from IMDS as "not running in Azure" and returns resource.Empty(), nil.

I think this behavior is too broad, and it may hide real detector failures.

Code reference:
detectors/azure/azurevm/vm.go

Current logic:

if resp.StatusCode == http.StatusOK {
    bytes, err := io.ReadAll(resp.Body)
    return bytes, true, err
}

runningInAzure := resp.StatusCode < 400 || resp.StatusCode > 499
return nil, runningInAzure, errors.New(http.StatusText(resp.StatusCode))

That means:

  • 404 => treated as not running in Azure
  • 400 => treated as not running in Azure
  • 405 => treated as not running in Azure
  • 429 => treated as not running in Azure (it also falls in the 400-499 range)
  • 5xx => treated as running in Azure, but with an error
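
To make the mapping easy to check, here is a minimal sketch that evaluates the quoted expression for the statuses discussed in this issue. The function name `currentRunningInAzure` is hypothetical and exists only for this illustration; it is not part of vm.go.

```go
package main

import (
	"fmt"
	"net/http"
)

// currentRunningInAzure mirrors the expression quoted above that is applied
// to non-200 responses. The name is hypothetical, for illustration only.
func currentRunningInAzure(statusCode int) bool {
	return statusCode < 400 || statusCode > 499
}

func main() {
	codes := []int{
		http.StatusBadRequest,
		http.StatusNotFound,
		http.StatusMethodNotAllowed,
		http.StatusTooManyRequests,
		http.StatusInternalServerError,
	}
	for _, code := range codes {
		fmt.Printf("%d %s: runningInAzure=%t\n",
			code, http.StatusText(code), currentRunningInAzure(code))
	}
}
```

Every status where this returns false ends up on the "not running in Azure" path described at the top of this issue.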

Why this seems problematic

According to the Azure IMDS docs, the non-200 status codes have distinct meanings:

  • 400: malformed request
  • 404: requested element does not exist
  • 405: method not allowed
  • 429: rate limited
  • 5xx: transient/internal failure

Reference:
https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service

For this detector, the request is fixed:

  • GET
  • Metadata: true
  • fixed compute endpoint
  • fixed API version

Because of that, statuses like 400 or 405 look more like detector or request problems than evidence of "not on Azure". Returning resource.Empty(), nil for those cases may silently hide real failures.

Proposed direction

Would maintainers be open to narrowing the no-op behavior to 404 only, and returning an error for other non-200 statuses?

Proposed behavior:

  • 200 => success
  • 404 => resource.Empty(), nil
  • 400, 405, 429, 5xx, other non-200 => error
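
One possible shape for this classification, as a rough sketch only. The `imdsOutcome` type, its constants, and `classifyStatus` are hypothetical names introduced for illustration; the real change would need to fit the existing function signatures in vm.go.

```go
package main

import (
	"fmt"
	"net/http"
)

// imdsOutcome is a hypothetical three-way result used only in this sketch.
type imdsOutcome int

const (
	outcomeSuccess  imdsOutcome = iota // 200: read and parse the body
	outcomeNotAzure                    // 404: caller returns resource.Empty(), nil
	outcomeError                       // anything else: surface an error
)

// classifyStatus applies the proposed behavior: only 404 is a silent no-op.
func classifyStatus(statusCode int) (imdsOutcome, error) {
	switch statusCode {
	case http.StatusOK:
		return outcomeSuccess, nil
	case http.StatusNotFound:
		return outcomeNotAzure, nil
	default:
		return outcomeError, fmt.Errorf("unexpected IMDS status %d %s",
			statusCode, http.StatusText(statusCode))
	}
}

func main() {
	for _, code := range []int{200, 400, 404, 405, 429, 500} {
		outcome, err := classifyStatus(code)
		fmt.Printf("%d: outcome=%d err=%v\n", code, outcome, err)
	}
}
```

An explicit switch keeps the no-op case visible in one place, so any status not listed fails loudly by default.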

I am intentionally not proposing broader behavior changes around network failures in this issue, since that seems like a separate policy question.

Why I’m opening an issue first

I saw that the detector was introduced in #5422 and this area already had some discussion there, so I wanted to confirm expected behavior before proposing a PR.

If this direction makes sense, I can open a focused PR with:

  • explicit status classification in vm.go
  • table-driven tests covering 400, 404, 405, 429, and 500
