The Azure VM detector currently treats any HTTP 4xx response from IMDS as "not running in Azure" and returns resource.Empty(), nil.
I think this behavior is too broad, and it may hide real detector failures.
Code reference:
detectors/azure/azurevm/vm.go
Current logic:
if resp.StatusCode == http.StatusOK {
	bytes, err := io.ReadAll(resp.Body)
	return bytes, true, err
}
runningInAzure := resp.StatusCode < 400 || resp.StatusCode > 499
return nil, runningInAzure, errors.New(http.StatusText(resp.StatusCode))
That means:
- 404 => treated as not running in Azure
- 400 => treated as not running in Azure
- 405 => treated as not running in Azure
- 429 => also treated as not running in Azure, because it falls inside the 400-499 range
- 5xx => treated as running in Azure, but with an error
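To make the current classification concrete, here is a minimal standalone sketch of the same range check. The helper names `treatedAsRunningInAzure` and `currentOutcome` are hypothetical and only mirror the expression quoted above; they are not part of the detector.

```go
package main

// treatedAsRunningInAzure is a hypothetical helper that mirrors the range
// check quoted above for non-200 responses. It is not part of the detector.
func treatedAsRunningInAzure(statusCode int) bool {
	return statusCode < 400 || statusCode > 499
}

// currentOutcome describes what the caller ends up with for a non-200 status,
// assuming (as described in this issue) that "not running in Azure" is mapped
// to resource.Empty(), nil and everything else is surfaced as an error.
func currentOutcome(statusCode int) string {
	if treatedAsRunningInAzure(statusCode) {
		return "error returned"
	}
	return "resource.Empty(), nil"
}
```

With this check, every status from 400 through 499 ends up as `resource.Empty(), nil`, while a status such as 500 ends up as an error.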
Why this seems problematic
According to the Azure IMDS docs, different status codes have different meanings:
- 400: malformed request
- 404: requested element does not exist
- 405: method not allowed
- 429: rate limited
- 5xx: transient/internal failure
Reference:
https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service
For this detector, the request is fixed:
- GET
- Metadata: true
- fixed compute endpoint
- fixed API version
Because of that, statuses like 400 or 405 seem more like detector/request problems than "not on Azure". Returning resource.Empty(), nil for those cases may silently hide real failures.
Proposed direction
Would maintainers be open to narrowing the no-op behavior to 404 only, and returning an error for other non-200 statuses?
Proposed behavior:
- 200 => success
- 404 => resource.Empty(), nil
- 400, 405, 429, 5xx, other non-200 => error
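A rough sketch of what the explicit classification might look like is below. The type `statusOutcome` and the function `classifyStatus` are hypothetical names for illustration, not existing code in vm.go, and the error message wording is only a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusOutcome is a hypothetical type used only in this sketch.
type statusOutcome int

const (
	outcomeSuccess  statusOutcome = iota // 200: read and parse the metadata body
	outcomeNotAzure                      // 404: return resource.Empty(), nil
	outcomeError                         // any other status: return an error
)

// classifyStatus maps an IMDS response status to the proposed behavior.
// Only 404 is treated as "not running in Azure"; every other non-200
// status is reported as an error.
func classifyStatus(statusCode int) (statusOutcome, error) {
	switch statusCode {
	case http.StatusOK:
		return outcomeSuccess, nil
	case http.StatusNotFound:
		return outcomeNotAzure, nil
	default:
		return outcomeError, fmt.Errorf("unexpected IMDS status %d (%s)",
			statusCode, http.StatusText(statusCode))
	}
}
```

An explicit switch keeps each status decision visible in one place, so adding a special case later (for example for 429) would be a one-line change.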
I am intentionally not proposing broader behavior changes around network failures in this issue, since that seems like a separate policy question.
Why I’m opening an issue first
I saw that the detector was introduced in #5422 and this area already had some discussion there, so I wanted to confirm expected behavior before proposing a PR.
If this direction makes sense, I can open a focused PR with:
- explicit status classification in vm.go
- table-driven tests covering 400, 404, 405, 429, and 500