feat: Handle HTTP responses with HTML in body, add retry handling for python sdk on transient errors, update docs for .NET about retry handling #126

Merged
joakimia merged 13 commits into main from feat/add-retry on Jun 30, 2026

@joakimia commented on Jun 25, 2026 (Contributor)

Add retry logic, graceful error handling, and cancellation token support

⚠️ Minor breaking change for .NET — see Breaking changes section below.

Summary

Fixes a crash where the SDK threw an unhandled JsonException when Application Gateway returned a 504 Gateway Timeout (or 500) with an HTML response body instead of JSON.

Adds automatic retry with exponential backoff for transient gateway errors (via AddStandardResilienceHandler in the Extensions package), full CancellationToken propagation throughout the .NET SDK, and timeout support in Python.


Bug fixed

Unexpected character encountered while parsing value: <. Path '', line 0, position 0

When Application Gateway returns 500 or 504 with an HTML error page, the SDK previously attempted to deserialize it as ProblemDetails JSON — causing an unhandled JsonException to propagate to the caller. The SDK now catches JsonException gracefully and wraps it in a clean HeimdallApiException / HeimdallApiError.
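
The same defensive pattern can be sketched in Python. This is a minimal illustration, not the SDK's actual code: the helper name `describe_error_body` and the message format are assumptions made for this example. It tries to read the error body as JSON and falls back to a short text summary when the body turns out to be HTML.

```python
import json


def describe_error_body(status_code: int, body: str) -> str:
    """Return a readable error message whether the body is JSON or HTML.

    Hypothetical helper for illustration; not part of the SDK.
    """
    try:
        problem = json.loads(body)
    except json.JSONDecodeError:
        # Gateways often answer 500/504 with an HTML page. Catch the parse
        # failure here so it never escapes to the caller.
        snippet = body.strip()[:80]
        return f"HTTP {status_code}: non-JSON error body ({snippet!r})"
    if isinstance(problem, dict):
        # ProblemDetails-style payloads usually carry "title" and "detail".
        title = problem.get("title", "Unknown error")
        detail = problem.get("detail", "")
        return f"HTTP {status_code}: {title} {detail}".strip()
    return f"HTTP {status_code}: {problem!r}"
```

A real client would raise its own exception type with this message rather than return a string; the point is only that the parse failure is caught at the boundary.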


Changes

.NET

| File | Change |
| --- | --- |
| `HeimdallApiHttpClient.cs` | Catches JsonException on HTML error bodies; includes CancellationToken in HandleResponse, ExecuteWithAuthRetry, and RefreshAccessToken; passes cancellationToken to AcquireTokenAsync |
| `AccessTokenProvider.cs` | IAccessTokenProvider.AcquireTokenAsync and implementation now accept CancellationToken; token is forwarded to MSAL's ExecuteAsync |
| `IHeimdallApiClient.cs` | All methods have optional CancellationToken cancellationToken = default; includes newly merged GetSagAndClearancesAsync and GetIcingsAsync |
| `HeimdallApiClient.cs` | All methods forward CancellationToken; includes newly merged GetSagAndClearancesAsync and GetIcingsAsync |
| `HeimdallApiClientExtensions.cs` | Wires AddStandardResilienceHandler on the named HttpClient, which gives a retry, circuit breaker, and total-timeout pipeline automatically when using DI |
| `WhenHandlingErrorResponses/` | Unit tests: HTML body handling, 404 with JSON/empty body, successful deserialization (38 unit tests total) |
| `WhenUsingResilienceExtensions/` | Unit tests: verifies retry on transient codes, no-retry on permanent codes, recovery on second attempt, smoke-test via AddHeimdallPowerApiClient |
| Integration tests | New: GetCircuitRatings, GetConductorTemperatures, GetCurrents, GetHeimdallAars, GetHeimdallDlrs |
| `README.md` | New Resilience and retry, Exceptions, and Cancellation and timeouts sections |
| `Program.cs` (example) | Error handling example, cancellation token usage |

Python

| File | Change |
| --- | --- |
| `errors.py` | New HeimdallApiError(Exception) with status_code and is_transient() |
| `client.py` | `_execute_with_retry` with exponential backoff; new timeout constructor parameter |
| `capacity_monitoring.py`, `grid_insights.py`, `assets.py` | Raise HeimdallApiError instead of bare Exception |
| `__init__.py` | Exports HeimdallApiError |
| `tests/unit/test_retry_behavior.py` | 44 unit tests |
| `README.md` | New Error Handling and Retry and Timeouts sections |
| `print_heimdall_dlr.py` (example) | Uses HeimdallApiError, commented timeout example |
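
For orientation, an error class with the shape described for `errors.py` could look roughly like the sketch below. Only the class name, `status_code`, and `is_transient()` come from this PR; the constructor signature and the set of status codes treated as transient are assumptions (the set is borrowed from the .NET retry table, and the real Python list may differ).

```python
from typing import Optional


class HeimdallApiError(Exception):
    """Sketch of an API error that carries the HTTP status code."""

    # Assumed transient set, borrowed from the .NET retry table in this PR;
    # the codes the Python SDK actually retries may differ.
    _TRANSIENT_STATUS_CODES = frozenset({500, 502, 503, 504})

    def __init__(self, message: str, status_code: Optional[int] = None) -> None:
        super().__init__(message)
        self.status_code = status_code

    def is_transient(self) -> bool:
        """Return True when the status code suggests a retry may succeed."""
        return self.status_code in self._TRANSIENT_STATUS_CODES
```

A caller could then branch on `err.is_transient()` to decide whether to retry or to surface the failure immediately.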

Retry behaviour

.NET — Extensions package only

The core package (HeimdallPower.Api.Client) does not retry automatically. Transient errors are thrown immediately as HeimdallApiException.

The Extensions package (HeimdallPower.Api.Client.Extensions) adds a full resilience pipeline via AddStandardResilienceHandler when registering with AddHeimdallPowerApiClient:

| Layer | Behaviour |
| --- | --- |
| Retry | Up to 3 retries with exponential back-off + jitter on 5xx, 408, 429, and HttpRequestException |
| Circuit breaker | Opens after sustained failures to avoid hammering an unavailable service |
| Total request timeout | Caps the total time including all retries |

| Status code | Retried? |
| --- | --- |
| 502 Bad Gateway | ✅ Yes |
| 503 Service Unavailable | ✅ Yes |
| 504 Gateway Timeout | ✅ Yes |
| 500 Internal Server Error | ✅ Yes |
| 400, 401, 403, 404 | ❌ No (client errors) |
| HttpRequestException / network error | ✅ Yes |

Python — built into core client

  • Max retries: 3 (4 total attempts)
  • Backoff: 1 s → 2 s → 4 s (exponential)
  • HTML bodies are handled gracefully — no JsonException
  • After all retries are exhausted, the last HeimdallApiError is re-raised with the original status code
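
The schedule above (3 retries, delays of 1 s, 2 s, 4 s, last error re-raised) can be sketched as follows. This is an illustrative loop, not the SDK's `_execute_with_retry`: the function name, its parameters, the injectable `sleep` callable, and the `TransientError` stand-in are all assumptions made for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a transient HeimdallApiError in this sketch."""


def execute_with_retry(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):  # 1 initial call + max_retries
        try:
            return operation()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: re-raise the last error unchanged
            sleep(base_delay * 2 ** attempt)  # waits 1 s, then 2 s, then 4 s
    raise AssertionError("unreachable: the loop always returns or raises")
```

Passing `sleep` as a parameter is a design choice for testability: unit tests can record the requested delays instead of actually waiting.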

Breaking changes

.NET — BREAKING CHANGE (minor)

Two sets of changes affect implementors:

  1. IHeimdallApiClient — all method signatures now include CancellationToken cancellationToken = default as the last parameter, including the newly added GetSagAndClearancesAsync and GetIcingsAsync.
  2. IAccessTokenProvider.AcquireTokenAsync — now accepts CancellationToken (internal interface, only relevant if you have custom token provider implementations).
  • ✅ NOT breaking for callers — default parameter, existing call sites compile unchanged
  • ✅ NOT breaking for Moq/NSubstitute mocks — generated at runtime
  • ❌ BREAKING for any class that manually implements IHeimdallApiClient — compile error; add CancellationToken cancellationToken = default to each method

Recommended: bump minor version (semver).

Python — No breaking changes

  • HeimdallApiError is a new class (additive)
  • timeout constructor parameter defaults to None (existing behaviour unchanged)
  • Callers catching bare Exception continue to work; callers can now catch HeimdallApiError specifically for richer error info
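
A small sketch of why existing handlers keep working: because the new error type subclasses `Exception`, a broad `except Exception` still matches, while a narrower handler gains access to the status code. The stand-in class and the `fetch`, `legacy_caller`, and `updated_caller` functions are hypothetical names used only for this illustration.

```python
class HeimdallApiError(Exception):
    """Minimal stand-in; the real class lives in the SDK's errors.py."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


def fetch():
    # Hypothetical call that fails the way the SDK now reports errors.
    raise HeimdallApiError("Service Unavailable", status_code=503)


def legacy_caller():
    try:
        fetch()
    except Exception as err:  # pre-existing broad handler still matches
        return f"failed: {err}"


def updated_caller():
    try:
        fetch()
    except HeimdallApiError as err:  # narrower handler can read status_code
        return f"failed with HTTP {err.status_code}"
```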

Comment thread dotnet/HeimdallPower.Api.Client/HeimdallPower.Api.Client/HeimdallApiHttpClient.cs Outdated
Comment thread python/heimdall_api_client/capacity_monitoring.py Outdated
@joakimia marked this pull request as ready for review on June 25, 2026 13:05
@joakimia requested a review from a team as a code owner on June 25, 2026 13:05
@mHjertaker commented (Contributor)

Nice work, the CancellationToken threading and the HTML-body test coverage are great. One architectural concern before merging.

The retry layer collides with AddStandardResilienceHandler. Our DI registration already calls .AddStandardResilienceHandler(), which retries 502/503/504 + HttpRequestException with backoff, jitter, and a circuit breaker. The new ExecuteWithRetryAsync adds a second retry loop on top, so Extensions users get nested retries (~16 requests for one call) with compounding delays.

Proposal:

  • Keep the HTML-body / JsonException fix, needed everywhere.
  • Remove ExecuteWithRetryAsync from the core client. DI users already get retries (with jitter, circuit breaker, and a configurable policy) from AddStandardResilienceHandler.
  • Keep the CancellationToken changes
  • Document that the non-DI path (new HeimdallApiClient(...)) does not auto-retry; callers wanting resilience should use the Extensions package.
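
The "~16 requests" figure follows from multiplying the attempts of the two layers. A quick sketch of the count, assuming each layer allows 3 retries (so 4 attempts each) and every attempt fails; the function name is made up for this illustration:

```python
def total_requests(outer_retries: int, inner_retries: int) -> int:
    """Worst-case HTTP requests when one retry loop wraps another."""
    requests_sent = 0
    for _ in range(outer_retries + 1):      # SDK-level attempts
        for _ in range(inner_retries + 1):  # resilience-handler attempts
            requests_sent += 1              # assume every attempt fails
    return requests_sent


print(total_requests(3, 3))  # 16
```

With the core-client loop removed, the worst case drops to the single layer's 4 attempts, which is `total_requests(0, 3)`.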

@joakimia commented (Contributor, Author)

OK, and for the Python package: I don't think we have any extension packages there that we can use?
Should we perhaps remove the retry logic there as well, and document that developers are responsible for handling retries themselves?

@joakimia changed the title from "feat: Add retry handling on transient errors, retry up to 3 times" to "feat: Handle HTTP responses with HTML in body, add retry handling for python sdk on transient errors, update docs for .NET about retry handling" on Jun 29, 2026
@joakimia commented (Contributor, Author)

Discussed outside the thread.
We agreed to keep the simple Python retry logic and merge it in.
I also removed the retry logic in .NET.
Updated READMEs, docstrings, examples, and tests to show how to handle transient errors with the Extensions package.

The SCADA connector already applies resilience by calling AddHeimdallPowerApiClient, so after this merge it will retry on HTML response bodies and transient HTTP status codes instead of throwing an exception.

The Python SDK will also do some simple retry handling.

Added some tests and updated docstrings and examples.

Comment thread dotnet/examples/Api.Client.Examples/Program.cs Outdated
Comment thread dotnet/examples/Api.Client.Examples/Program.cs Outdated
Comment thread dotnet/README.md
@joakimia merged commit 5fbf0bc into main on Jun 30, 2026
4 checks passed
@joakimia deleted the feat/add-retry branch on June 30, 2026 10:28

4 participants