feat: add healthcheck to execution service and wait condition for consensus #1005
Open
…sensus

Previously the consensus node would start immediately after the execution client container was created, without waiting for its JSON-RPC to become available. On first boot or after a restart, this caused the consensus service to crash-loop while the execution client was still initialising.

Changes:
- Add healthcheck to execution service that polls eth_syncing via JSON-RPC. The check passes as soon as the RPC endpoint responds, confirming the client is fully booted (node does not need to be fully synced).
- Change depends_on on the node service to condition: service_healthy so the consensus client only starts once the execution client is ready.

Healthcheck parameters:
  interval: 30s - re-poll every 30 seconds
  timeout: 10s - single-request timeout
  retries: 5 - mark unhealthy after 5 consecutive failures
  start_period: 60s - grace window for slow database init on first boot

Backwards-compatible: no changes to .env files or entrypoints required.
Problem

When running `docker compose up`, the `node` service (consensus client) starts as soon as the `execution` container is created, not when it is actually ready to serve requests. On first boot, or after a restart with a large database, the execution client can take 30–120 seconds before its JSON-RPC becomes available. During this window the consensus service repeatedly fails to connect and enters a crash-loop.

This is a frequently reported issue in the #🛠|node-operators Discord channel.

Solution

- Add a `healthcheck` to the `execution` service that polls `eth_syncing` via JSON-RPC. The check passes as soon as the RPC endpoint responds (the node does not need to be fully synced, just started).
- Change `depends_on` on the `node` service to `condition: service_healthy` so the consensus client only starts once the execution client's RPC is live.

Healthcheck parameters

- `interval`: 30s (re-poll every 30 seconds)
- `timeout`: 10s (single-request timeout)
- `retries`: 5 (mark unhealthy after 5 consecutive failures)
- `start_period`: 60s (grace window for slow database init on first boot)

Testing

Verified with `CLIENT=reth` and `CLIENT=geth`. The `node` service now waits correctly on fresh starts and after `docker compose restart execution`.

Backwards compatibility

No changes to `.env` files or entrypoints. Existing deployments require no migration.
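
For reference, the healthcheck and wait condition described above could look roughly like the following Compose fragment. This is a sketch, not the exact diff. The service names and timing values come from this description, but the probe command is an assumption: it presumes `curl` is available inside the execution image and that JSON-RPC listens on `localhost:8545`, neither of which is stated here.

```yaml
services:
  execution:
    healthcheck:
      # Assumption: curl exists in the image and the RPC port is 8545.
      # Any successful HTTP response to eth_syncing counts as healthy;
      # the node does not need to be fully synced.
      test:
        - CMD-SHELL
        - >-
          curl -sf -X POST -H 'Content-Type: application/json'
          --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
          http://localhost:8545 || exit 1
      interval: 30s      # re-poll every 30 seconds
      timeout: 10s       # single-request timeout
      retries: 5         # mark unhealthy after 5 consecutive failures
      start_period: 60s  # grace window for slow database init on first boot

  node:
    depends_on:
      execution:
        # Start the consensus client only once the healthcheck passes.
        condition: service_healthy
```

With a fragment like this, `docker compose ps` should show the `execution` service's health state change from `starting` to `healthy` before the `node` container is started.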