
Telegram alerting (notifications-only): node/worker down + recovered (#121) #143

Open
VijitSingh97 wants to merge 2 commits into main from claude/sharp-mcnulty-28d48a

@VijitSingh97
Collaborator

Closes #121.

Ships a notifications-only Telegram alerter for v1.0 — a thin notifier that pushes a small, high-value set of operational alerts to one chat. The interactive bot / command interface stays in #45.

What you get (all debounced — one message per real transition)

| Alert | Source signal |
|---|---|
| 🔴/🟢 Node down / recovered | NodeHealthMonitor's debounced `down` edge per node (#31); consumed, not re-collected |
| 🔴/🟢 Worker offline / back online | New flap-protected per-worker presence tracker |
| Sync finished | The sync-gate `miner_released` latch flipping open (#35) |

Off by default: with no telegram config nothing sends and nothing errors.
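The "one message per real transition" rule can be sketched as a pure function over consecutive cycles. This is a minimal sketch, not the PR's evaluate(): the `Signals` and `AlertState` shapes, the node names, and the message format are hypothetical, chosen only to show the edge detection, the silent first-sight baseline, and the hostname prefix.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Signals:
    """One data-loop cycle's inputs (hypothetical shape, not the PR's real type)."""
    node_down: dict[str, bool]  # node name -> NodeHealthMonitor's debounced "down" state
    miner_released: bool        # sync-gate latch; True once the gate has opened


@dataclass
class AlertState:
    """What the previous cycle saw; a missing entry or None means "not seen yet"."""
    node_down: dict[str, bool] = field(default_factory=dict)
    miner_released: bool | None = None


def evaluate(prev: AlertState, cur: Signals, host: str) -> tuple[AlertState, list[str]]:
    """Pure: return the next state plus one message per real transition."""
    alerts: list[str] = []
    for node, down in cur.node_down.items():
        was_down = prev.node_down.get(node)
        if was_down is None:
            continue  # first sight: record a baseline only, so a restart replays nothing
        if down and not was_down:
            alerts.append(f"[{host}] 🔴 Node {node} down")
        elif was_down and not down:
            alerts.append(f"[{host}] 🟢 Node {node} recovered")
    # Only a False -> True flip counts; finding the latch already open is a baseline.
    if prev.miner_released is False and cur.miner_released:
        alerts.append(f"[{host}] Sync finished")
    nxt = AlertState(node_down=dict(cur.node_down), miner_released=cur.miner_released)
    return nxt, alerts
```

Because nothing here does I/O, every transition can be unit-tested directly; the caller keeps the returned state for the next cycle and hands the messages to the notifier.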

Design

Three small single-responsibility modules under build/dashboard/mining_dashboard/service/:

  • telegram_notifier.py: thin sendMessage transport. Enabled only when switched on and both bot_token and chat_id are present; fails silently on offline / Tor-only hosts; never logs the bot token, not even inside an exception message, where the request URL (which contains the token) would otherwise appear.
  • worker_presence.py: WorkerPresenceMonitor, the one genuinely new building block and the per-worker analogue of NodeHealthMonitor (debounce + recovery hysteresis + silent first-sight baseline).
  • alert_service.py: folds the loop's existing signals into debounced alerts. evaluate() is pure (fully unit-tested); process() dispatches off-thread so a slow Telegram call can't stall the data loop. Wired into data_service.run().
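A minimal sketch of the notifier's contract, assuming a plain urllib transport (the PR does not say which HTTP client it uses). `sendMessage` with `chat_id` and `text` is the real Telegram Bot API method; the class shape, the injectable `post`/`log` hooks, and `send_async` (standing in for process()'s off-thread dispatch) are illustrative assumptions.

```python
from __future__ import annotations

import json
import logging
import threading
import urllib.request
from typing import Callable

SEND_URL = "https://api.telegram.org/bot{token}/sendMessage"


def _urllib_post(url: str, payload: dict, timeout: float) -> None:
    """POST a JSON body; raises on network or HTTP errors."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout):
        pass


class TelegramNotifier:
    def __init__(
        self,
        enabled: bool,
        bot_token: str,
        chat_id: str,
        post: Callable[[str, dict, float], None] = _urllib_post,  # hypothetical hook
        log: Callable[[str], None] = logging.getLogger(__name__).debug,
    ) -> None:
        # Off unless switched on AND both credentials are present.
        self.enabled = bool(enabled and bot_token and chat_id)
        self._token = bot_token
        self._chat_id = chat_id
        self._post = post
        self._log = log

    def _redact(self, text: str) -> str:
        # Exception text can embed the request URL, which contains the token.
        return text.replace(self._token, "<redacted>") if self._token else text

    def send(self, text: str) -> bool:
        if not self.enabled:
            return False
        try:
            self._post(SEND_URL.format(token=self._token),
                       {"chat_id": self._chat_id, "text": text}, 10.0)
            return True
        except Exception as exc:  # offline / Tor-only host: swallow, never raise
            self._log(f"telegram send failed: {self._redact(str(exc))}")
            return False

    def send_async(self, text: str) -> threading.Thread:
        """Fire-and-forget so a slow Telegram call can't stall the caller."""
        worker = threading.Thread(target=self.send, args=(text,), daemon=True)
        worker.start()
        return worker
```

Injecting `post` keeps the transport testable without a network, and redacting before logging covers clients whose error strings include the full URL.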

Robustness:

  • Edge baselines seed silently, so a dashboard restart never replays a stale alert.
  • Worker tracking resets when the proxy is intentionally stopped (sync hold / node-down failover), so expected rig absence doesn't spam "offline".
  • Tari node alerts fire only when Tari is required.
  • Messages are hostname-prefixed, so multiple stacks can share one chat.
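A minimal sketch of a per-worker tracker with these properties, assuming the data loop passes in the set of currently connected worker names each cycle. The thresholds (`down_after`, `up_after`), the `update()`/`reset()` names, and the event tuples are illustrative, not WorkerPresenceMonitor's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class _Track:
    online: bool = True  # last state we reported
    misses: int = 0      # consecutive cycles absent while online
    hits: int = 0        # consecutive cycles present while offline


class PresenceSketch:
    def __init__(self, down_after: int = 3, up_after: int = 2) -> None:
        self.down_after = down_after  # debounce: misses before "offline"
        self.up_after = up_after      # hysteresis: hits before "back online"
        self._workers: dict[str, _Track] = {}

    def reset(self) -> None:
        """Forget everything, e.g. when the proxy is intentionally stopped."""
        self._workers.clear()

    def update(self, live: set[str]) -> list[tuple[str, str]]:
        events: list[tuple[str, str]] = []
        for name in sorted(live):
            self._workers.setdefault(name, _Track())  # first sight: silent baseline
        for name, t in self._workers.items():
            if name in live:
                t.misses = 0  # a single gap followed by a return is a flap, not an outage
                if not t.online:
                    t.hits += 1
                    if t.hits >= self.up_after:
                        t.online, t.hits = True, 0
                        events.append((name, "back online"))
            else:
                t.hits = 0
                if t.online:
                    t.misses += 1
                    if t.misses >= self.down_after:
                        t.online, t.misses = False, 0
                        events.append((name, "offline"))
        return events
```

Requiring several consecutive present cycles before "back online" is what stops a rig that reconnects and drops again from producing an alternating stream of messages.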

Plumbing

config.json telegram.* → pithead render_env → per-event env vars → config.py. The bot_token is rendered to the owner-only .env and masked in the apply preview. The settings are injected into the dashboard container in docker-compose.yml and added to config.advanced.example.json.
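A minimal sketch of the last hop (env vars → config.py). The variable names (TELEGRAM_ENABLED, TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID, TELEGRAM_NOTIFY_*), the per-event defaults, and the `mask()` helper are hypothetical; the PR does not list them.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field
from typing import Mapping


def _flag(env: Mapping[str, str], key: str, default: bool) -> bool:
    raw = env.get(key)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class TelegramConfig:
    enabled: bool
    bot_token: str = field(repr=False)  # excluded from repr, so it can't leak via logs
    chat_id: str = ""
    notify_node: bool = True
    notify_worker: bool = True
    notify_sync: bool = True


def load_telegram_config(env: Mapping[str, str] = os.environ) -> TelegramConfig:
    return TelegramConfig(
        enabled=_flag(env, "TELEGRAM_ENABLED", False),  # off by default
        bot_token=env.get("TELEGRAM_BOT_TOKEN", "").strip(),
        chat_id=env.get("TELEGRAM_CHAT_ID", "").strip(),
        notify_node=_flag(env, "TELEGRAM_NOTIFY_NODE", True),
        notify_worker=_flag(env, "TELEGRAM_NOTIFY_WORKER", True),
        notify_sync=_flag(env, "TELEGRAM_NOTIFY_SYNC", True),
    )


def mask(secret: str) -> str:
    """Preview-safe rendering: reveal only whether a secret is set."""
    return "********" if secret else "(not set)"
```

`field(repr=False)` is a cheap second line of defence: even an accidental `log.debug(config)` prints everything except the token.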

Documentation

New docs/telegram.md: a first-time-user walkthrough — create a bot via BotFather, find your chat id (with a manual fallback), per-event toggles, the "one chat, two bots" pattern for sharing a chat with the Healthchecks.io monitor (#79), a Tor-only caveat, and a troubleshooting table. Cross-referenced from the docs index, the configuration reference, and the CHANGELOG.
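For the chat-id step, one common approach (not necessarily the one docs/telegram.md walks through) is to send the bot a message and read the chat id back from the Bot API's `getUpdates` method. A sketch; the helper names `fetch_updates` and `chat_ids` are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def fetch_updates(token: str, timeout: float = 10.0) -> dict:
    """Call getUpdates once; the URL embeds the token, so never log it."""
    url = f"https://api.telegram.org/bot{token}/getUpdates"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def chat_ids(updates: dict) -> list[int]:
    """Collect distinct chat ids, in first-seen order, from a getUpdates response."""
    seen: list[int] = []
    for upd in updates.get("result", []):
        for key in ("message", "channel_post", "my_chat_member"):
            chat = (upd.get(key) or {}).get("chat")
            if chat and chat["id"] not in seen:
                seen.append(chat["id"])
    return seen
```

Group and channel ids come back negative, which is expected; if the list is empty, the bot has not received a message yet, and the manual fallback applies.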

Acceptance criteria

  • telegram.enabled=false by default; when off, nothing sends and no errors
  • Each enabled event fires once per transition (debounced — no flapping spam)
  • Node down/recovered consume NodeHealthMonitor edges (no re-collection)
  • Worker offline/back-online via a debounced per-worker tracker
  • Sync-finished fires once when the sync gate opens
  • bot_token never logged; lives in .env, owner-only
  • Fails silently, with no noise, when offline or Tor-only (same discipline as #59, the dashboard new-version warning + one-click upgrade button)
  • Documented setup (docs/telegram.md)

Testing

  • 439 pytest tests pass (49 new across 3 files); coverage 93.19% (gate 80%; new modules 100% / 100% / 94%)
  • 103 stack tests pass, including new cases for env propagation and bot-token secrecy (token not leaked in the .env preview)
  • 27/27 frontend tests pass; shellcheck clean; docker compose config valid with hardening directives intact

Note for reviewers

known_workers / last_seen / update_known_workers in storage_service.py turned out to be dead code — table + migration + getters/setters exist but nothing in the loop populates it. The issue suggested basing worker-offline detection on last_seen; since that field is never written, the monitor is driven off the live worker list instead. Left the dead code in place (may be intended for a future feature) — flagged here as a possible follow-up cleanup.

🤖 Generated with Claude Code

VijitSingh97 and others added 2 commits June 4, 2026 02:17
Ship a thin, notifications-only Telegram pusher for v1.0: node down/recovered,
worker offline/back online, and sync finished. Off by default; no interactive
bot (that stays in #45).

Consumes signals the data loop already computes rather than re-collecting:
- node down/recovered from NodeHealthMonitor's debounced `down` edge (#31)
- sync finished from the sync-gate `miner_released` latch (#35)
- worker offline/back via a new flap-protected per-worker presence tracker

New dashboard modules:
- telegram_notifier.py: thin sendMessage transport; enabled only with token +
  chat_id; fail-silent on offline/Tor-only hosts; never logs the bot token.
- worker_presence.py: WorkerPresenceMonitor, the per-worker analogue of
  NodeHealthMonitor (debounce + recovery hysteresis + silent baseline + reset
  when the proxy is intentionally stopped).
- alert_service.py: folds per-cycle signals into debounced alerts; pure
  evaluate() + off-thread process(); wired into data_service.run().

Plumbing: config.json telegram.* -> pithead render_env -> per-event env vars ->
config.py. bot_token rendered to the owner-only .env and masked in the apply
preview. Injected into the dashboard container in docker-compose; added to the
advanced example config.

Docs: new docs/telegram.md setup guide (BotFather, chat id, per-event toggles,
"one chat, two bots" with #79, Tor-only caveat, troubleshooting); cross-refs in
the docs index, configuration reference, and CHANGELOG.

Tests: 49 new pytest cases (notifier/monitor/alert service), plus stack tests
for env propagation and bot-token secrecy. Full suite green; coverage 93%.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Telegram alerting (notifications-only): node/worker down + recovered (config.json, default off)
