
HA: changes for NethSecurity 8.8 #1706

Merged: gsanchietti merged 3 commits into nethsecurity-8.8 from ns-ha-locking on Jun 4, 2026

gsanchietti (Member) commented on Jun 3, 2026

Main changes:

Fixes: #1547 #1656

Copilot AI left a comment

Pull request overview

This PR updates the HA (keepalived-based) stack for NethSecurity 8.8 by reintroducing an optional hotplug locking mechanism for state transitions and migrating HA alerting from direct portal calls to the standard Telegraf → VictoriaMetrics/vmalert → ns-plug-alert-proxy pipeline.

Changes:

  • Added keepalived hotplug serialization lock scripts (00-lock / 999-unlock) controlled by keepalived.globals.ns_lock_timeout.
  • Implemented HA alert metrics collection via Telegraf (telegraf-ha-alert + ha.conf) and new vmalert rules (ha.yaml).
  • Updated ns-plug-alert-proxy, ns-ha docs, and ns-api HA status output to reflect the new alerting/locking model.
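The bullets above describe a serialization lock split across two keepalived hotplug scripts: 00-lock takes it at the start of the chain and 999-unlock releases it at the end. The file notes later on flag "flag ownership" between them. The following is a minimal Python sketch of that idea, assuming an flock(2)-style lock file. It is not the PR's shell implementation, and the function names and lock path are hypothetical. The lock is acquired with a bounded wait, and it is released only by a holder that actually acquired it.

```python
import fcntl
import os
import time


def acquire_lock(path: str, timeout: float, poll: float = 0.1):
    """Hypothetical 00-lock step: take an exclusive lock, waiting at most `timeout` s.

    Returns the open file descriptor when the lock is acquired (the caller now
    owns it), or None if the timeout expires and the event proceeds unlocked.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Non-blocking attempt; raises BlockingIOError while someone else holds it.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            if time.monotonic() >= deadline:
                os.close(fd)
                return None
            time.sleep(poll)


def release_lock(fd) -> None:
    """Hypothetical 999-unlock step: release only a lock this caller owns."""
    if fd is None:
        # Acquisition timed out, so there is nothing of ours to release.
        return
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Returning None on timeout, instead of forcing the lock, means a stuck transition cannot cause a later one to release a lock it never held.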

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Per-file summary:

  • packages/victoria-metrics/files/vmalert-rules/ha.yaml: Adds vmalert rules for HA failover/sync alerts based on Telegraf-exported metrics.
  • packages/telegraf/Makefile: Installs the new HA Telegraf config and collector script.
  • packages/telegraf/files/telegraf.conf.d/ha.conf: Configures Telegraf to execute the HA collector and parse JSON metrics.
  • packages/telegraf/files/telegraf-ha-alert: Emits HA event/state metrics (primary failed/recovered, sync failed/recovered).
  • packages/ns-plug/README.md: Documents new HA alerts and the vmalert→proxy forwarding path.
  • packages/ns-plug/files/ns-plug-alert-proxy: Adds HA alert ID mappings (but currently has incorrect status semantics for HA event alerts).
  • packages/ns-ha/README.md: Documents the hotplug lock option and the new alerting approach/semantics.
  • packages/ns-ha/Makefile: Installs new keepalived hotplug scripts for locking and alert-state recording.
  • packages/ns-ha/files/ns.sh: Removes the direct legacy portal alert-sending functions.
  • packages/ns-ha/files/ns-rsync.sh: Records sync failure/recovery timestamps into UCI state for metric export.
  • packages/ns-ha/files/ns-ha-config: Displays lock status/timeout in CLI status output.
  • packages/ns-ha/files/00-lock: Implements the optional keepalived hotplug serialization lock (needs fixes for exit/validation/flag ownership).
  • packages/ns-ha/files/999-unlock: Releases the hotplug lock (needs to align flag ownership with 00-lock).
  • packages/ns-ha/files/900-ns-plug: Removes direct cluster alert calls; now only manages cron + hotplug flow.
  • packages/ns-ha/files/05-alert-state: Records primary failover/recovery event timestamps into UCI state.
  • packages/ns-api/README.md: Documents new lock_status / lock_timeout fields in HA status output.
  • packages/ns-api/files/ns.ha: Adds lock_status / lock_timeout to HA status output (needs to treat 0/invalid as disabled).
  • AGENTS.md: Notes the standard monitoring pipeline for alerts, including HA.
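The file summary says that 05-alert-state and ns-rsync.sh record event timestamps in UCI state, and that telegraf-ha-alert turns them into JSON for Telegraf's exec input. The following is a hedged Python sketch of such a collector. The state keys, the metric names and the "failed while the last failure is newer than the last recovery" rule are all assumptions made for illustration; they are not taken from the PR. A plain dict stands in for the UCI state.

```python
import json
from typing import Mapping

# Hypothetical state keys: (failure timestamp, recovery timestamp) per event type.
EVENT_PAIRS = {
    "primary": ("primary_failed_at", "primary_recovered_at"),
    "sync": ("sync_failed_at", "sync_recovered_at"),
}


def _ts(state: Mapping[str, str], key: str) -> int:
    """Parse a recorded Unix timestamp; missing or malformed values count as 0."""
    try:
        return int(state.get(key, "0"))
    except ValueError:
        return 0


def collect(state: Mapping[str, str]) -> dict:
    """Build one flat JSON-friendly metric object from recorded HA events."""
    metrics = {}
    for name, (failed_key, recovered_key) in EVENT_PAIRS.items():
        failed = _ts(state, failed_key)
        recovered = _ts(state, recovered_key)
        metrics[f"{name}_failed_ts"] = failed
        metrics[f"{name}_recovered_ts"] = recovered
        # Assumed rule: 1 while the latest failure has no later recovery.
        metrics[f"{name}_failed"] = 1 if failed > recovered else 0
    return metrics


if __name__ == "__main__":
    # An exec input configured with data_format = "json" would parse this stdout line.
    print(json.dumps(collect({"sync_failed_at": "1717400000"})))
```

Under these assumptions, a vmalert rule would only need to compare a flag such as sync_failed with 1, while the raw timestamps remain available for alert annotations.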

Review comment threads:
  • packages/ns-plug/files/ns-plug-alert-proxy (two threads, both outdated)
  • packages/ns-api/files/ns.ha
  • packages/ns-ha/README.md
  • packages/ns-ha/files/999-unlock
  • packages/ns-ha/files/00-lock
gsanchietti force-pushed the ns-ha-locking branch 3 times, most recently from 685a407 to f924716, on June 3, 2026 16:30
gsanchietti requested a review from Tbaile on June 3, 2026 16:55
Tbaile force-pushed the nethsecurity-8.8 branch from f19bdfc to 1ce9f52 on June 4, 2026 07:07
Tbaile and others added 3 commits June 4, 2026 10:54
To avoid regressions, the locking system is disabled by default.
Enable it by setting the keepalived.globals.ns_lock_timeout option.
Keep the HA event source in ns-ha, but install the Telegraf
collector and vmalert rules.

Assisted-by: Copilot:gpt-5.4-mini
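The commit message above keeps the lock off unless keepalived.globals.ns_lock_timeout is set. The file notes for ns.ha also ask that 0 or invalid values be treated as disabled. The following is a small Python sketch of that mapping into the lock_status / lock_timeout status fields. The "enabled"/"disabled" strings and the function name are assumptions, not the actual ns.ha output.

```python
from typing import Optional, TypedDict


class LockStatus(TypedDict):
    lock_status: str
    lock_timeout: int


def lock_status(raw: Optional[str]) -> LockStatus:
    """Map a raw ns_lock_timeout value to hypothetical status fields.

    Unset, empty, non-numeric, zero or negative values all mean "disabled",
    so the default (option absent) keeps the pre-8.8 behaviour.
    """
    try:
        timeout = int(raw) if raw is not None else 0
    except ValueError:
        timeout = 0
    if timeout <= 0:
        return {"lock_status": "disabled", "lock_timeout": 0}
    return {"lock_status": "enabled", "lock_timeout": timeout}
```

With the uci CLI, the option would be set with something like `uci set keepalived.globals.ns_lock_timeout=30` followed by `uci commit keepalived`. The PR text shown here does not state the unit of the value.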
Tbaile (Collaborator) left a comment

The existence of the uci_revert_state and uci_set_state is terrifying.

Down the line we could allow Telegraf to scrape the service using SNMP; until then, fine by me.

Review comment thread: packages/telegraf/files/telegraf.conf.d/ha.conf
gsanchietti merged commit 8ed4f18 into nethsecurity-8.8 on Jun 4, 2026
2 checks passed
gsanchietti deleted the ns-ha-locking branch on June 4, 2026 12:21

3 participants