promanomaly

Anomaly detection for Prometheus-compatible metrics, exposed as Prometheus metrics.

promanomaly reads a YAML config, queries a Prometheus-compatible TSDB (VictoriaMetrics by default), runs robust statistical detectors over a rolling window of recent data, and exposes anomaly scores on /metrics. Anomalies become first-class metrics — graphable in Grafana, alertable through Alertmanager, indistinguishable from anything else in your monitoring stack.
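To make "exposes anomaly scores on /metrics" concrete, here is a minimal sketch of rendering one score as a gauge in the Prometheus text exposition format. The metric name `promanomaly_anomaly_score`, the label names, and the helper functions are hypothetical and chosen for illustration only; the real output names are defined by the project, not by this example.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render gauge samples as Prometheus text exposition lines."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is deterministic.
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and labels, for illustration only.
text = render_gauge(
    "promanomaly_anomaly_score",
    "Hypothetical anomaly score for one series.",
    [({"instance": "a:9100", "detector": "mad"}, 3.5)],
)
print(text)
```

Because the result is an ordinary gauge, anything that already reads Prometheus metrics (Grafana panels, alert rules, recording rules) can consume it without special handling.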

It is the sibling project to promforecast. Where promforecast does forecast-based anomaly detection on signals with learnable seasonality, promanomaly handles everything else: change-points, statistical baselines on signals that aren't reasonably forecastable, and (later) population/cohort divergence. Install one, both, or neither — they share architectural DNA but no runtime dependency.

Why

Catches what forecasting misses. Step changes after a deploy, spikes on sparse signals, and series diverging from their cohort — none of which forecast-based deviation alerts pick up reliably.
Drop-in. helm install next to an existing Prometheus and point it at a long-term TSDB. No production scrape changes required.
Speaks Prometheus everywhere. PromQL in, exposition format out. The TSDB is swappable (VictoriaMetrics, Mimir, Thanos).
Anomalies are just metrics. Same labels, same retention, same dashboards. Compose with recording rules and alerts you already know.
Stateless. No database to operate. Baselines recompute on a schedule; results live in /metrics.
Cheap per series. Detectors run in O(N log N) time over a rolling window of N samples and fit no model, which makes them roughly an order of magnitude lighter than a statistical forecast.
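As an illustration of what a robust, fit-free baseline can look like, the sketch below scores the latest sample against the median and median absolute deviation (MAD) of a rolling window. This is an assumed example of the general technique, not necessarily one of promanomaly's detectors; the function name and the 0.6745 scaling (the usual modified z-score constant) are choices made for this sketch. The cost is dominated by the sorts inside `median`, which is in line with the O(N log N) figure above.

```python
import math
from statistics import median


def robust_z_score(window: list[float], current: float) -> float:
    """Modified z-score of `current` against the median/MAD of `window`."""
    if not window:
        raise ValueError("window must contain at least one sample")
    center = median(window)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(x - center) for x in window)
    if mad == 0:
        # Flat window: any deviation at all is infinitely surprising.
        if current == center:
            return 0.0
        return math.copysign(math.inf, current - center)
    return 0.6745 * (current - center) / mad


window = [10, 11, 9, 10, 12, 10, 11]
print(robust_z_score(window, 20))
print(robust_z_score(window, 10))
```

A single spike inside the window shifts the median and MAD very little, which is why this kind of baseline tends to suit bursty or sparse signals better than a mean and standard deviation.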

Quickstart

helm repo add promanomaly https://esops-dev.github.io/promanomaly
helm install promanomaly promanomaly/promanomaly-stack

This installs VictoriaMetrics, the detector, and example dashboards. For an existing TSDB, install only promanomaly/promanomaly.

For a complete day-one starter next to node-exporter, point the detector at examples/configs/node-exporter.yaml — a ready-made multi-group config watching the host golden signals (CPU, memory, filesystem fill + fill-rate, disk I/O, network errors, load) with detectors already matched to each signal's shape. Sibling starters cover kube-state-metrics, cAdvisor, kubelet, and application/middleware signals, and the docs/cookbook/ recipes explain which detector fits which signal. Once running, promanomaly top --target <url> ranks what is anomalous right now by a normalised severity signal — see docs/triage.md.

Editing the YAML config does not require a pod restart. A kubectl apply to the ConfigMap is enough: the detector watches the mounted ConfigMap, validates the new config in place, and rolls back if validation fails. POST /-/reload and SIGHUP trigger the same path. Run promanomaly validate --config path/to/config.yaml in CI to catch schema errors before merging; add --probe --strict to additionally execute every query against the live datasource and fail the build on empty or erroring queries. See docs/operations.md for the full CLI surface and operational guidance.
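The validate-then-swap behaviour described above can be sketched as follows. This is an assumed illustration of the pattern, not the project's implementation: `ConfigHolder`, `validate`, and the `groups` key are hypothetical names, and only the `promanomaly.io/v1` apiVersion string comes from this README.

```python
from typing import Callable


class ConfigHolder:
    """Keeps the last known-good config and only swaps in validated candidates."""

    def __init__(self, initial: dict, validate: Callable[[dict], None]) -> None:
        validate(initial)  # refuse to start with a broken config
        self._validate = validate
        self._active = initial

    @property
    def active(self) -> dict:
        return self._active

    def reload(self, candidate: dict) -> bool:
        """Return True if the candidate was applied, False if it was rejected."""
        try:
            self._validate(candidate)
        except ValueError:
            # Validation failed: the previous config stays active.
            return False
        self._active = candidate
        return True


def validate(config: dict) -> None:
    """Hypothetical checks; the real schema lives in docs/config-schema.md."""
    if config.get("apiVersion") != "promanomaly.io/v1":
        raise ValueError("unsupported apiVersion")
    if not config.get("groups"):  # 'groups' is an assumed key name
        raise ValueError("config defines no groups")


holder = ConfigHolder({"apiVersion": "promanomaly.io/v1", "groups": [{"name": "cpu"}]}, validate)
print(holder.reload({"apiVersion": "typo"}))
print(holder.active["groups"][0]["name"])
```

The point of the pattern is that a bad edit never replaces a working config, so a failed reload degrades to "nothing changed" rather than "detector stopped".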

See docs/ for configuration, detector selection, and operational guidance — including docs/patterns.md, the composition cookbook of how anomaly scores plug into the rest of the Prometheus ecosystem (rate-of-change, group rollups, multi-window monitoring, deploy-time silencing, cross-tool joins with promforecast). docs/when-to-use.md walks through the "promforecast, promanomaly, or both?" decision tree, and docs/config-schema.md documents the v1 schema stability commitments. Before going to production, work through the docs/production-checklist.md sizing guide and readiness checklist, and keep the degraded-mode playbook handy for on-call. Running many clusters? The multi-cluster reference architectures spell out the three topologies and their cardinality / network / blast-radius trade-offs.

A self-monitoring example config is bundled at examples/configs/self-monitoring.yaml, and a starter PrometheusRule at examples/alerts/promanomaly-rules.yaml. The full multi-group production reference deployment — values, NetworkPolicy egress, ArgoCD Application, and Flux HelmRepository/HelmRelease — lives at examples/production/.

Feeding anomaly signal to the platform

Anomaly scores are most useful when the rest of your stack can act on them. promanomaly stays the signal provider; the systems that react (autoscalers, rollout controllers, dashboards, long-term storage) stay the decision-makers.

Push sinks ship each snapshot beyond /metrics: a Prometheus remote_write sink for long-horizon anomaly history, and a Grafana annotations sink that surfaces firings across every dashboard. See docs/sinks.md.
Kubernetes metrics adapter re-serves anomaly signal through the external/custom metrics APIs so HPA and KEDA can consume it. Opt-in via charts/promanomaly-metrics-adapter/; see docs/adapter.md.
Reaction recipes — KEDA, HPA, and Argo Rollouts cohort-gated canaries, with the autoscaling guard-rails baked in — live at examples/k8s/.
GitOps config validation — a composite GitHub Action validates your config on every PR (schema, cardinality/cost, optional live probe), and promanomaly generate-rules scaffolds a PrometheusRule from it. See docs/gitops.md.

Relationship with promforecast

	promforecast	promanomaly
Question	"What will this look like in N hours?"	"Is this abnormal right now?"
Method	Fit a model on a long lookback, predict ahead.	Compute a baseline on a short rolling window, score the present.
Latency	Minutes-to-hours (next refit).	Seconds-to-minutes (next scrape).
Best for	Capacity, traffic, KPIs.	Bursty, sparse, step-shifting, or fleet-relative signals.

Run both for full coverage: forecast-based deviation alerts on capacity metrics, plus change-point and baseline anomaly alerts on error rates, rare events, and per-instance outliers.
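To illustrate the change-point side of that split, here is a minimal sketch that scores a level shift between the samples before and after a known point in time, such as a deploy. It is an assumed example built on medians, not promanomaly's actual change-point detector, and `step_shift_score` is a hypothetical name.

```python
import math
from statistics import median


def step_shift_score(before: list[float], after: list[float]) -> float:
    """Shift in median between two segments, scaled by their pooled MAD."""
    if not before or not after:
        raise ValueError("both segments need at least one sample")
    m_before, m_after = median(before), median(after)
    # Pool absolute residuals around each segment's own median.
    residuals = [abs(x - m_before) for x in before] + [abs(x - m_after) for x in after]
    scale = median(residuals)
    shift = m_after - m_before
    if scale == 0:
        # Both segments are flat: any level change is unbounded evidence.
        return 0.0 if shift == 0 else math.copysign(math.inf, shift)
    return shift / scale


pre_deploy = [10, 11, 10, 9, 10, 11]
post_deploy = [20, 21, 20, 19, 20, 21]
print(step_shift_score(pre_deploy, post_deploy))
```

A forecast trained on the pre-deploy level would keep flagging every post-deploy sample until its next refit, whereas a score like this describes the shift itself.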

Status

The configuration schema is stable as of the first stable release: apiVersion: promanomaly.io/v1 and the values.schema.json $id both move from alpha to v1. The previous promanomaly.io/v1alpha1 alias still loads as a byte-identical form of the v1 schema, with a deprecation warning. See docs/config-schema.md for the stability commitments, including which changes are breaking and which are not.

Output metric names and label sets are part of the public API and follow Prometheus conventions strictly (no colons — those are reserved for recording rules).
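The no-colon rule is easy to check mechanically. The sketch below uses the standard Prometheus metric-name character set with the colon removed; the function name and the sample metric names are hypothetical.

```python
import re

# Prometheus permits [a-zA-Z_:][a-zA-Z0-9_:]* for metric names; colons are
# dropped here because they are reserved for recording rules.
_OUTPUT_METRIC_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_output_metric_name(name: str) -> bool:
    """True if `name` is a legal metric name that contains no colons."""
    return _OUTPUT_METRIC_NAME.fullmatch(name) is not None


print(is_valid_output_metric_name("promanomaly_anomaly_score"))
print(is_valid_output_metric_name("job:promanomaly_anomaly_score:max"))
```

Keeping colons out of the emitted names leaves the `level:metric:operation` form free for recording rules built on top of them.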

Security

Found a vulnerability? See SECURITY.md. Please do not open public issues for security reports.

Contributing

See CONTRIBUTING.md and the docs/ folder for detector authoring.

License

Apache-2.0. See LICENSE and NOTICE.
