Anomaly detection for Prometheus-compatible metrics, exposed as Prometheus metrics.
promanomaly reads a YAML config, queries a Prometheus-compatible TSDB
(VictoriaMetrics by default), runs robust statistical detectors over a
rolling window of recent data, and exposes anomaly scores on /metrics.
Anomalies become first-class metrics — graphable in Grafana, alertable
through Alertmanager, indistinguishable from anything else in your
monitoring stack.
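As a rough illustration of the pipeline described above, the sketch below scores the newest sample against a median/MAD baseline computed from a rolling window, then renders the score as one line of Prometheus exposition text. It is a minimal sketch under stated assumptions, not promanomaly's actual detector: the function names and the metric name example_anomaly_score are hypothetical, and the median/MAD z-score stands in for whichever robust detectors the project ships.

```python
from statistics import median

# 0.6745 rescales MAD so the score is comparable to a standard z-score
# when the underlying data is roughly normal.
MAD_SCALE = 0.6745


def robust_zscore(window, latest):
    """Score `latest` against a median/MAD baseline built from `window`."""
    center = median(window)
    mad = median([abs(x - center) for x in window])
    if mad == 0:
        # A perfectly flat window has no spread to normalise by.
        return 0.0 if latest == center else float("inf")
    return MAD_SCALE * (latest - center) / mad


def exposition_line(name, labels, value):
    """Render one sample in the Prometheus text exposition format."""
    def escape(text):
        # Label values escape backslash, double quote, and newline.
        return (text.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))

    pairs = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    recent = [10, 11, 9, 10, 12, 10, 11]  # rolling window of recent samples
    score = robust_zscore(recent, 30)     # the newest sample is a spike
    # example_anomaly_score is a hypothetical metric name for illustration.
    print(exposition_line("example_anomaly_score", {"instance": "a"}, round(score, 2)))
```

Because the baseline uses the median rather than the mean, a single earlier spike inside the window barely moves it, which is the property that makes this family of detectors suitable for bursty signals.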
It is the sibling project to promforecast. Where promforecast does forecast-based anomaly detection on signals with learnable seasonality, promanomaly handles everything else: change-points, statistical baselines on signals that aren't reasonably forecastable, and (later) population/cohort divergence. Install one, both, or neither — they share architectural DNA but no runtime dependency.
- Catches what forecasting misses. Step changes after a deploy, spikes on sparse signals, and series diverging from their cohort — none of which forecast-based deviation alerts pick up reliably.
- Drop-in. helm install next to an existing Prometheus and point it at a long-term TSDB. No production scrape changes required.
- Speak Prometheus everywhere. PromQL in, exposition format out. The TSDB is swappable (VictoriaMetrics, Mimir, Thanos).
- Anomalies are just metrics. Same labels, same retention, same dashboards. Compose with recording rules and alerts you already know.
- Stateless. No database to operate. Baselines recompute on a schedule; results live in /metrics.
- Cheap per series. Detectors are O(N log N) on a rolling window with no model fitting, roughly an order of magnitude lighter than a statistical forecast.
helm repo add promanomaly https://esops-dev.github.io/promanomaly
helm install promanomaly promanomaly/promanomaly-stack

This installs VictoriaMetrics, the detector, and example dashboards.
For an existing TSDB, install only promanomaly/promanomaly.
For a complete day-one starter next to node-exporter, point the detector
at examples/configs/node-exporter.yaml —
a ready-made multi-group config watching the host golden signals (CPU,
memory, filesystem fill + fill-rate, disk I/O, network errors, load) with
detectors already matched to each signal's shape. Sibling starters cover
kube-state-metrics,
cAdvisor,
kubelet, and
application/middleware signals, and
the docs/cookbook/ recipes explain which detector fits
which signal. Once running, promanomaly top --target <url> ranks what is
anomalous right now by a normalised severity signal — see
docs/triage.md.
Editing the YAML config does not require a pod restart: the detector
watches the mounted ConfigMap, validates the new config in-place, and
rolls back on validation failure (kubectl apply to the ConfigMap is
enough). POST /-/reload and SIGHUP trigger the same path. Run
promanomaly validate --config path/to/config.yaml in CI to catch
schema errors before merging; add --probe --strict to additionally
execute every query against the live datasource and fail the build
on empty or erroring queries. See docs/operations.md
for the full CLI surface and operational guidance.
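The validate-then-swap behaviour described above can be sketched as follows. This is an illustrative sketch, not promanomaly's implementation: ConfigHolder, validate_config, and the groups key are hypothetical names chosen for the example. The point it shows is that the active config is only replaced after the candidate passes validation, so a bad edit leaves the running config untouched.

```python
def validate_config(config):
    """Hypothetical validator: require a non-empty list under 'groups'."""
    groups = config.get("groups")
    if not isinstance(groups, list) or not groups:
        raise ValueError("config must define at least one group")
    for group in groups:
        if "name" not in group:
            raise ValueError("every group needs a name")


class ConfigHolder:
    """Keeps the last known-good config and swaps only on successful validation."""

    def __init__(self, initial, validate):
        validate(initial)  # refuse to start with an invalid config
        self._validate = validate
        self.active = initial

    def reload(self, candidate):
        """Return (ok, error). On failure the active config is left as it was."""
        try:
            self._validate(candidate)
        except ValueError as err:
            return False, str(err)
        self.active = candidate
        return True, ""


if __name__ == "__main__":
    holder = ConfigHolder({"groups": [{"name": "cpu"}]}, validate_config)
    print(holder.reload({"groups": []}))                   # rejected, old config kept
    print(holder.reload({"groups": [{"name": "memory"}]}))  # accepted
    print(holder.active)
```

A file watcher, an HTTP reload endpoint, and a signal handler could all call the same reload method, which matches the single code path the text describes.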
See docs/ for configuration, detector selection, and
operational guidance — including docs/patterns.md,
the composition cookbook of how anomaly scores plug into the rest of
the Prometheus ecosystem (rate-of-change, group rollups, multi-window
monitoring, deploy-time silencing, cross-tool joins with promforecast).
docs/when-to-use.md walks through the
"promforecast, promanomaly, or both?" decision tree, and
docs/config-schema.md documents the v1
schema stability commitments. Before going to production, work through
the docs/production-checklist.md
sizing guide and readiness checklist, and keep the
degraded-mode playbook handy for
on-call. Running many clusters? The
multi-cluster reference architectures
spell out the three topologies and their cardinality / network /
blast-radius trade-offs.
A self-monitoring example config is bundled at
examples/configs/self-monitoring.yaml,
and a starter PrometheusRule at
examples/alerts/promanomaly-rules.yaml.
The full multi-group production reference deployment — values, NetworkPolicy
egress, ArgoCD Application, and Flux HelmRepository/HelmRelease —
lives at examples/production/.
Anomaly scores are most useful when the rest of your stack can act on them. promanomaly stays the signal provider; the systems that react (autoscalers, rollout controllers, dashboards, long-term storage) stay the decision-makers.
- Push sinks ship each snapshot beyond /metrics: a Prometheus remote_write sink for long-horizon anomaly history, and a Grafana annotations sink that surfaces firings across every dashboard. See docs/sinks.md.
- Kubernetes metrics adapter re-serves anomaly signal through the external/custom metrics APIs so HPA and KEDA can consume it. Opt-in via charts/promanomaly-metrics-adapter/; see docs/adapter.md.
- Reaction recipes: KEDA, HPA, and Argo Rollouts cohort-gated canaries, with the autoscaling guard-rails baked in, live at examples/k8s/.
- GitOps config validation: a composite GitHub Action validates your config on every PR (schema, cardinality/cost, optional live probe), and promanomaly generate-rules scaffolds a PrometheusRule from it. See docs/gitops.md.
| | promforecast | promanomaly |
|---|---|---|
| Question | "What will this look like in N hours?" | "Is this abnormal right now?" |
| Method | Fit a model on a long lookback, predict ahead. | Compute a baseline on a short rolling window, score the present. |
| Latency | Minutes-to-hours (next refit). | Seconds-to-minutes (next scrape). |
| Best for | Capacity, traffic, KPIs. | Bursty, sparse, step-shifting, or fleet-relative signals. |
Run both for full coverage: forecast-based deviation alerts on capacity metrics, plus change-point and baseline anomaly alerts on error rates, rare events, and per-instance outliers.
The configuration schema is stable as of the first stable release:
apiVersion: promanomaly.io/v1 and the values.schema.json $id
both move from alpha to v1. The previous promanomaly.io/v1alpha1
alias still loads, is accepted as a byte-identical form, and triggers
a deprecation warning. See docs/config-schema.md for the
stability commitments, including which changes are breaking and
which are not.
Output metric names and label sets are part of the public API and follow Prometheus conventions strictly (no colons — those are reserved for recording rules).
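The no-colon rule can be checked mechanically. The sketch below assumes the classic Prometheus metric-name grammar (letters, digits, underscores, and colons, not starting with a digit) with the colon removed; is_valid_output_name and the sample names are hypothetical and not part of promanomaly.

```python
import re

# Classic Prometheus metric names match [a-zA-Z_:][a-zA-Z0-9_:]*.
# Dropping ':' leaves the subset that avoids recording-rule style names.
OUTPUT_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_output_name(name):
    """True if `name` is a legal metric name that contains no colons."""
    return OUTPUT_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("example_anomaly_score", "job:example_score:rate5m", "9lives"):
        print(candidate, is_valid_output_name(candidate))
```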
Found a vulnerability? See SECURITY.md. Please do not open public issues for security reports.
See CONTRIBUTING.md and the docs/ folder for detector authoring.