graphobs: Navigate your Observability Data with Graph Queries

graphobs is a plugin for the Neo4j graph database that bridges graph queries and observability backends. Instead of label-based selectors, it uses a property graph representing service architecture, deployment topology, and infrastructure as the primary abstraction for selecting, navigating, and analyzing observability data. Time-series, trace, and log data remain stored in their respective established backends and can be queried and analyzed through Cypher (Neo4j's query language) and graphobs procedures. This also makes it possible to apply graph algorithms from Neo4j's Graph Data Science library to observability data. Currently supported backends are Jaeger, Prometheus, and OpenSearch.

[Figure: graphobs architecture]

graphobs provides its functionality through Neo4j user-defined procedures. Procedures are custom operations that extend the Cypher query language and are invoked using the CALL keyword. Since procedures integrate seamlessly into Cypher queries, they can be freely combined with graph pattern matching, filtering, and aggregation. The provided procedures fall into the following categories:

  1. Registration: Procedures for connecting and configuring backends
  2. Data Retrieval: Querying time series, traces, and logs from external backends
  3. Analysis: Operations such as time-series correlation analysis, comparison, change-point detection, and outlier detection
  4. Trace-graph Bridging: Procedures that mediate between trace representations and the graph model
  5. Search: Procedures for temporal and topological queries that arise frequently in observability analysis, such as retrieving all time-associated nodes active during a given interval

Detailed documentation of all 76 procedures is available here.
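
As a rough illustration of the temporal search category listed above, the following Python sketch selects the nodes that were active during a given interval using a plain interval-overlap check. It is not graphobs code: the TimedNode class, its start and end fields, and the active_during function are hypothetical names chosen for this example, and how graphobs actually stores activity periods on nodes is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedNode:
    """Hypothetical stand-in for a time-associated graph node."""
    name: str
    start: float              # time at which the node became active
    end: Optional[float]      # None means the node is still active


def active_during(nodes: List[TimedNode],
                  interval_start: float,
                  interval_end: float) -> List[TimedNode]:
    """Return nodes whose activity period overlaps [interval_start, interval_end]."""
    return [
        n for n in nodes
        if n.start <= interval_end and (n.end is None or n.end >= interval_start)
    ]


if __name__ == "__main__":
    pods = [
        TimedNode("pod-a", 0, 100),
        TimedNode("pod-b", 150, None),
        TimedNode("pod-c", 50, 60),
    ]
    print([n.name for n in active_during(pods, 90, 120)])
```

A node counts as active if its period overlaps the interval at all, so a pod that started before the interval and ended inside it is still returned.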

Example: Request Rate for all Frontend Operations

In the following query, a MATCH clause selects nodes by graph topology: here, all operations of the frontend service. Then, for each operation, a graphobs procedure retrieves the call count (stored in Prometheus) as a differenced time series (i.e., the request rate) over the last 30 minutes.

MATCH (:Service {name: "frontend"})-[:HAS_OPERATION]->(op:Operation)
CALL graphobs.data.get_time_series(op, "traces_span_metrics_calls_total", {range: "-30m", aggregation: "difference"})
YIELD timestamps, values, source
RETURN op.name AS operation, timestamps, values
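
To make the notion of a differenced time series concrete, the following Python sketch converts a cumulative counter such as traces_span_metrics_calls_total into per-interval increments. It only illustrates the idea and is not how graphobs implements the "difference" aggregation; the function name difference and the treatment of counter resets are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def difference(timestamps: Sequence[float],
               values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Turn a cumulative counter into per-interval increments.

    Hypothetical helper, not part of graphobs. Each output point carries the
    timestamp of the later of the two samples it was computed from. A drop in
    the counter is treated as a reset, and the new value is used as the
    increment (an assumption made for this sketch).
    """
    if len(timestamps) != len(values):
        raise ValueError("timestamps and values must have the same length")
    out_ts: List[float] = []
    out_vals: List[float] = []
    for i in range(1, len(values)):
        delta = values[i] - values[i - 1]
        if delta < 0:  # counter went down: assume it was reset to zero
            delta = values[i]
        out_ts.append(timestamps[i])
        out_vals.append(delta)
    return out_ts, out_vals


if __name__ == "__main__":
    # Counter sampled every 60 seconds, with a reset before the last sample.
    ts, vals = difference([0, 60, 120, 180], [100, 130, 170, 10])
    print(ts, vals)
```

Dividing each increment by the sampling interval would give a per-second rate; the sketch leaves the values as raw increments per interval.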

Quick Start with graphobs and the OpenTelemetry Demo Application

The following guide shows how to try out graphobs in combination with the OpenTelemetry Demo, a microservice-based application that generates realistic telemetry data.

Prerequisites

  • Running Kubernetes cluster
  • Helm 3.x
  • kubectl configured

Installation

1. Clone this repository

git clone https://github.com/DescartesResearch/graphobs.git
cd graphobs

2. Add OpenTelemetry Demo Helm Repository

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

3. Deploy OpenTelemetry Demo

kubectl apply -f kubernetes/grafana-neo4j-datasource.yaml
kubectl apply -f kubernetes/grafana-dashboard-configmap.yaml
helm install otel-demo open-telemetry/opentelemetry-demo -f kubernetes/values-grafana-neo4j.yaml

4. Deploy graphobs

kubectl apply -f kubernetes/graphobs.yaml

Wait for all pods to be ready:

kubectl get pods -w

5. Graph Construction

graphobs builds its graph model through registration procedures. Each procedure takes the URL of the respective observability backend, connects to it, and automatically discovers the available entities. Replace the URLs in the following examples with the URLs from your deployment if needed.

Jaeger: Import services, operations, and their dependencies:

CALL graphobs.jaeger.import_graph("http://jaeger-query:16686")

Prometheus: Discover pods, servers, and their time-series associations:

CALL graphobs.datasources.register_prometheus("http://prometheus-server:9090", {})

OpenSearch: Discover log indices:

CALL graphobs.datasources.register_opensearch("http://opensearch:9200", {user: "admin", password: "admin"})

Note that when graphobs is deployed as a Kubernetes pod, it automatically calls the Jaeger and Prometheus registration procedures on startup using the environment variables configured in the deployment file.

The resulting graph can then be explored in the Neo4j Browser and queried with Cypher.

Access

1. Port Forwarding

# Neo4j Browser
kubectl port-forward svc/graphobs 7474:7474 7687:7687

# Grafana
kubectl port-forward svc/otel-demo-grafana 3000:80

This assumes that the forwarded ports are accessed on the same machine where kubectl is executed. Otherwise, the port-forward commands need to be adjusted accordingly.

2. Open the Neo4j Browser at http://localhost:7474

Building the graph from the observability data may take a few minutes. If something fails, simply delete the pod and wait for it to be automatically recreated.

Example: Root Cause Analysis with graphobs

When latency increases in the frontend service, a downstream service is often the root cause, because the delay propagates along the call graph. The following query localizes the source by combining graph traversal with time-series correlation analysis:

  1. Find all transitive downstream services of the frontend via DEPENDS_ON relationships
  2. Correlate the tail latency (p99) of each downstream service with the frontend latency
  3. Weight candidates by their graph distance (number of hops) from the frontend, since the root cause is the first service in the call chain to exhibit the latency increase

MATCH path = (root:Service {name: 'frontend'})-[:HAS_OPERATION]->()
        -[:DEPENDS_ON*]->(op)-[:HAS_OPERATION]-(dependency:Service)
WHERE root <> dependency
WITH root, collect(DISTINCT dependency) AS dependencies

CALL graphobs.analysis.node_group_correlation(root,
    "traces_span_metrics_duration_milliseconds_bucket",
    dependencies,
    "traces_span_metrics_duration_milliseconds_bucket",
    {
        range: "-40m",
        percentile: 0.99,
        resolution: "2m"
    }
) YIELD node, correlation WHERE correlation > 0.1

MATCH distPath = shortestPath((root)-[:HAS_OPERATION|DEPENDS_ON*]-(node))
WITH node, correlation, length(distPath) AS hops
WITH node, correlation, hops, (correlation * hops) AS rc_score

RETURN node.name AS service, correlation, hops, rc_score
ORDER BY rc_score DESC

The resulting rc_score (correlation × hops) ranks services by their likelihood of being the root cause; the top-ranked service is the first candidate to investigate.
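
The scoring step can be reproduced outside the database to see how the ranking behaves. The Python sketch below computes a correlation between the frontend latency series and each candidate's latency series, drops candidates at or below the 0.1 threshold, and ranks the rest by correlation × hops. It is an illustration under stated assumptions: Pearson correlation is used here, which may differ from the measure graphobs.analysis.node_group_correlation applies, and the function names, service names, and latency values are invented example data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long series (0.0 if one is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_candidates(root_latency: Sequence[float],
                    candidates: Dict[str, Tuple[Sequence[float], int]],
                    min_correlation: float = 0.1
                    ) -> List[Tuple[str, float, int, float]]:
    """Rank candidates by rc_score = correlation * hops, highest first.

    `candidates` maps a service name to (latency series, hops from the root).
    """
    ranked = []
    for name, (latency, hops) in candidates.items():
        corr = pearson(root_latency, latency)
        if corr > min_correlation:
            ranked.append((name, corr, hops, corr * hops))
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


if __name__ == "__main__":
    # Invented p99 latency samples, one value per resolution step.
    frontend = [10, 12, 30, 35, 11]
    candidates = {
        "checkout": ([5, 6, 15, 18, 5], 2),
        "cart": ([3, 3.5, 9, 10, 3], 3),
        "ad": ([7, 7, 6, 7, 7], 2),
    }
    for service, corr, hops, score in rank_candidates(frontend, candidates):
        print(f"{service}: correlation={corr:.3f} hops={hops} rc_score={score:.3f}")
```

In this example data both checkout and cart track the frontend closely, so the hop weighting decides the order and the service further down the call chain is ranked first, while the uncorrelated ad service is filtered out.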

Uninstall

kubectl delete -f kubernetes/graphobs.yaml
kubectl delete -f kubernetes/grafana-dashboard-configmap.yaml
kubectl delete -f kubernetes/grafana-neo4j-datasource.yaml
helm uninstall otel-demo
