I read the website's page on performance overhead, which describes how much CPU beyla consumes at various configurations, but it does not cover how the CPU usage of an instrumented process itself changes, or what happens in a kubernetes cluster on a modern public cloud.
To get a sense of those numbers, I load-tested a simple Go HTTP server in an EKS cluster. With beyla deployed as a sidecar in its default configuration, the server container used ~40% more CPU (from ~134m to ~192m). beyla itself used only 4 millicores, yet the CPU usage of my server process still shot up by that much.
Is this the typical expected overhead on Go servers?
Testing methodology
I wrote a couple of very simple services in Golang:
links: a simple URL shortening service, storing data in postgresql.
todomvc: a backend for a todo-item application, also storing data in postgresql. This calls links to shorten any links present in todo-item content, before storing it in postgres.
To remove any impact of postgresql latency, I then wrote mock drivers for both of the above services that implement the functionality in-memory and sleep for a millisecond or two, depending on the query (a sketch follows below).
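For a concrete picture, this is roughly the shape of such a mock; the interface, names, and sleep durations here are illustrative assumptions, not the actual project code:

```go
// Illustrative sketch of an in-memory mock store standing in for the
// postgresql-backed one; names and durations are assumptions.
package store

import (
	"sync"
	"time"
)

// MockLinkStore keeps short-code -> URL mappings in memory.
type MockLinkStore struct {
	mu    sync.Mutex
	links map[string]string
}

func NewMockLinkStore() *MockLinkStore {
	return &MockLinkStore{links: make(map[string]string)}
}

// Shorten stores a mapping, sleeping to emulate a write-query round trip.
func (s *MockLinkStore) Shorten(short, long string) error {
	time.Sleep(2 * time.Millisecond)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.links[short] = long
	return nil
}

// Resolve looks up a short code, sleeping to emulate a read-query round trip.
func (s *MockLinkStore) Resolve(short string) (string, bool) {
	time.Sleep(1 * time.Millisecond)
	s.mu.Lock()
	defer s.mu.Unlock()
	long, ok := s.links[short]
	return long, ok
}
```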
I deployed all of these to an EKS cluster in an isolated node with nothing else running on it, and gave everything a generous CPU limit to prevent throttling.
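Concretely, the scheduling and limits looked something like this; the node label and CPU numbers below are placeholders rather than my exact values:

```yaml
# Placeholder values; the node label and CPU numbers are illustrative.
spec:
  nodeSelector:
    dedicated: beyla-loadtest   # label on the isolated node
  containers:
    - name: todomvc
      resources:
        requests:
          cpu: "1"
        limits:
          cpu: "4"              # generous headroom so throttling never kicks in
```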
I used vegeta to load-test the create API of the todomvc service at 500 rps in an open-loop fashion. vegeta itself ran on another node (to reduce interference), with prometheus scraping latency numbers from it. kube-state-metrics was the source for CPU usage, also queried via prometheus.
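The attack itself was the usual vegeta pipeline (vegeta issues requests at a constant rate regardless of response latency, which is what makes it open-loop); the URL and body file here are illustrative:

```
echo 'POST http://todomvc:8080/api/todos' | \
  vegeta attack -rate=500 -duration=5m -body=todo.json | \
  vegeta report
```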
Finally, I repeated this load-test with beyla (version 2.8.5) attached with the default configuration, which is the following:
shutdown_timeout: 2s
enforce_sys_caps: true
otel_metrics_export:
  endpoint: http://otel-collector:4318/v1/metrics
  protocol: http/protobuf
  insecure_skip_verify: true
otel_traces_export:
  endpoint: http://otel-collector:4318/v1/traces
  protocol: http/protobuf
  insecure_skip_verify: true
I used BEYLA_OPEN_PORT to make the beyla sidecar instrument only the pod in question (wiring sketched below). I locked each service to a single pod, and initially turned on debug logs to verify that each process was being instrumented by exactly one beyla process, then removed that setting from the config file again to prevent logging overhead.
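Roughly, assuming the server listens on 8080 (container names, image tags, and paths below are placeholders):

```yaml
# Illustrative sidecar wiring; names, port, and paths are placeholders.
spec:
  shareProcessNamespace: true        # the sidecar must see the server's process
  containers:
    - name: todomvc
      image: todomvc:latest
      ports:
        - containerPort: 8080
    - name: beyla
      image: grafana/beyla:2.8.5
      securityContext:
        privileged: true             # or the finer-grained caps enforce_sys_caps checks for
      env:
        - name: BEYLA_OPEN_PORT
          value: "8080"              # instrument only the process listening on this port
        - name: BEYLA_CONFIG_PATH
          value: /config/beyla.yaml  # the config shown above, mounted from a ConfigMap
```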
I repeated the test with actual postgresql as well (without the mock drivers), and tried various values for ebpf.wakeup_len as mentioned in the perf-tuning docs. This barely changed the measured CPU overhead on the instrumented process.
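That is, variations on this stanza in the config above (the value is just an example):

```yaml
ebpf:
  wakeup_len: 64   # accumulate this many ringbuffer events before waking userspace
```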
Finally, this was on a c7g.2xlarge, an ARM-based instance offered by AWS. I repeated the test on a c5.2xlarge, which is x86_64, and this also resulted in similarly high overhead.
Happy to share more information if that helps figure out what is going on.
Edit: the latency increase in all configurations I tested was not even noticeable at the histogram resolution I used.