
POC: Memory profiler allocation labels #62649

Draft
rudolf wants to merge 6 commits into nodejs:main from rudolf:poc-allocation-profiler-tags-v2


@rudolf
Contributor

@rudolf rudolf commented Apr 9, 2026

This is a POC for initial feedback. If we can get alignment within Node.js, I could try to contribute the V8 changes upstream.

Summary

Adds the ability to tag sampling heap profiler allocations with string labels that propagate through async context (via CPED). This enables attributing memory usage to specific HTTP routes, tenants, or operations — something no JS runtime currently supports.

V8 changes

  • HeapProfileSampleLabelsCallback — embedder callback invoked on sampled allocations to retrieve labels from the current async context
  • AllocationProfile::Sample::labels — key-value pairs on each sample (behind V8_HEAP_PROFILER_SAMPLE_LABELS compile flag)

Datadog's Attila Szegedi proposed a similar label mechanism for CPU profiling on v8-dev (July 2025). The V8 team (Leszek Swirski) indicated they would review non-invasive patches behind #ifdefs. This PR applies the same approach to heap profiling, which is simpler: everything runs on the allocation thread, so there are no signal-safety concerns.

Node.js changes

  • v8.withHeapProfileLabels(labels, fn) — runs fn with labels that propagate across await
  • v8.setHeapProfileLabels(labels) — sets labels for current async scope (for framework middleware patterns)
  • v8.getAllocationProfile() returns samples[].labels and per-label externalBytes (Buffer/ArrayBuffer)
  • ProfilingArrayBufferAllocator tracks external allocations per label (single atomic load overhead when disabled)
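
For anyone who wants to try the propagation semantics without building this branch, here is a minimal sketch that emulates `withHeapProfileLabels` on top of `AsyncLocalStorage`. It is not the implementation in this PR: `labelStore`, `currentLabels`, and the local `withHeapProfileLabels` are hypothetical stand-ins, and nothing here touches the heap profiler itself.

```javascript
'use strict';
const { AsyncLocalStorage } = require('node:async_hooks');

// Hypothetical stand-in for the store the PR wires into the profiler.
const labelStore = new AsyncLocalStorage();

// Emulates the proposed v8.withHeapProfileLabels(labels, fn): runs fn with
// labels visible to everything in the same async context.
function withHeapProfileLabels(labels, fn) {
  return labelStore.run(Object.freeze({ ...labels }), fn);
}

// Hypothetical helper: the labels a sampled allocation would be tagged with.
function currentLabels() {
  return labelStore.getStore() ?? {};
}

async function handleRequest(route) {
  return withHeapProfileLabels({ route }, async () => {
    await new Promise((resolve) => setImmediate(resolve));
    // Still inside the labeled context after the await.
    return currentLabels().route;
  });
}

// Two concurrent requests each keep their own labels.
function demo() {
  return Promise.all([
    handleRequest('/api/search'),
    handleRequest('/api/users'),
  ]);
}

demo().then((routes) => console.log(routes));
```

Outside any `withHeapProfileLabels` call, `currentLabels()` returns an empty object, which corresponds to an unlabeled sample in this model.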

#62273 landed the SyncHeapProfileHandle API with Symbol.dispose support. The labels API proposed here is complementary: it adds context (which route/tenant) to the samples that SyncHeapProfileHandle already collects. A follow-up could integrate withHeapProfileLabels as a method on the handle.

Motivation

In multi-tenant or multi-route Node.js servers, a memory spike today tells you how much memory grew but not what caused it. Operators resort to code inspection or heap snapshots, but neither scales to collecting data over long timespans across large deployments. With labeled heap/external memory profiling, you can answer "route /api/search accounts for 400MB of the 1.2GB heap" directly from production telemetry (e.g. via OTel).
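
To make the per-route question concrete, here is a rough sketch of how a consumer might aggregate labeled samples. The sample shape (`size`, `count`, `labels` as a plain object) and the helper name `bytesByLabel` are assumptions for illustration, and the sample values are made up rather than measured.

```javascript
'use strict';

// Sums sampled bytes per value of one label key.
// Assumed sample shape: { size, count, labels: { [key]: value } }.
function bytesByLabel(samples, key) {
  const totals = new Map();
  for (const { size, count, labels } of samples) {
    const value = labels?.[key] ?? '(unlabeled)';
    totals.set(value, (totals.get(value) ?? 0) + size * count);
  }
  return totals;
}

// Made-up samples for illustration only; not measured data.
const samples = [
  { size: 1024, count: 4, labels: { route: '/api/search' } },
  { size: 512, count: 2, labels: { route: '/api/users' } },
  { size: 2048, count: 1, labels: { route: '/api/search' } },
  { size: 256, count: 1, labels: {} },
];

const totals = bytesByLabel(samples, 'route');
console.log(Object.fromEntries(totals));
```

Because the profiler samples allocations, totals computed this way are estimates rather than exact byte counts.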

This mirrors Go's pprof.Labels capability.

Overhead

20-run benchmark (two-server realistic HTTP workload):

  • Sampling profiler alone: 0.6% (not statistically significant)
  • Sampling + labels: 2.2% total (p<0.01)
  • When disabled: zero overhead (no code path changes)

Test plan

  • V8 cctests for label callback
  • JS tests: label propagation across await, concurrent contexts, setHeapProfileLabels, external memory tracking, GC cleanup
  • Micro and macro benchmarks in benchmark/v8/ and benchmark/http/

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/performance
  • @nodejs/security-wg
  • @nodejs/v8-update

@nodejs-github-bot added the lib / src and needs-ci labels Apr 9, 2026
src/node_v8.cc Outdated
// This happens when --experimental-async-context-frame is not set on
// Node.js 22, causing all contexts to map to Smi::zero() (address 0).
if (cped.IsEmpty() || cped->IsUndefined()) return;
uintptr_t addr = node::GetLocalAddress(cped);
Member

Storing in binding data by the CPED address won't work at all. Because all AsyncLocalStorage contexts are combined into a single AsyncContextFrame map, any changes to any contexts will change what this value is, even if the particular store you are interested in has not changed at all within that map frame.

You would need to have V8 capture the CPED value at the time of the sample and store that on the heap profile itself alongside the samples, then use that actual AsyncContextFrame instance to look up what the corresponding data was in that frame for the label store.
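
The identity problem described in this comment can be illustrated with a toy model in plain JavaScript. This is only a sketch: the frame is modeled as a plain `Map` that is copied on every store change, and `withStore`, `labelKey`, and `tracingKey` are hypothetical names, not Node.js internals.

```javascript
'use strict';

// Toy model of an async context frame: a Map keyed by store.
// Setting any store yields a new frame object.
function withStore(frame, storeKey, value) {
  const next = new Map(frame);
  next.set(storeKey, value);
  return next;
}

const labelKey = Symbol('labels');    // hypothetical label store
const tracingKey = Symbol('tracing'); // unrelated store

const frameA = withStore(new Map(), labelKey, { route: '/api/search' });
// An unrelated store changes; the label value itself is untouched.
const frameB = withStore(frameA, tracingKey, { spanId: 1 });

// Side table keyed by frame identity (the analogue of the CPED address).
const labelsByFrame = new Map([[frameA, frameA.get(labelKey)]]);

const byIdentity = labelsByFrame.get(frameB); // undefined: identity changed
const byLookup = frameB.get(labelKey);        // found via the frame itself

console.log(byIdentity, byLookup);
```

In this model, the lookup through the captured frame still finds the labels after the unrelated store changes, while the identity-keyed side table does not.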

rudolf added 6 commits April 10, 2026 11:20
Add a callback mechanism to V8's SamplingHeapProfiler that allows
embedders to attach key-value string labels to allocation samples.

At allocation time, SampleObject() captures the
ContinuationPreservedEmbedderData (CPED) as a Global<Value> on each
internal Sample. At profile-read time, BuildSamples() invokes the
registered HeapProfileSampleLabelsCallback with each sample's stored
CPED, allowing embedders to resolve labels from the async context.

This two-phase approach (capture at allocation, resolve at read) avoids
running embedder callbacks inside DisallowGarbageCollection scopes and
is immune to CPED identity changes caused by unrelated
AsyncLocalStorage stores.
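
As a rough illustration of the two-phase flow (a toy model in JavaScript, not V8 code), the sketch below stores only the context at sample time and runs the labels callback when the profile is read. `ToySamplingProfiler` and its method names are hypothetical and merely echo the V8 names used above.

```javascript
'use strict';

// Toy model, not V8 code. Phase 1 stores the context; phase 2 resolves it.
class ToySamplingProfiler {
  #samples = [];
  #labelsCallback = null;

  setLabelsCallback(callback) {
    this.#labelsCallback = callback;
  }

  // Phase 1 (allocation time): record the context only, run no callback.
  sampleObject(size, context) {
    this.#samples.push({ size, context });
  }

  // Phase 2 (profile-read time): resolve labels from each stored context.
  buildSamples() {
    return this.#samples.map(({ size, context }) => ({
      size,
      labels: this.#labelsCallback ? this.#labelsCallback(context) : [],
    }));
  }
}

const labelKey = Symbol('labels'); // hypothetical label store key
const profiler = new ToySamplingProfiler();
profiler.setLabelsCallback((context) => context?.get(labelKey) ?? []);

const frame = new Map([[labelKey, ['route', '/api/search']]]);
profiler.sampleObject(64, frame);
profiler.sampleObject(32, undefined); // allocation outside any context

const profile = profiler.buildSamples();
console.log(profile);
```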

Changes:
- AllocationProfile::Sample gains a `labels` field
- New HeapProfileSampleLabelsCallback typedef on HeapProfiler
- Internal Sample stores `Global<Value> cped` at allocation time
- BuildSamples() invokes labels callback with stored CPED
- Seven cctests covering callback API, GC behavior, unregistration
- features.gypi enables v8_enable_continuation_preserved_embedder_data

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>

C++ bindings for V8 heap profile labels with ALS store lookup:
- HeapProfileLabelsCallback reads the ALS store directly from the
  captured CPED (AsyncContextFrame Map) at profile-read time
- Uses Map::AsArray() + linear scan for ALS key lookup, safe inside
  DisallowJavascriptExecution (ArrayBuffer allocator context)
- ProfilingArrayBufferAllocator tracks per-label external memory
  (Buffer/ArrayBuffer) using the same CPED-based label resolution
- SetHeapProfileLabelsStore receives the ALS key from JS at init time
- GetAllocationProfile returns samples with labels and externalBytes
- Cleanup hooks for environment teardown
- Node.js cctests for label registration, callback, and cleanup
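
The lookup strategy can be sketched in JavaScript. `v8::Map::AsArray()` returns an array of length `2 * Size()` laid out as key, value, key, value; the helpers `mapAsArray` and `findInFlat` and the key objects below are hypothetical illustrations of that layout, not the C++ code in this commit.

```javascript
'use strict';

// Mirrors the [key0, value0, key1, value1, ...] layout of v8::Map::AsArray().
function mapAsArray(map) {
  const flat = [];
  for (const [key, value] of map) flat.push(key, value);
  return flat;
}

// Linear scan by key identity over the flattened array.
function findInFlat(flat, key) {
  for (let i = 0; i + 1 < flat.length; i += 2) {
    if (flat[i] === key) return flat[i + 1];
  }
  return undefined;
}

const alsKey = { name: 'heap-profile-labels' }; // hypothetical ALS key object
const otherKey = { name: 'tracing' };           // unrelated store
const frame = new Map([
  [otherKey, 'span-1'],
  [alsKey, ['route', '/api/search']],
]);

const labels = findInFlat(mapAsArray(frame), alsKey);
console.log(labels);
```

The scan is linear in the number of stores in the frame, which is typically small.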

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>

JS API for heap profile label attribution:
- withHeapProfileLabels(labels, fn): scoped labels via AsyncLocalStorage
  — just ALS.run with pre-flattened label array, zero C++ calls
- setHeapProfileLabels(labels): enterWith semantics for frameworks where
  the handler runs after the middleware returns (e.g., Hapi)
- startSamplingHeapProfiler with includeCollectedObjects option
- stopSamplingHeapProfiler and getAllocationProfile with labels

Labels are pre-flattened to [key, val, key, val, ...] arrays at set
time for GC safety — the C++ callback runs during BuildSamples()
iteration where V8 object allocation could invalidate the iterator.
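
A minimal sketch of the flattening and of the enterWith-style pattern, using only `AsyncLocalStorage`. The names `labelStore`, `flattenLabels`, `middleware`, `handler`, and `serve` are hypothetical, and the local `setHeapProfileLabels` only emulates the proposed API; coercing values to strings is an assumption made here because the labels are described as strings.

```javascript
'use strict';
const { AsyncLocalStorage } = require('node:async_hooks');

const labelStore = new AsyncLocalStorage(); // hypothetical stand-in

// Flattens { k: v } to [k, v, k, v, ...] so a reader only indexes strings.
function flattenLabels(labels) {
  const flat = [];
  for (const [key, value] of Object.entries(labels)) {
    flat.push(String(key), String(value));
  }
  return flat;
}

// Emulates the proposed setHeapProfileLabels with enterWith semantics.
function setHeapProfileLabels(labels) {
  labelStore.enterWith(flattenLabels(labels));
}

// The middleware returns before the handler runs, so run() cannot wrap it.
function middleware(request) {
  setHeapProfileLabels({ route: request.route, tenant: request.tenant });
}

function handler() {
  return labelStore.getStore();
}

function serve(request) {
  middleware(request);
  return handler();
}

const seen = serve({ route: '/api/search', tenant: 42 });
console.log(seen);
```

In a real server each request runs in its own async context, so `enterWith` would not leak labels between requests the way it persists here at the top level of a script.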

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>

JS parallel tests:
- test-v8-heap-profile-labels: basic labeling, multi-key, JSON
  round-trip, GC retention/removal, includeCollectedObjects flag,
  setHeapProfileLabels leak check, CPED identity test (verifies labels
  survive when another ALS store changes the CPED address)
- test-v8-heap-profile-labels-async: await boundary propagation,
  concurrent contexts, Hapi-style setHeapProfileLabels
- test-v8-heap-profile-external: Buffer/ArrayBuffer per-label
  externalBytes tracking, GC cleanup, unlabeled isolation

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>

Document the new v8 module APIs:
- startSamplingHeapProfiler with includeCollectedObjects option
- stopSamplingHeapProfiler
- getAllocationProfile with samples and externalBytes
- withHeapProfileLabels for scoped label attribution
- setHeapProfileLabels for enterWith-style frameworks
- Limitations section covering what is and isn't measured

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>

Three benchmarks for measuring heap profile labels overhead:
- v8/heap-profiler-labels: micro benchmark measuring raw overhead
  of withHeapProfileLabels in a tight allocation loop (1M iters)
- http/heap-profiler-labels: single HTTP server with ~150KB mixed
  workload per request
- http/heap-profiler-realistic: two-server architecture (app + DB)
  with JSON parsing, column aggregation, and ALS propagation
  across async I/O boundaries

Each benchmark supports three modes: none, sampling,
sampling-with-labels.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
@rudolf rudolf force-pushed the poc-allocation-profiler-tags-v2 branch from 8743634 to 302bebe April 10, 2026 09:22

3 participants