ebpf: support Go cgo stack unwinding through goroutine stack by alban · Pull Request #1331 · open-telemetry/opentelemetry-ebpf-profiler

alban · 2026-04-07T09:43:46Z

Problem

When profiling Go processes that use cgo, stack unwinding fails after
runtime.asmcgocall. The return address points to the Go goroutine stack
(anonymous mmap at 0xc000xxxxxx), which is not tracked in
pid_page_to_mapping_info. This causes ERR_NATIVE_NO_PID_PAGE_MAPPING
and truncates the stack trace — all Go frames above the cgo boundary are lost.

Solution

Add a fallback in get_next_unwinder_after_native_frame(): when
resolve_unwind_mapping returns ERR_NATIVE_NO_PID_PAGE_MAPPING, check
if the process is a known Go process (via the existing go_labels_procs
map) and try frame-pointer-based unwinding to traverse the goroutine
stack back to the Go binary text section.
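As a rough illustration of what "frame-pointer-based unwinding" means here, the following user-space sketch performs a single step over a simulated stack. It is not the code from this PR: `RegState`, `FakeStack`, `read_word` and `fp_unwind_step` are hypothetical names, and the layout (saved frame pointer at `[fp]`, return address at `[fp+8]`) is the usual amd64 frame-pointer convention, assumed rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical register state; the profiler keeps its own equivalent.
typedef struct {
  uint64_t pc, sp, fp;
} RegState;

// Hypothetical stand-in for target memory: a flat array of 8-byte words
// whose first word lives at address `base`.
typedef struct {
  uint64_t base;
  const uint64_t *words;
  size_t num_words;
} FakeStack;

// Read one aligned 8-byte word; fails for addresses outside the fake stack.
static bool read_word(const FakeStack *s, uint64_t addr, uint64_t *out)
{
  if (addr < s->base || (addr - s->base) % 8 != 0) {
    return false;
  }
  size_t idx = (size_t)((addr - s->base) / 8);
  if (idx >= s->num_words) {
    return false;
  }
  *out = s->words[idx];
  return true;
}

// One frame-pointer step, assuming the amd64 convention:
//   [fp]     = caller's saved frame pointer
//   [fp + 8] = return address into the caller
// On failure the register state is left untouched.
static bool fp_unwind_step(const FakeStack *s, RegState *r)
{
  assert(s != NULL && r != NULL);
  uint64_t saved_fp, ret_addr;
  if (r->fp == 0) {
    return false;
  }
  if (!read_word(s, r->fp, &saved_fp) || !read_word(s, r->fp + 8, &ret_addr)) {
    return false;
  }
  r->pc = ret_addr;
  r->sp = r->fp + 16;
  r->fp = saved_fp;
  return r->pc != 0;
}
```

In the fallback described above, a step like this would be followed by another mapping lookup on the new `pc`, which is expected to land in the Go binary's text section.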

Also add go_labels_procs map support in the coredump test harness
(previously hardcoded to return NULL), and a Go+cgo coredump test case.

Based on a patch by @burak-ok (burak-ok@fc2edfaf).

Test results

Coredump-based tests (go test -a ./tools/coredump/):

Test case	Without fix	With fix
go-1.24.1-hello (pure Go)	✅ PASS	✅ PASS
go-cgo-test (Go+cgo)	❌ FAIL	✅ PASS

Without fix, the main thread stack ends at:

runtime.asmcgocall+0
<unwinding aborted due to error native_no_pid_page_mapping>

With fix, the full cgo call chain is resolved:

libc.so.6 (sleep) → runtime.asmcgocall → runtime.cgocall →
main._Cfunc_c_outer → main.goCallC → main.goMiddle →
main.goOuter → main.main → runtime.main → runtime.goexit

Details

Without the fix:

=== RUN   TestCoreDumps
=== RUN   TestCoreDumps/testdata/amd64/go-1.24.1-hello.json.json
time=2026-04-07T11:33:30.213+02:00 level=INFO msg="Interpreter tracers: perl,php,python,hotspot,ruby,v8,dotnet,go,labels,beam"
time=2026-04-07T11:33:30.232+02:00 level=WARN msg="Store does not bundle linux-vdso.1.so"
=== RUN   TestCoreDumps/testdata/amd64/go-cgo-test.json
time=2026-04-07T11:33:30.237+02:00 level=INFO msg="Interpreter tracers: perl,php,python,hotspot,ruby,v8,dotnet,go,labels,beam"
time=2026-04-07T11:33:30.244+02:00 level=WARN msg="Store does not bundle linux-vdso.1.so"
time=2026-04-07T11:33:30.258+02:00 level=WARN msg="Store does not bundle /usr/lib/debug/usr/lib64/libc.so.6-2.37-19.fc38.x86_64.debug"
    coredump_test.go:40: 
        	Error Trace:	/home/alban/go/src/github.com/open-telemetry/opentelemetry-ebpf-profiler/tools/coredump/coredump_test.go:40
        	Error:      	Not equal: 
...
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -3,3 +3,3 @@
        	            	   LWP: (uint32) 615559,
        	            	-  Frames: ([]string) (len=12) {
        	            	+  Frames: ([]string) (len=5) {
        	            	    (string) (len=17) "libc.so.6+0xd6413",
        	            	@@ -8,10 +8,3 @@
        	            	    (string) (len=82) "runtime.asmcgocall+0 in /home/alban/programs/golang/go/src/runtime/asm_amd64.s:925",
        	            	-   (string) (len=78) "runtime.cgocall+0 in /home/alban/programs/golang/go/src/runtime/cgocall.go:185",
        	            	-   (string) (len=43) "main._Cfunc_c_outer+0 in _cgo_gotypes.go:49",
        	            	-   (string) (len=135) "main.goCallC+0 in /home/alban/go/src/github.com/open-telemetry/opentelemetry-ebpf-profiler/tools/coredump/testsources/go-cgo/main.go:26",
        	            	-   (string) (len=136) "main.goMiddle+0 in /home/alban/go/src/github.com/open-telemetry/opentelemetry-ebpf-profiler/tools/coredump/testsources/go-cgo/main.go:32",
        	            	-   (string) (len=135) "main.goOuter+0 in /home/alban/go/src/github.com/open-telemetry/opentelemetry-ebpf-profiler/tools/coredump/testsources/go-cgo/main.go:37",
        	            	-   (string) (len=132) "main.main+0 in /home/alban/go/src/github.com/open-telemetry/opentelemetry-ebpf-profiler/tools/coredump/testsources/go-cgo/main.go:41",
        	            	-   (string) (len=89) "runtime.main+0 in /home/alban/programs/golang/go/src/internal/runtime/atomic/types.go:194",
        	            	-   (string) (len=79) "runtime.goexit+0 in /home/alban/programs/golang/go/src/runtime/asm_amd64.s:1694"
        	            	+   (string) (len=59) "<unwinding aborted due to error native_no_pid_page_mapping>"
        	            	   }
        	Test:       	TestCoreDumps/testdata/amd64/go-cgo-test.json
--- FAIL: TestCoreDumps (0.07s)
    --- PASS: TestCoreDumps/testdata/amd64/go-1.24.1-hello.json.json (0.02s)
    --- FAIL: TestCoreDumps/testdata/amd64/go-cgo-test.json (0.04s)
FAIL
FAIL	go.opentelemetry.io/ebpf-profiler/tools/coredump	0.075s
FAIL

With the fix:

$ go test -a -v -run 'TestCoreDumps/testdata/amd64/go' ./tools/coredump/
=== RUN   TestCoreDumps
=== RUN   TestCoreDumps/testdata/amd64/go-1.24.1-hello.json.json
time=2026-04-07T11:31:38.141+02:00 level=INFO msg="Interpreter tracers: perl,php,python,hotspot,ruby,v8,dotnet,go,labels,beam"
time=2026-04-07T11:31:38.173+02:00 level=WARN msg="Store does not bundle linux-vdso.1.so"
=== RUN   TestCoreDumps/testdata/amd64/go-cgo-test.json
time=2026-04-07T11:31:38.175+02:00 level=INFO msg="Interpreter tracers: perl,php,python,hotspot,ruby,v8,dotnet,go,labels,beam"
time=2026-04-07T11:31:38.178+02:00 level=WARN msg="Store does not bundle linux-vdso.1.so"
time=2026-04-07T11:31:38.191+02:00 level=WARN msg="Store does not bundle /usr/lib/debug/usr/lib64/libc.so.6-2.37-19.fc38.x86_64.debug"
--- PASS: TestCoreDumps (0.06s)
    --- PASS: TestCoreDumps/testdata/amd64/go-1.24.1-hello.json.json (0.03s)
    --- PASS: TestCoreDumps/testdata/amd64/go-cgo-test.json (0.03s)
PASS
ok  	go.opentelemetry.io/ebpf-profiler/tools/coredump	0.073s

Note for reviewers

The coredump test case modules need to be uploaded to the module store
before CI can run the go-cgo-test. Please help with ./coredump upload -all
if you have write access to the OCI bucket.

After unwinding through runtime.asmcgocall in Go+cgo processes, the return address may point to the goroutine stack (anonymous mmap at 0xc000xxxxxx) which is not tracked in pid_page_to_mapping_info. This causes ERR_NATIVE_NO_PID_PAGE_MAPPING and truncates the stack trace, losing the Go frames above the cgo call.

Add a fallback in get_next_unwinder_after_native_frame() that detects known Go processes (via the existing go_labels_procs map) and uses frame-pointer-based unwinding to traverse the goroutine stack back to the Go binary text section.

This enables full Go stack trace resolution through cgo:

runtime.asmcgocall -> runtime.cgocall -> main._Cfunc_... -> Go caller frames -> runtime.main -> runtime.goexit

Based-on-patch-by: Burak Ok <burakok@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add support for the go_labels_procs BPF map in the coredump test harness, enabling the Go cgo stack unwinding fallback to work in tests. Previously, go_labels_procs always returned NULL, preventing the frame-pointer unwinding fallback from triggering.

Add a Go+cgo coredump test case that exercises the full cgo stack:

libc (sleep) -> runtime.asmcgocall -> runtime.cgocall -> main._Cfunc_c_outer -> main.goCallC -> main.goMiddle -> main.goOuter -> main.main -> runtime.main -> runtime.goexit

Test results with coredump-based tests (go test -a ./tools/coredump/):

Test 1: go-1.24.1-hello WITHOUT cgo fix → PASS (no regression on pure Go)

Test 2: go-cgo-test WITHOUT cgo fix → FAIL
Main thread stack ends with:
runtime.asmcgocall+0 in asm_amd64.s:925
<unwinding aborted due to error native_no_pid_page_mapping>

Test 3: go-1.24.1-hello WITH cgo fix → PASS (no regression on pure Go)

Test 4: go-cgo-test WITH cgo fix → PASS
Main thread stack now shows the full cgo call chain:
libc.so.6 (sleep) -> runtime.asmcgocall -> runtime.cgocall -> main._Cfunc_c_outer -> main.goCallC -> main.goMiddle -> main.goOuter -> main.main -> runtime.main -> runtime.goexit

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

christos68k

Thanks! Left some nits, looks good to me.

christos68k · 2026-04-07T17:25:25Z

+        if (state->pc != 0) {
+          error = resolve_unwind_mapping(record, unwinder);
+          if (!error) {
+            DEBUG_PRINT("Go FP unwinding succeeded, new pc=%llx", state->pc);


We can add a goto here to avoid duplicating lines of code

christos68k · 2026-04-07T17:33:18Z

@@ -516,6 +516,31 @@ get_next_unwinder_after_native_frame(PerCPURecord *record, int *unwinder)

  DEBUG_PRINT("==== Resolve next frame unwinder: frame %d ====", record->trace.num_frames);
  ErrorCode error = resolve_unwind_mapping(record, unwinder);


resolve_unwind_mapping can set error_metric when it returns ERR_NATIVE_NO_PID_PAGE_MAPPING, which we should probably clear if we return ERR_OK in this branch.
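To make the concern concrete, here is a toy model of the control flow, not the profiler's code. `ToyState`, `toy_resolve_mapping`, `toy_next_unwinder`, the metric ids and the error code values are all hypothetical; `is_go_process` stands in for the go_labels_procs lookup and `fp_next_pc` for the result of a frame-pointer step. The point is only that a metric recorded by the first, failed lookup should be reset once the retry succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Error codes named after the ones in the PR; the numeric values are made up.
typedef enum { ERR_OK = 0, ERR_NATIVE_NO_PID_PAGE_MAPPING = 1 } ErrorCode;

#define METRIC_NONE (-1)
#define METRIC_NO_PID_PAGE_MAPPING 7 // arbitrary id for this sketch

typedef struct {
  uint64_t pc;
  int error_metric;    // metric to report later, or METRIC_NONE
  bool is_go_process;  // stands in for the go_labels_procs lookup
  uint64_t fp_next_pc; // pc a frame-pointer step would produce, 0 on failure
} ToyState;

// Pretend that only [0x400000, 0x500000) is a tracked mapping.
static bool toy_is_mapped(uint64_t pc)
{
  return pc >= 0x400000ULL && pc < 0x500000ULL;
}

// Like the real lookup in spirit: on failure it records a metric as a side effect.
static ErrorCode toy_resolve_mapping(ToyState *st)
{
  if (toy_is_mapped(st->pc)) {
    return ERR_OK;
  }
  st->error_metric = METRIC_NO_PID_PAGE_MAPPING;
  return ERR_NATIVE_NO_PID_PAGE_MAPPING;
}

static ErrorCode toy_next_unwinder(ToyState *st)
{
  assert(st != NULL);
  ErrorCode error = toy_resolve_mapping(st);
  if (error != ERR_NATIVE_NO_PID_PAGE_MAPPING || !st->is_go_process) {
    return error;
  }
  if (st->fp_next_pc == 0) {
    return error; // the frame-pointer step failed, keep the original error
  }
  st->pc = st->fp_next_pc;
  error  = toy_resolve_mapping(st);
  if (error == ERR_OK) {
    // The first lookup left a metric behind; drop it because we recovered.
    st->error_metric = METRIC_NONE;
  }
  return error;
}
```

Without the reset in the success branch, a recovered trace would still carry the failure metric from the first lookup.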

christos68k · 2026-04-07T17:36:42Z

+    // is not tracked in pid_page_to_mapping_info. Try frame pointer unwinding to get
+    // back to the Go binary's text section.
+    u32 pid = record->trace.pid;
+    if (bpf_map_lookup_elem(&go_labels_procs, &pid)) {


We're paying this cost for every process here, not just Go. Probably not worth worrying about at this point.

christos68k · 2026-04-07T17:43:02Z

@@ -0,0 +1,100 @@
+{


Is it possible to add an additional test case for arm64 ?

fabled · 2026-04-07T18:03:49Z


  DEBUG_PRINT("==== Resolve next frame unwinder: frame %d ====", record->trace.num_frames);
  ErrorCode error = resolve_unwind_mapping(record, unwinder);
+  if (error == ERR_NATIVE_NO_PID_PAGE_MAPPING) {


I think it is better not to do this in eBPF. If the frame pointer is valid, the Go stack delta extraction should generate the command to use the frame pointer. There is no need to add extra complexity to eBPF here.

See:

opentelemetry-ebpf-profiler/nativeunwind/elfunwindinfo/elfgopclntab.go

Line 36 in 927a75e

"runtime.mcall": &sdtypes.UnwindInfoStop,

This allows adjusting the generated command per Go function.
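The idea of a per-function override can be pictured as a small lookup table. The real table is a Go map in elfgopclntab.go. The C sketch below only mirrors its shape, and everything in it (`UnwindCmd`, `Override`, `command_for`, and especially the `runtime.asmcgocall` entry) is hypothetical and not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical command set; the real profiler has its own unwind info types.
typedef enum { CMD_DEFAULT, CMD_STOP, CMD_FRAME_POINTER } UnwindCmd;

typedef struct {
  const char *func;
  UnwindCmd cmd;
} Override;

static const Override overrides[] = {
  {"runtime.mcall", CMD_STOP},               // mirrors the entry quoted above
  {"runtime.asmcgocall", CMD_FRAME_POINTER}, // hypothetical, the suggestion under discussion
};

// Return the override for a Go function, or CMD_DEFAULT when none is registered.
static UnwindCmd command_for(const char *func)
{
  assert(func != NULL);
  for (size_t i = 0; i < sizeof(overrides) / sizeof(overrides[0]); i++) {
    if (strcmp(overrides[i].func, func) == 0) {
      return overrides[i].cmd;
    }
  }
  return CMD_DEFAULT;
}
```

With such a table the decision is made once, when unwind information is generated for the binary, rather than on every sample in eBPF.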

@wehzzz is working on #1313, which unwinds the Go stack beyond this point.

@florianl This suggests using the frame pointer for runtime.asmcgocall. I wonder whether the frame pointer is guaranteed here, or whether the new mechanism in #1313 should also be used for this Go function so that it does not depend on the frame pointer.

If I understand correctly, #1313 and #1279 fix this in a more generic way and fix more use cases.

Should I close this PR then?

Would the unit test with tools/coredump be reusable with the approach in #1313 and #1279?

I have to check and verify runtime.asmcgocall and more general cgo tests. From tests with #1279 I can say that the approach in #1313 works reliably, but I also limited my tests and research to Go versions where we use strategyFramePointer.

So the asmcgocall for x86-64 is at https://github.com/golang/go/blob/master/src/runtime/asm_amd64.s#L919 and for arm64 at https://github.com/golang/go/blob/master/src/runtime/asm_arm64.s

It seems that it can be called from goroutine context and from scheduler context. The goroutine context switches stacks, and the scheduler context does not, as it is already on the correct stack.

The stack pointer recovery is at https://github.com/golang/go/blob/master/src/runtime/asm_amd64.s#L969-L973 and indicates that FP-based unwinding would not give the right answer. Instead, it is recovered from the g.

This is likely due to the fact that the CGO code could call Go code and have the per-g stack moved/resized. I suspect we'd need to do the same here.

This probably needs a custom command similar to those for the other Go stack-switching primitives being implemented in the other PR.

From my understanding of the amd64 assembly, we could create a new unwind command for asmcgocall. It would be very similar to the existing systemstack one, since both functions share the same mechanism: they call gosave_systemstack_switch to save the goroutine context into g.sched before switching to the g0 stack. The gobuf layout is identical, so the recovery logic (read sched.sp, dereference to get PC/FP/SP) can be reused.

Both functions also have a "no-switch" path when already running on the system stack (noswitch for systemstack, nosave for asmcgocall). The key difference is how these paths call the target function:

systemstack uses a JMP (tail call) in its noswitch path - this removes systemstack from the call stack entirely, so the unwinder never encounters a return address pointing into it. The unwind command never fires in this case, which is the correct behavior.

asmcgocall uses a CALL in its nosave path - this leaves a return address on the stack pointing back into asmcgocall. During unwinding, the profiler would see this frame and apply the unwind command, which would attempt to read gobuf - but gosave_systemstack_switch was never called in this path, so gobuf may contain stale data from a previous call.

To handle both paths correctly, we could potentially do the following (need to dig more in depth into it):

switch path: cross to the goroutine stack using the gobuf recovery logic, same as systemstack.

nosave path: fall back to normal frame-pointer unwinding. Since everything stays on the same stack (g0) in the nosave path, the FP chain is intact and regular unwinding gives a correct trace.
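The two cases above can be sketched as follows. This is a user-space model under loud assumptions, not a proposed implementation: `ToyGobuf` is a hypothetical two-field subset of g.sched, the word at `sched.sp` is assumed to be the return address into the Go caller, `sched.bp` is assumed to hold the caller's frame pointer, and the path (`PATH_SWITCH` or `PATH_NOSAVE`) is passed in by the caller because how the unwinder would detect it is still an open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint64_t pc, sp, fp;
} Regs;

// Hypothetical subset of g.sched (gobuf); field names are illustrative only.
typedef struct {
  uint64_t sp, bp;
} ToyGobuf;

// Hypothetical stand-in for target memory: 8-byte words starting at `base`.
typedef struct {
  uint64_t base;
  const uint64_t *words;
  size_t num_words;
} Mem;

typedef enum { PATH_SWITCH, PATH_NOSAVE } AsmcgocallPath;

static bool mem_read(const Mem *m, uint64_t addr, uint64_t *out)
{
  if (addr < m->base || (addr - m->base) % 8 != 0) {
    return false;
  }
  size_t idx = (size_t)((addr - m->base) / 8);
  if (idx >= m->num_words) {
    return false;
  }
  *out = m->words[idx];
  return true;
}

// Step out of asmcgocall. On failure the register state is left untouched.
static bool unwind_asmcgocall(const Mem *m, AsmcgocallPath path,
                              const ToyGobuf *sched, Regs *r)
{
  assert(m != NULL && sched != NULL && r != NULL);
  uint64_t ret;
  if (path == PATH_SWITCH) {
    // Switch path: cross back to the goroutine stack via the saved gobuf.
    // Assumption: [sched.sp] is the return address into the Go caller.
    if (!mem_read(m, sched->sp, &ret)) {
      return false;
    }
    r->pc = ret;
    r->sp = sched->sp + 8;
    r->fp = sched->bp;
    return true;
  }
  // nosave path: same stack throughout, so take a plain frame-pointer step
  // and ignore the (possibly stale) gobuf entirely.
  uint64_t saved_fp;
  if (!mem_read(m, r->fp, &saved_fp) || !mem_read(m, r->fp + 8, &ret)) {
    return false;
  }
  r->pc = ret;
  r->sp = r->fp + 16;
  r->fp = saved_fp;
  return true;
}
```

The nosave branch deliberately never reads `sched`, which is the property the comment above asks for: stale gobuf contents cannot influence the result on that path.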

See also #1313 (comment)

I didn't realize this earlier, but apparently the linker injects the frame pointer. Perhaps this applies only if frame pointers are enabled at build time.

So the frame pointer probably works, assuming a new enough Go version and frame pointers enabled.

For a universal solution, this should use the system stack approach, with a similar (or the same) unwind command that resolves the old stack from the g.

alban requested review from a team as code owners April 7, 2026 09:43

alban and others added 2 commits April 7, 2026 14:15

alban force-pushed the alban_go-cgo-stack-unwinding branch from 72be576 to c389231 Compare April 7, 2026 12:17

christos68k reviewed Apr 7, 2026

fabled reviewed Apr 7, 2026

alban mentioned this pull request Apr 8, 2026

ustack: Add OTel eBPF Profiler symbolizer for interpreted language stacks inspektor-gadget/inspektor-gadget#4925

Open
