
kallsyms: update bpf addresses without full /proc/kallsyms reload #1198

Open
bobrik wants to merge 11 commits into open-telemetry:main from bobrik:ivan/bpf-updates

Conversation

@bobrik
Contributor

@bobrik bobrik commented Feb 21, 2026

BPF programs come and go much more frequently than modules and doing a full re-parsing of /proc/kallsyms is very expensive, comparatively speaking. Here we subscribe to updates for both additions and removals of bpf symbols through PERF_RECORD_KSYMBOL mechanism of perf events. Instead of triggering full parsing, we update the pre-existing mapping for bpf pseudo-module whenever possible.

See: #1151.
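The subscription described above hinges on the `ksymbol` and `bpf_event` flags of `perf_event_attr`. A minimal self-contained sketch of the attribute setup (constants and the struct here are illustrative stand-ins mirroring the kernel's UAPI header, not code from this PR):

```go
package main

import "fmt"

// Illustrative constants mirroring linux/perf_event.h; the bit positions
// are assumptions based on the kernel UAPI header, not taken from this PR.
const (
	perfTypeSoftware = 1       // PERF_TYPE_SOFTWARE
	perfCountSWDummy = 9       // PERF_COUNT_SW_DUMMY: counts nothing, used as a record carrier
	perfBitKsymbol   = 1 << 29 // attr.ksymbol: emit PERF_RECORD_KSYMBOL on ksym register/unregister
	perfBitBpfEvent  = 1 << 30 // attr.bpf_event: emit PERF_RECORD_BPF_EVENT on prog load/unload
)

// perfEventAttr is a simplified stand-in for the real perf_event_attr struct.
type perfEventAttr struct {
	Type   uint64
	Config uint64
	Bits   uint64 // bitfield flags
}

// ksymbolAttr builds an attr that asks the kernel to report BPF symbol
// additions and removals; a real implementation would pass this (in its
// full form) to perf_event_open(2) once per online CPU.
func ksymbolAttr() perfEventAttr {
	return perfEventAttr{
		Type:   perfTypeSoftware,
		Config: perfCountSWDummy,
		Bits:   perfBitKsymbol | perfBitBpfEvent,
	}
}

func main() {
	attr := ksymbolAttr()
	fmt.Println(attr.Bits&perfBitKsymbol != 0) // true
}
```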

@bobrik bobrik requested review from a team as code owners February 21, 2026 06:44
Comment thread go.mod Outdated
@fabled
Contributor

fabled commented Feb 21, 2026

Could you refactor the bpf symbolizer to be a separate package?

It has nothing to do with kallsyms, and I am really hoping the kallsyms package does not get entangled with any bpf machinery.

So best is the bpf symbols stuff is separate package and the tracer uses it in parallel with kallsyms package.

Comment thread support/ebpf/kallsyms.ebpf.c Outdated
@bobrik
Contributor Author

bobrik commented Feb 24, 2026

Could you refactor the bpf symbolizer to be a separate package?

What's the rough outline of how you see this working? Currently bpf code depends on /proc/kallsyms to provide the baseline mapping, which is updated by perf events in place.

@bobrik bobrik force-pushed the ivan/bpf-updates branch 3 times, most recently from 2d0cfda to 5256651 Compare February 24, 2026 22:37
Comment thread kallsyms/bpf.go Fixed
Comment thread kallsyms/bpf.go Fixed
Comment thread kallsyms/kallsyms.go
} else {
oldMod = nil
}
oldMod, _ = getModuleByAddress(modules, libpf.Address(address))

Check failure — Code scanning / CodeQL

Incorrect conversion between integer types (High): incorrect conversion of an unsigned 64-bit integer from strconv.ParseUint to a lower bit size type uintptr without an upper bound check.
@bobrik
Contributor Author

bobrik commented Feb 24, 2026

I updated the code to separate it a bit from kallsyms, but it's still in the same package. It now also addresses #1199 for bpf symbols.

Production testing shows a nice drop in CPU usage (red line machine has the new code):

[image: CPU usage comparison]

Flamegraph comparison shows kallsyms parsing going poof (it is also a lot smoother):

[image: flamegraph comparison]

Comment thread kallsyms/bpf.go Fixed
Comment thread tracer/tracer.go Outdated
Comment thread kallsyms/kallsyms_test.go
Comment thread kallsyms/bpf.go
Comment thread kallsyms/bpf.go Outdated
Comment thread kallsyms/bpf.go
for _, cpu := range onlineCPUs {
event, err := perf.Open(attr, -1, cpu, nil)
if err != nil {
return err
Member

nit: maybe we should close existing events in s.events if we run into this issue.

Contributor Author

We do run Close() on error, which I think is the right way to cleanup.

Comment thread kallsyms/bpf.go
continue
}

switch ksymbol := record.(type) {
Member

As there is a <-ctx.Done() case in every case statement, should we have this check maybe before switch ksymbol ... instead?

Contributor Author

But then it would not be in the same select, so it wouldn't be able to break out of a blocked send.

Maybe I misunderstand what you're suggesting.

@fabled
Contributor

fabled commented Feb 25, 2026

Could you refactor the bpf symbolizer to be a separate package?

What's the rough outline of how you see this working? Currently bpf code depends on /proc/kallsyms to provide the baseline mapping, which is updated by perf events in place.

So the perf events are superior because they return the JITted program code length. This allows you to create mapping of [start-address,stop-address] for each symbol. The length of a symbol is not directly available from kallsyms, so if possible it should not be used as baseline.

Perhaps, the baseline can be established with:

  • bpf(BPF_PROG_GET_NEXT_ID, ...)
  • fd = bpf(BPF_PROG_GET_FD_BY_ID, ...)
  • bpf(BPF_OBJ_GET_INFO_BY_FD, ...)

and then inspecting the program info data. I believe jited_ksyms in the info struct contains the kernel address and jited_func_lens the corresponding length.

The kallsyms could just completely ignore bpf, and in fact stop reading the kallsyms when bpf is seen (as you report it being really slow).

If both the start/end is collected in baseline and from the perf symbol updates, you can just create independent symbolizer and accurately match the symbols.

Also, since the tracer in startPerfEventMonitor already opens the event channel for all CPUs, could those same event pipes be used to get the symbol updates? This would reduce some resource overhead if a separate set is opened. This would mean the bpf symbolizer would need internal methods to use the bpf syscall to establish baseline when needed, and then rely on events being transported via a method called by the tracer package.

Would this sound like a feasible approach to you?
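The baseline sketched above (walk program IDs, fetch each program's info, record `jited_ksyms` plus `jited_func_lens`) could look roughly like this. The syscall wrappers are replaced with an in-memory stand-in, so every name here is illustrative rather than a real API:

```go
package main

import (
	"fmt"
	"sort"
)

// progInfo mimics the jited_ksyms / jited_func_lens fields of the
// bpf_prog_info struct returned by bpf(BPF_OBJ_GET_INFO_BY_FD).
type progInfo struct {
	JitedKsyms    []uint64 // start address of each JITed function
	JitedFuncLens []uint32 // length in bytes of each JITed function
}

// fakeKernel stands in for the bpf(BPF_PROG_GET_NEXT_ID) iteration plus
// BPF_PROG_GET_FD_BY_ID and BPF_OBJ_GET_INFO_BY_FD; a real implementation
// would issue those syscalls instead of reading this map.
var fakeKernel = map[uint32]progInfo{
	7:  {JitedKsyms: []uint64{0xffffffffc0001000}, JitedFuncLens: []uint32{0x200}},
	12: {JitedKsyms: []uint64{0xffffffffc0003000, 0xffffffffc0003800}, JitedFuncLens: []uint32{0x100, 0x80}},
}

type symRange struct{ Start, End uint64 }

// baseline walks all loaded programs and builds sorted [start,end) ranges,
// which is exactly what kallsyms cannot provide (no symbol lengths).
func baseline() []symRange {
	var out []symRange
	for _, info := range fakeKernel {
		for i, start := range info.JitedKsyms {
			out = append(out, symRange{Start: start, End: start + uint64(info.JitedFuncLens[i])})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Start < out[j].Start })
	return out
}

func main() {
	for _, r := range baseline() {
		fmt.Printf("%#x-%#x\n", r.Start, r.End)
	}
}
```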

@bobrik
Contributor Author

bobrik commented Feb 28, 2026

So the perf events are superior because they return the JITted program code length. This allows you to create mapping of [start-address,stop-address] for each symbol.

Establishing the baseline as you suggest would be a lot more expensive than just going through /proc/kallsyms. It is a one time set up, so maybe that's not a huge problem.

In practice, on modern kernels bpf symbols are in a contiguous block, but there's no guarantee that it will stay that way.

The kallsyms could just completely ignore bpf, and in fact stop reading the kallsyms when bpf is seen (as you report it being really slow).

It's a one time thing, so I think it's fine to read and skip bpf rather than just stop. I don't think there's any promise that no non-bpf symbols will appear after bpf.
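Read-and-skip can be sketched minimally like this, assuming the usual `address type name [module]` line format of /proc/kallsyms; function and variable names are illustrative:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseNonBPF reads /proc/kallsyms-formatted text and returns a
// name -> address map for everything except [bpf] pseudo-module symbols.
// Those are skipped rather than used as a signal to stop parsing, since
// there is no promise that non-bpf symbols never appear after bpf ones.
func parseNonBPF(data string) map[string]uint64 {
	syms := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 {
			continue
		}
		if len(fields) >= 4 && fields[3] == "[bpf]" {
			continue // skip, don't stop: keep scanning for later non-bpf symbols
		}
		addr, err := strconv.ParseUint(fields[0], 16, 64)
		if err != nil {
			continue
		}
		syms[fields[2]] = addr
	}
	return syms
}

func main() {
	input := "ffffffff81000000 T _text\n" +
		"ffffffffc067b714 t bpf_prog_08847339af9eb700_func\t[bpf]\n" +
		"ffffffffc0674000 t some_module_fn\t[mod]\n"
	fmt.Println(len(parseNonBPF(input))) // 2: the [bpf] line is skipped
}
```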

Also, since the tracer in startPerfEventMonitor already opens the event channel for all CPUs, could those same event pipes be used to get the symbol updates?

probabilisticProfile disables these events and I don't think doing full re-initialization is a good tradeoff vs having separate events for bpf that are constantly open, especially if we make initialization more expensive.

Would this sound like a feasible approach to you?

I would probably move that effort in a follow-up PR, unless you feel strongly about it.

It would be good to address the existing slowness and #1199 here first.

Comment thread kallsyms/bpf.go Outdated
Comment thread kallsyms/bpf.go Fixed
@bobrik bobrik force-pushed the ivan/bpf-updates branch 2 times, most recently from 711b29e to 422fdb9 Compare March 19, 2026 00:26
Contributor Author

@bobrik bobrik left a comment

I pushed a bunch of commits to resolve issues and added an integration test.

From my end it should be ready to go. I'm not really sure what to do with codeql complaints.

I can squash into one commit once it's approved.

@bobrik
Contributor Author

bobrik commented Mar 19, 2026

Linux v5.4 is giving me a hard time again ☹️

@fabled
Contributor

fabled commented Mar 19, 2026

So the perf events are superior because they return the JITted program code length. This allows you to create mapping of [start-address,stop-address] for each symbol.

Establishing the baseline as you suggest would be a lot more expensive than just going through /proc/kallsyms. It is a one time set up, so maybe that's not a huge problem.

More expensive in what sense? The problem you are solving is that reading kallsyms is really slow, and now you argue it's better to read it instead of using a dedicated fast API?

Yes, it's a bit more code and syscalls, but I think it will be much more efficient in CPU usage. Also, you get the JIT code length data, which helps a lot to not incorrectly symbolize random addresses as some bpf symbol.

In practice, on modern kernels bpf symbols are in a contiguous block, but there's no guarantee that it will stay that way.

Right. Which is another reason why collecting and matching with bpf code length will help.

The kallsyms could just completely ignore bpf, and in fact stop reading the kallsyms when bpf is seen (as you report it being really slow).

It's a one time thing, so I think it's fine to read and skip bpf rather than just stop. I don't think there's any promise that no non-bpf symbols will appear after bpf.

Kernel code guarantees that the actual kernel and module symbols come first. After bpf might still come __builtin__kprobes symbols. Though I am not sure if those can be handled currently in any sensible way. I'd probably just ignore those at this time.

Also, since the tracer in startPerfEventMonitor already opens the event channel for all CPUs, could those same event pipes be used to get the symbol updates?

probabilisticProfile disables these events and I don't think doing full re-initialization is a good tradeoff vs having separate events for bpf that are constantly open, especially if we make initialization more expensive.

Fair enough. Lets not mix that in at this time.

Would this sound like a feasible approach to you?

I would probably move that effort in a follow-up PR, unless you feel strongly about it.

I would really like to not introduce something we want to change again. This applies mostly to the initial synchronization.

It would be good to address the existing slowness and #1199 here first.

Fixing #1199 could be a separate, more self-contained PR.

Also something needs fixing since tests are failing. Are you able to determine and fix the issue?

@bobrik bobrik force-pushed the ivan/bpf-updates branch from 422fdb9 to 2a6279f Compare March 20, 2026 06:21
@bobrik
Contributor Author

bobrik commented Mar 20, 2026

I updated the code to iterate bpf programs instead of parsing kallsyms for the initial pass.

The tests only fail on v5.4. I'm not sure if it's worth worrying about if we're dropping it in #1178.

@bobrik
Contributor Author

bobrik commented Mar 20, 2026

Also, you get the JIT code length data which helps a lot to not incorrectly symbolize random addresses as some bpf symbol.

How do you see this happening? The current code assumes that Module is a contiguous chunk of memory, and it's the unit that Tracer operates on. Do you expect the bpf symbolizer to still operate through that lens? Or should Tracer call the bpf symbolizer first and fall back to the kallsyms one if nothing is found, bypassing Module?

@bobrik
Contributor Author

bobrik commented Mar 21, 2026

I pushed a commit to track the sizes. It is only used to adjust the Module end for now. Without it the previous heuristic of rounding up to the end of the page was in fact missing a longer bpf program.

One consequence of not using /proc/kallsyms here:

ivan@44m298:~$ sudo cat /proc/kallsyms | sort -n | grep bpf_prog_59123752622e719c_cwndset_handle_sockops_event -C3
ffffffffc067b714 t bpf_prog_08847339af9eb700_func       [bpf]
ffffffffc067b7d4 t bpf_prog_24998ac464cda05b_func       [bpf]
ffffffffc067b84c t bpf_prog_b9ffbc83d8aed900_func       [bpf]
ffffffffc067b908 t bpf_prog_59123752622e719c_cwndset_handle_sockops_event       [bpf]
ffffffffc067ba80 t bpf_trampoline_6442536467    [bpf]
ffffffffc067c050 t bpf_prog_10da037330843e17_tc_perf_event      [bpf]
ffffffffc067c508 t bpf_prog_ccff5bcc105a50da_cdtApolloEncap     [bpf]

0xffffffffc067bb42 now resolves to bpf_prog_59123752622e719c_cwndset_handle_sockops_event, despite being firmly after bpf_trampoline_6442536467.

That's because trampolines are not programs, so they are invisible to us now. Yet they appear as frames. We need to figure out how to use symbol sizes (#1198 (comment)).
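Once lengths are tracked, lookup can refuse to attribute addresses that fall past a symbol's end, such as the trampoline gap above. A minimal sketch (names and lengths are made up; only the addresses come from the kallsyms excerpt above):

```go
package main

import (
	"fmt"
	"sort"
)

type bpfSym struct {
	Start uint64
	Len   uint64 // JITed code length, as reported by the perf/bpf APIs
	Name  string
}

// lookup finds the symbol covering addr, or "" if addr falls into a gap
// (e.g. a trampoline that was never reported as a program). syms must be
// sorted by Start.
func lookup(syms []bpfSym, addr uint64) string {
	i := sort.Search(len(syms), func(i int) bool { return syms[i].Start > addr })
	if i == 0 {
		return "" // before the first symbol
	}
	s := syms[i-1]
	if addr >= s.Start+s.Len {
		return "" // past the end of the nearest symbol: unknown, not misattributed
	}
	return s.Name
}

func main() {
	// Addresses modeled on the kallsyms excerpt above; lengths are invented
	// for illustration (0x...b908 + 0x178 ends at 0x...ba80, the trampoline).
	syms := []bpfSym{
		{Start: 0xffffffffc067b908, Len: 0x178, Name: "bpf_prog_59123752622e719c_cwndset_handle_sockops_event"},
		{Start: 0xffffffffc067c050, Len: 0x4b8, Name: "bpf_prog_10da037330843e17_tc_perf_event"},
	}
	fmt.Println(lookup(syms, 0xffffffffc067bb42)) // prints an empty line: address is in the untracked gap
}
```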

@bobrik
Contributor Author

bobrik commented Mar 21, 2026

Trampolines are sent via PERF_RECORD_KSYMBOL.

There's just no way to get to them without reading /proc/kallsyms for the initial pass, as far as I can see.

bobrik added 11 commits March 23, 2026 10:13
BPF programs come and go much more frequently than modules and doing
a full re-parsing of `/proc/kallsyms` is very expensive, comparatively
speaking. Here we subscribe to updates for both additions and removals
of bpf symbols through `PERF_RECORD_KSYMBOL` mechanism of perf events.
Instead of triggering full parsing, we update the pre-existing mapping
for bpf pseudo-module whenever possible.
This loosens integration between `kallsyms.go` and `bpf.go`, making
the latter independent and responsible for just `bpf` stuff, while
the former remains responsible for everything else.

There's dual parsing of `/proc/kallsyms` in the initial load, but
it only happens once, so it's deemed an acceptable cost.

This also fixes the issue with the assumption that `/proc/kallsyms`
for bpf symbols comes in order of addresses, which is not the case.
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions Bot added the Stale label Apr 22, 2026