kallsyms: update bpf addresses without full /proc/kallsyms reload#1198
kallsyms: update bpf addresses without full /proc/kallsyms reload#1198bobrik wants to merge 11 commits intoopen-telemetry:mainfrom
Conversation
0f92ead to
ce31b1e
Compare
ce31b1e to
bbd29c6
Compare
|
Could you refactor the bpf symbolizer to be a separate package? It has nothing to do with kallsyms, and I am really hoping the kallsyms package does not get entangled with any bpf machinery. So best is the bpf symbols stuff is separate package and the tracer uses it in parallel with kallsyms package. |
What's the rough outline of how you see this working? Currently bpf code depends on |
2d0cfda to
5256651
Compare
| } else { | ||
| oldMod = nil | ||
| } | ||
| oldMod, _ = getModuleByAddress(modules, libpf.Address(address)) |
Check failure
Code scanning / CodeQL
Incorrect conversion between integer types High
|
I updated the code to separate it a bit from kallsyms, but it's still in the same package. It now also addresses #1199 for bpf symbols. Production testing shows a nice drop in CPU usage (red line machine has the new code):
Flamegraph comparison shows kallsyms parsing going poof (it is also a lot smoother):
|
5256651 to
09941d0
Compare
| for _, cpu := range onlineCPUs { | ||
| event, err := perf.Open(attr, -1, cpu, nil) | ||
| if err != nil { | ||
| return err |
There was a problem hiding this comment.
nit: maybe we should close existing events in s.events if we run into this issue.
There was a problem hiding this comment.
We do run Close() on error, which I think is the right way to cleanup.
| continue | ||
| } | ||
|
|
||
| switch ksymbol := record.(type) { |
There was a problem hiding this comment.
As there is a <-ctx.Done() case in every case statement, should we have this check maybe before switch ksymbol ... instead?
There was a problem hiding this comment.
But they it will not be in the same select, so it wouldn't be able to break out of a blocked send.
Maybe I misunderstand what you're suggesting.
So the perf events are superior because they return the JITted program code length. This allows you to create mapping of [start-address,stop-address] for each symbol. The length of a symbol is not directly available from Perhaps, the baseline can be established with:
and then inspecting the program info data. I believe The If both the start/end is collected in baseline and from the perf symbol updates, you can just create independent symbolizer and accurately match the symbols. Also, since the Would this sound feasible approach to you? |
Establishing the baseline as you suggest would be a lot more expensive than just going through In practice, on modern kernels bpf symbols are in a contiguous block, but there's no guarantee that it will stay that way.
It's a one time thing, so I think it's fine to read and skip bpf rather than just stop. I don't think there's any promise that no non-bpf symbols will appear after bpf.
I would probably move that effort in a follow-up PR, unless you feel strongly about it. It would be good to address the existing slowness and #1199 here first. |
09941d0 to
11b0397
Compare
711b29e to
422fdb9
Compare
bobrik
left a comment
There was a problem hiding this comment.
I pushed a bunch of commits to resolve issues and added an integration test.
From my end it should be ready to go. I'm not really sure what to do with codeql complaints.
I can squash into one commit once it's approved.
| for _, cpu := range onlineCPUs { | ||
| event, err := perf.Open(attr, -1, cpu, nil) | ||
| if err != nil { | ||
| return err |
There was a problem hiding this comment.
We do run Close() on error, which I think is the right way to cleanup.
| continue | ||
| } | ||
|
|
||
| switch ksymbol := record.(type) { |
There was a problem hiding this comment.
But they it will not be in the same select, so it wouldn't be able to break out of a blocked send.
Maybe I misunderstand what you're suggesting.
|
Linux v5.4 is giving me a hard time again |
More expensive in what sense? The problem you are solving that reading Yes, its a bit more code and syscalls. But I think it will be much more efficient in CPU usage. Also, you get the JIT code length data which helps a lot to not incorrectly symbolize random addresses as some bpf symbol.
Right. Which is another reason why collecting and matching with bpf code length will help.
Kernel code guarantees that the actual kernel and module symbols come first. After
Fair enough. Lets not mix that in at this time.
I would really like to not introduce something we want to change again. This applies mostly the initial synchronization.
Fixing #1199 could be a separate more self contained PR. Also something needs fixing since tests are failing. Are you able to determine and fix the issue? |
422fdb9 to
2a6279f
Compare
|
I updated the code to iterate bpf programs instead of parsing kallsyms for the initial pass. The tests only fail on v5.4. I'm not sure if it's worth worrying about if we're dropping it in #1178. |
How do you see this happening? The current code assumes that |
|
I pushed a commit to track the sizes. It is only used to adjust the One consequence of not using
That's because trampolines are not programs, so they are invisible to us now. Yet they appear as frames. We need figure out how to use symbol sizes (#1198 (comment)). |
|
Trampolines are sent via There's just no way to get to them without reading |
BPF programs come and go much more frequently than modules and doing a full re-parsing of `/proc/kallsyms` is very expensive, comparatively speaking. Here we subscribe to updates for both additions and removals of bpf symbols through `PERF_RECORD_KSYMBOL` mechanism of perf events. Instead of triggering full parsing, we update the pre-existing mapping for bpf pseudo-module whenever possible.
This loosens integration between `kallsyms.go` and `bpf.go`, making the latter independent and responsible for just `bpf` stuff, while the former remains responsible for everything else. There's dual parsing of `/proc/kallsyms` in the initial load, but it only happens onse, so it's deemded and acceptable cost. This also fixes the issue with the asumption that `/proc/kallsyms` for bpf symbols comes in order of addresses, which is not the case.
5d38c8f to
4ea6555
Compare
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |


BPF programs come and go much more frequently than modules and doing a full re-parsing of
/proc/kallsymsis very expensive, comparatively speaking. Here we subscribe to updates for both additions and removals of bpf symbols throughPERF_RECORD_KSYMBOLmechanism of perf events. Instead of triggering full parsing, we update the pre-existing mapping for bpf pseudo-module whenever possible.See: #1151.