feat: Add support for reading OTel ProcessContext from profiled process memory#1181
Conversation
fa819a1 to
bc31d04
Compare
ivoanjo
left a comment
There was a problem hiding this comment.
Did a quick pass! Exciting to see this here!
0330e99 to
4da59ec
Compare
f2605a3 to
83f7642
Compare
…ss memory Implement reading of OpenTelemetry ProcessContext (OTEP #4719) from profiled processes via shared memory regions. The SDK publishes resource attributes into a named memory mapping ([anon:OTEL_CTX] or /memfd:OTEL_CTX) with a 32-byte header containing signature, version, payload size, timestamp, and a pointer to a protobuf-encoded payload. Changes: - Define ProcessContext protobuf message (resource + extra_attributes) - Add processcontext.go to read and validate the shared memory header, deserialize the protobuf payload, and handle concurrent update retries - Recognize OTEL_CTX mappings in parseMappings and store ProcessContextInfo in ProcessMeta - Integrate ProcessContext reading into processmanager's synchronizeMappings - Add unit tests with mock reader and integration tests using real named anonymous and memfd mappings
83f7642 to
5bb52d7
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5bb52d716f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
christos68k
left a comment
There was a problem hiding this comment.
Did one pass, have a few more things to look at / test
ivoanjo
left a comment
There was a problem hiding this comment.
👍 This looks great! I like that although the spec itself has some complex + subtle details, the implementation ends up looking quite sane and mostly straightforward
rogercoll
left a comment
There was a problem hiding this comment.
Overall lgtm, I will do another review once the ProcessContextInfo is forwarded down the pipeline
| ContextMappingAnonNamed = "[anon:OTEL_CTX]" | ||
|
|
||
| // Default maximum retries for concurrent updates | ||
| DefaultMaxRetries = 3 |
There was a problem hiding this comment.
Does this need to be exported?
There was a problem hiding this comment.
Read() takes maxRetries as a parameter (per @christos68k 's suggestion), so this provides a default value for external callers (e.g. processmanager/processinfo.go). Alternatively, I could use the zero value to indicate "use the default" and keep the constant unexported. What do you prefer?
There was a problem hiding this comment.
IMO since no caller needs a different value, just drop the parameter and use a non exported constant internally (e.g defaultMaxRetries). If we need/expect other callers to set a different value for the retries, wdyt if we use the Options pattern instead? (or we could add the Options later as being non api breaking) E.g:
type options struct {
maxRetries int
}
type Option func(*options)
func WithMaxRetries(n int) Option {
return func(o *options) { o.maxRetries = n }
}
func Read(addr libpf.Address, rm remotememory.RemoteMemory, lastPublishedAtNs
uint64, opts ...Option) (Info, error) {cc @christos68k
There was a problem hiding this comment.
I'd go for the simpler option of using 0 as default, allowing for caller overrides, unless we know we'll have more than one options in the future.
There was a problem hiding this comment.
Just noticed that I called the parameter maxRetries but it is implemented in fact as maxAttempts.
I renamed it, and when passed 0, it uses the default number of read attempts.
As open-telemetry/opentelemetry-specification#4719 looks to be merged soon, it came up as we were implementing this spec in the OTel eBPF Profiler (open-telemetry/opentelemetry-ebpf-profiler#1181) that it'd be nice to stop copy-pasting the `process_context.proto` and it's time to add it to the proper place. This is my first contribution to this repo so please do point out if I missed something! I didn't touch the collector parts since this message is not expected to be processed by the collector directly.
|
(I meant to approve on my previous pass but pressed comment instead of approve) |
Co-authored-by: Roger Coll <roger.coll@elastic.co>
4e747eb to
817dfc7
Compare
rogercoll
left a comment
There was a problem hiding this comment.
Overall the code LGTM, but the ProcessContextInfo it's never consumed downstream. This makes the processcontext feature effectively a no-op in the current changeset. Is there a follow-up PR planned to wire ProcessContextInfo through the tracer, reporter and pdata path?
Indeed this PR addresses only detecting/reading process context. I have a follow-up PR planned once this PR is merged (here in draft state). |
Hi @christos68k, some of the blockers you mentioned should now be resolved:
Let me know if there's anything else needed from my side to move this forward ! |
| import ( | ||
| "encoding/binary" | ||
| "errors" | ||
| "fmt" |
There was a problem hiding this comment.
| "fmt" | |
| "fmt" | |
| "regexp" |
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
Rename the parameter to reflect that it controls total attempts, not retries. Unexport the constant and treat 0 as "use default" so callers don't need to reference an internal tuning knob.
…fd paths IsContextMapping now takes an isExecutable parameter to reject executable mappings. Also adds explicit handling for "/memfd:OTEL_CTX (deleted)" paths instead of relying on iterateMappings to trim the suffix.
There was a problem hiding this comment.
LGTM
We'll push for open-telemetry/opentelemetry-proto#783 merge today (so I won't merge this right now PR still needs one more approval from an approver/maintainer).
florianl
left a comment
There was a problem hiding this comment.
It seems like we don't get open-telemetry/opentelemetry-proto#783 merged soonish. So I have just minor nits left and I think we should merge and continue from here.
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Implement reading of OpenTelemetry ProcessContext (OTEP #4719) from profiled processes via shared memory regions. The SDK publishes resource attributes into a named memory mapping ([anon:OTEL_CTX] or /memfd:OTEL_CTX) with a 32-byte header containing signature, version, payload size, timestamp, and a pointer to a protobuf-encoded payload.
Changes:
This PR only addresses reading process context but does yet propagate it to profile.
The goal is to gather early feedback.
Implementation notes:
synchronizeMappings. If process context is published later on, then it might be missed if no other call tosynchronizeMappingsoccurs. A possibility would be to hook prctl calls to trigger process context reads.Remaining work: