reverse_tunnels: Add EventReporter for reverse tunnel connection reporting#44342
reverse_tunnels: Add EventReporter for reverse tunnel connection reporting#44342aakugan wants to merge 1 commit intoenvoyproxy:mainfrom
Conversation
e36b38c to
66d9466
Compare
66d9466 to
e9ef353
Compare
|
its not immediately clear to me why - but docs build is failing with [97](https://github.com/envoyproxy/envoy/actions/runs/24177797097/job/70563146913#step:19:511)
/tmp/tmpfytzszml/generated/rst/api-v3/config/bootstrap/v3/bootstrap.proto.rst:396: WARNING: undefined label: 'extension_envoy.bootstrap.reverse_tunnel.reverse_tunnel_reporting_service' [ref.ref]
/tmp/tmpfytzszml/generated/rst/intro/arch_overview/security/secpos_requires_trusted_downstream_and_upstream.rst:6: WARNING: undefined label: 'extension_envoy.bootstrap.reverse_tunnel.reverse_tunnel_reporting_service' [ref.ref]
/tmp/tmpfytzszml/generated/rst/intro/arch_overview/security/secpos_requires_trusted_downstream_and_upstream.rst:6: WARNING: undefined label: 'extension_envoy.bootstrap.reverse_tunnel.reverse_tunnel_reporting_service' [ref.ref] |
phlax
left a comment
There was a problem hiding this comment.
some initial docs review - would like to view it rendered - can help if needed to resolve the build problem
| // Configuration for the connection event reporter. | ||
| message EventReporterConfig { | ||
| // Stat prefix for this reporter's metrics. | ||
| // Metrics will be emitted as "{stat_prefix}.events_pushed", etc. |
There was a problem hiding this comment.
| // Metrics will be emitted as "{stat_prefix}.events_pushed", etc. | |
| // Metrics will be emitted as ``{stat_prefix}.events_pushed``, etc. |
| // Represents a single reverse tunnel connection with its metadata. | ||
| message ReverseTunnel { | ||
| // Unique name to identify this tunnel connection. | ||
| // Typically formatted as "{node_id}|{cluster_id}" or similar. |
There was a problem hiding this comment.
| // Typically formatted as "{node_id}|{cluster_id}" or similar. | |
| // Typically formatted as ``{node_id}|{cluster_id}`` or similar. |
| // Detailed information about a reverse tunnel connection. | ||
| message ReverseTunnelInfo { | ||
| // Identity information of the tunnel initiator (downstream Envoy). | ||
| // Contains node_id, cluster_id, and tenant_id for proper identification. |
There was a problem hiding this comment.
| // Contains node_id, cluster_id, and tenant_id for proper identification. | |
| // Contains ``node_id``, ``cluster_id``, and ``tenant_id`` for proper identification. |
75ee2d3 to
0f8c02a
Compare
|
Thanks for taking a look! |
|
/docs |
|
Docs for this Pull Request will be rendered here: https://storage.googleapis.com/envoy-cncf-pr/44342/docs/index.html The docs are (re-)rendered each time the CI |
| void EventReporter::addConnection(std::shared_ptr<ReverseTunnelEvent::Connected>&& connection) { | ||
| ASSERT(context_.mainThreadDispatcher().isThreadSafe()); | ||
|
|
||
| ENVOY_LOG(info, "EventReporter: Accepted a new connection. Node: {}, Cluster: {}, Tenant: {}", |
There was a problem hiding this comment.
Add ReverseTunnel prefix to logs. EventReporter feels pretty generic name
There was a problem hiding this comment.
We get the entire path in the log ig I will remove the prefix 😅
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new contrib extension for reverse tunnel reporting, providing a mechanism for Envoy instances to report connection state changes to a management server. It includes the API definitions for a gRPC-based reporting service, the implementation of an EventReporter that aggregates and de-duplicates connection events, and comprehensive unit tests. Review feedback suggests adding lower-bound validation to the reporting interval to prevent potential resource exhaustion, utilizing try_emplace for more efficient map insertions, and optimizing batch event processing by using move iterators.
0f8c02a to
1f23b5c
Compare
| void EventReporter::reportConnectionEvent(absl::string_view node_id, absl::string_view cluster_id, | ||
| absl::string_view tenant_id) { | ||
| auto ptr = std::make_shared<ReverseTunnelEvent::Connected>( | ||
| ReverseTunnelEvent::Connected{std::string(node_id), std::string(cluster_id), | ||
| std::string(tenant_id), Envoy::SystemTime::clock::now()}); | ||
|
|
||
| context_.mainThreadDispatcher().post( | ||
| [this, ptr = std::move(ptr)]() mutable { this->addConnection(std::move(ptr)); }); | ||
| } | ||
|
|
||
| void EventReporter::reportDisconnectionEvent(absl::string_view node_id, absl::string_view) { | ||
| absl::string_view name = ReverseTunnelEvent::getName(node_id); |
There was a problem hiding this comment.
getName() returns the node_id right? How does this work for tenant scoped nodes?
| using SharedConnections = std::vector<std::shared_ptr<Connected>>; | ||
| using SharedDisconnections = std::vector<std::shared_ptr<Disconnected>>; | ||
|
|
||
| struct BatchedEvents { |
There was a problem hiding this comment.
BatchedEvents sounds misleading because we aren't adding a 'batch', maybe we can call it Events/TunnelUpdates
| } | ||
|
|
||
| void EventReporter::reportDisconnectionEvent(absl::string_view node_id, absl::string_view) { | ||
| absl::string_view name = ReverseTunnelEvent::getName(node_id); |
There was a problem hiding this comment.
getName() returns the node_id right? How does this work for tenant scoped nodes?
There was a problem hiding this comment.
Yea so I kinda expect node_id to be of the format tenant:node in the tenant isolation case.
I have added some comments above the function I will make the changes and add some integ tests to pin it some followup PRs.
| explicit Disconnected(absl::string_view n) : name(n) {} | ||
| }; | ||
|
|
||
| using SharedConnections = std::vector<std::shared_ptr<Connected>>; |
There was a problem hiding this comment.
Nit: SharedConnections -> Connections/ConnectionList
SharedDisconnections -> Disconnections/DisconnectionList
|
|
||
| std::size_t size() const { return connections.size() + disconnections.size(); } | ||
|
|
||
| void operator+=(BatchedEvents&& events) { |
There was a problem hiding this comment.
This does not seem to be used anywhere?
There was a problem hiding this comment.
Oh I needed it for some followup PRs thought of adding it here as it is part of the struct 😅
|
|
||
| virtual void onServerInitialized(ReverseTunnelReporterWithState* reporter) PURE; | ||
|
|
||
| virtual void receiveEvents(ReverseTunnelEvent::BatchedEvents events) PURE; |
There was a problem hiding this comment.
Why not use const ReverseTunnelEvent::BatchedEvents& events here? Does receiveEvents() mutate events?
There was a problem hiding this comment.
So I wanted the copy to be explicit as the clients may have to buffer the events or something, the struct is just a bunch of shared_ptrs so copy is cheap the real risk is if some event comes which deletes the entry from the map which has the original shared_ptr.
As for the const ig there is better value in making it shared_ptr<const> I will look into that part.
| absl::string_view name = ReverseTunnelEvent::getName(connection->node_id); | ||
| auto [it, inserted] = connections_.try_emplace(std::move(name), std::move(connection), 1); |
There was a problem hiding this comment.
Using absl::string as the key in connection is fragile, let's use an owned std::string here.
9b327d8 to
a4a1eaa
Compare
Signed-off-by: aakugan <aakashganapathy2@gmail.com>
a4a1eaa to
af15b08
Compare
BUILDS ON #44327, depends on the protos introduced there.
Incremental DIFFS
Commit Message
Add EventReporter for reverse tunnel connection reporting
Additional Description:
Adds the EventReporter contrib extension, which aggregates reverse tunnel connection/disconnection events and fans them out to pluggable clients.
Key components:
ReverseTunnelReporterWithState/ReverseTunnelReporterClient: interfaces allowing multiple transport clients to share connection state from a single reporter. Clients receive incremental diffs and can request a full state snapshot so they dont have to be "stateful" other than the protocol they manage.EventReporter: A basic implementation which does refcounting of connections by the name and stores in a map for interfacing with the clients.Risk Level: Low, opt in extension.
Testing
Unit tests.