Skip to content

feat(flowcontrol): Add ImmediateResponse abort mechanism for evicting in-flight requests#2737

Merged
k8s-ci-robot merged 5 commits intokubernetes-sigs:mainfrom
RishabhSaini:evictImmediate
Apr 13, 2026
Merged

feat(flowcontrol): Add ImmediateResponse abort mechanism for evicting in-flight requests#2737
k8s-ci-robot merged 5 commits intokubernetes-sigs:mainfrom
RishabhSaini:evictImmediate

Conversation

@RishabhSaini
Copy link
Copy Markdown
Contributor

@RishabhSaini RishabhSaini commented Mar 30, 2026

resolves "How to evict?" part of #2632

Integrate ext_proc ImmediateResponse as the abort mechanism for evicting dispatched requests from vLLM. When eviction triggers, closing the request's abort channel causes the Process() loop to send ImmediateResponse(503), making Envoy reset the upstream connection. vLLM detects the disconnect and frees KV blocks.

  • AbortRegistry bridges eviction plugin and ext_proc Process() loop
  • ImmediateResponseAborter closes abort channels via sync.Once
  • recvOrAbort wraps srv.Recv() to select on abort signals
  • Tests cover aborter, registry, recvOrAbort, plugin integration, and concurrent eviction+completion races

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 30, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @RishabhSaini. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 30, 2026
@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 30, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit a9cbf1c
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69dd45d1308fa4000868ada3
😎 Deploy Preview https://deploy-preview-2737--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 30, 2026
Comment thread pkg/epp/handlers/server.go Outdated
Comment thread pkg/epp/flowcontrol/eviction/plugin.go Outdated
Copy link
Copy Markdown
Contributor

@nirrozenbaum nirrozenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend being consistent with names.
eviction is consistent with evictor. aborter is yet another naming option but is not consistent with the existing code.

@ahg-g
Copy link
Copy Markdown
Contributor

ahg-g commented Apr 1, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 1, 2026
@ahg-g
Copy link
Copy Markdown
Contributor

ahg-g commented Apr 2, 2026

/assign @LukeAVanDrie

@ahg-g
Copy link
Copy Markdown
Contributor

ahg-g commented Apr 2, 2026

/test all

Comment thread pkg/epp/handlers/server.go Outdated
Comment thread pkg/epp/handlers/server.go Outdated
Comment thread pkg/epp/handlers/server.go Outdated
Comment thread pkg/epp/handlers/server.go Outdated
abortCh chan struct{},
) (*extProcPb.ProcessingRequest, error, error) {
recvCh := make(chan recvResult, 1)
go func() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spawns a new goroutine to call srv.Recv() every time it is invoked.

If the abortCh triggers first, this method returns errEvicted, but the spawned goroutine remains blocked on srv.Recv() until the stream is eventually closed or data arrives. While this leak is technically bounded by the lifetime of the gRPC stream (which closes shortly after we send the ImmediateResponse and return nil), it is still a sub-optimal pattern to spawn goroutines in a loop for every message received.

Consider starting a single reader goroutine for the lifetime of the stream that reads from srv.Recv() and pumps messages into a channel. You can then use Go's property where selecting on a nil channel blocks forever to dynamically enable listening to the abortCh only after it is populated (post-scheduling).

For example:

// At the top of Process()
recvCh := make(chan recvResult, 1)
go func() {
    for {
        req, err := srv.Recv()
        recvCh <- recvResult{req: req, err: err}
        if err != nil { return }
    }
}()

var abortCh chan struct{} // Initially nil

for {
    select {
    case result := <-recvCh:
        // handle message...
        // When scheduled: abortCh = s.abortLookup.Get(abortRequestID)
        
    case <-abortCh: // Blocks forever while nil
        // handle eviction
        return nil
    }
}

Copy link
Copy Markdown
Contributor Author

@RishabhSaini RishabhSaini Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, attempted to fix this using claude

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks! Added one suggestion, then this PR lgtm

Comment thread pkg/epp/flowcontrol/eviction/evictor.go
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2026
@RishabhSaini RishabhSaini force-pushed the evictImmediate branch 4 times, most recently from d2f7fb2 to e975f58 Compare April 10, 2026 01:41
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2026
type Plugin struct {
queue *EvictionQueue
aborter Aborter
var _ requestcontrol.PreRequest = &RequestEvictor{}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a misuse of the pluggable framework. This logic represents the runtime that executes the eviction logic and it should not be modeled as a plugin. We need to put this guidance in a framework's dev guide: Plugins represent entities that a user can enable/disable via the config api, and they only reside under framework/epp/plugins.

In this case, there should be an evictor subsystem instance that gets passed to request control, for example PreRequest becomes evictor.TrackRequests.

We can do this clean up as a followup.

@kfswain @LukeAVanDrie

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 13, 2026
  Integrate ext_proc ImmediateResponse as the abort mechanism for
  evicting dispatched requests from vLLM. When eviction triggers,
  closing the request's abort channel causes the Process() loop to
  send ImmediateResponse(503), making Envoy reset the upstream
  connection. vLLM detects the disconnect and frees KV blocks.

  - AbortRegistry bridges eviction plugin and ext_proc Process() loop
  - ImmediateResponseAborter closes abort channels via sync.Once
  - recvOrAbort wraps srv.Recv() to select on abort signals
  - Tests cover aborter, registry, recvOrAbort, plugin integration,
    and concurrent eviction+completion races

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
for naming consistency

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 13, 2026
Comment thread pkg/epp/handlers/server.go
…ion, evictor cleanup

- Replace per-message recvOrAbort goroutine with single reader goroutine
  for the stream lifetime, using nil channel select pattern
- Remove errEvicted sentinel; eviction is a state transition (RequestEvicted)
  handled by updateStateAndSendIfNeeded, not an error
- Move ImmediateResponse send from inline code into the state machine
- Add EvictorWithCleanup interface and cleanupRequest helper to prevent
  ImmediateResponseEvictor.closeOnce map from growing unbounded

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
@LukeAVanDrie
Copy link
Copy Markdown
Contributor

/approve

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 13, 2026
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 13, 2026
@LukeAVanDrie
Copy link
Copy Markdown
Contributor

/lgtm

/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 13, 2026
@ahg-g
Copy link
Copy Markdown
Contributor

ahg-g commented Apr 13, 2026

/approve

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie, RishabhSaini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2026
@k8s-ci-robot k8s-ci-robot merged commit bd8b1c3 into kubernetes-sigs:main Apr 13, 2026
11 checks passed
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request Apr 23, 2026
… in-flight requests (kubernetes-sigs/gateway-api-inference-extension#2737)

* Add ImmediateResponse abort mechanism for evicting in-flight requests

  Integrate ext_proc ImmediateResponse as the abort mechanism for
  evicting dispatched requests from vLLM. When eviction triggers,
  closing the request's abort channel causes the Process() loop to
  send ImmediateResponse(503), making Envoy reset the upstream
  connection. vLLM detects the disconnect and frees KV blocks.

  - AbortRegistry bridges eviction plugin and ext_proc Process() loop
  - ImmediateResponseAborter closes abort channels via sync.Once
  - recvOrAbort wraps srv.Recv() to select on abort signals
  - Tests cover aborter, registry, recvOrAbort, plugin integration,
    and concurrent eviction+completion races

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* Rename Aborter to Evictor, Plugin to RequestEvictor
for naming consistency

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* Use 429 (TooManyRequests) instead of 503 for eviction ImmediateResponse

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* Refactor Process() loop: single reader goroutine, state machine eviction, evictor cleanup

- Replace per-message recvOrAbort goroutine with single reader goroutine
  for the stream lifetime, using nil channel select pattern
- Remove errEvicted sentinel; eviction is a state transition (RequestEvicted)
  handled by updateStateAndSendIfNeeded, not an error
- Move ImmediateResponse send from inline code into the state machine
- Add EvictorWithCleanup interface and cleanupRequest helper to prevent
  ImmediateResponseEvictor.closeOnce map from growing unbounded

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

* fix: fix data read write race to ctx by capturing it

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>

---------

Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants