Generalize the request type passed down the framework plugins: rename LLM -> Inference (#2673)
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
Hi @RyanRosario. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. Tip: we noticed you've done this a few times! Consider joining the org to skip this step. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from 1cce76e to f6b5ec6.
@zetxqx For now, I've had to commit some other files into this PR to get Prow to pass. I'm not sure what the issue is here, but I want to keep the ball rolling.
@zetxqx I still have a few comments to address, but wanted to handle the rest in this PR.
The latency-predictor-related changes look fine now, as I only see the name change.
@ahg-g Please review when you get a chance.
/lgtm

@ahg-g Copying from the PR description: as discussed with @RyanRosario, we want to split the refactoring into the following three PRs.
/retest
Force-pushed from a1dfc37 to 9bb89eb.
/lgtm

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g, RyanRosario. The full list of commands accepted by this bot can be found here, and the pull request process is described here. Approvers can indicate their approval by writing `/approve` in a comment.
Starting from this PR, image builds are failing; I assume it's related.
The failed build logs: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/post-inference-extension-push-images/2042704260231073792
Created #2832
…on#2673) Co-authored-by: Ryan Rosario <6713180+RyanRosario@users.noreply.github.com>
What type of PR is this?
/kind feature
What this PR does / why we need it:
Enables the framework to be applied directly across various GenAI model protocols, not only the OpenAI format, without rewriting the core admission, mutation, or scheduling flows. Pluggable parsers can now intercept raw request bytes and construct a generic InferenceRequest up front, giving the EPP the flexibility to route, process, and score payloads transparently regardless of the original protocol.
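To illustrate the parser-based flow described above, here is a minimal Go sketch. The `RequestParser` interface, the `InferenceRequest` fields, and `openAIParser` are all hypothetical names chosen for this example; they are not taken from the project's actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InferenceRequest is a protocol-agnostic view of an inference request.
// Field names here are illustrative, not the project's real struct.
type InferenceRequest struct {
	TargetModel string
	Prompt      string
}

// RequestParser turns raw request body bytes into an InferenceRequest.
// A pluggable parser per protocol lets the EPP route and score payloads
// without knowing the wire format.
type RequestParser interface {
	Parse(body []byte) (*InferenceRequest, error)
}

// openAIParser handles OpenAI-style completion payloads.
type openAIParser struct{}

func (openAIParser) Parse(body []byte) (*InferenceRequest, error) {
	var req struct {
		Model  string `json:"model"`
		Prompt string `json:"prompt"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	return &InferenceRequest{TargetModel: req.Model, Prompt: req.Prompt}, nil
}

func main() {
	var p RequestParser = openAIParser{}
	r, err := p.Parse([]byte(`{"model":"m1","prompt":"hi"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(r.TargetModel, r.Prompt)
}
```

Adding support for another GenAI protocol would then mean registering another `RequestParser` implementation, with no changes to the downstream scheduling code.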
Which issue(s) this PR fixes:
Related to #2447
Does this PR introduce a user-facing change?:
This is a series of three PRs:
1. This PR simply renames LLMRequest to InferenceRequest.
2. The second PR moves RequestBody from scheduling to requesthandling (#2808).
3. The third PR separates the parser from the directory (#2810).
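Since this PR is a pure rename, one common Go migration pattern is worth sketching: keeping the old name as a type alias so downstream plugins keep compiling during the transition. This is a hypothetical illustration of the pattern, not necessarily what this PR does, and the struct fields are invented for the example.

```go
package main

import "fmt"

// InferenceRequest is the renamed, protocol-neutral request type
// (fields here are illustrative).
type InferenceRequest struct {
	TargetModel string
}

// LLMRequest is a compatibility alias: a Go type alias makes the old and
// new names fully interchangeable, so callers migrate at their own pace.
//
// Deprecated: use InferenceRequest.
type LLMRequest = InferenceRequest

func main() {
	old := LLMRequest{TargetModel: "m1"}
	var renamed InferenceRequest = old // no conversion needed; same type
	fmt.Println(renamed.TargetModel)
}
```

Because `type LLMRequest = InferenceRequest` is an alias rather than a new defined type, values of the two names are identical to the compiler, which is what makes a staged rename across plugin repositories safe.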