[gRPC](1/N) gRPC-in-gRPC-out support for Generate non-streaming #2211
zetxqx wants to merge 2 commits into kubernetes-sigs:main
✅ Deploy Preview for gateway-api-inference-extension ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: zetxqx. Needs approval from an approver in each of these files.
ahg-g left a comment:
This is great, but I think we should take a step back and define a pluggable interface for request parsing. I created #2295.
I strongly suggest we first define the interface, then add the OpenAI HTTP format as the first plugin, and then add the gRPC one.
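To make the suggested ordering concrete, here is a minimal Go sketch of what a pluggable request-parsing interface could look like, with an OpenAI-style HTTP/JSON parser as the first plugin. All names (`RequestParser`, `ParsedRequest`, `openAIParser`, `selectParser`) are hypothetical; they are not taken from #2295 or from the EPP codebase.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ParsedRequest is a hypothetical, protocol-agnostic view of an inference
// request that the scheduler could consume regardless of wire format.
type ParsedRequest struct {
	Model  string
	Prompt string
}

// RequestParser is a hypothetical plugin interface: each wire format
// (OpenAI-style HTTP/JSON, gRPC, ...) would provide one implementation.
type RequestParser interface {
	// Name identifies the plugin.
	Name() string
	// Matches reports whether this parser handles the given content type.
	Matches(contentType string) bool
	// Parse converts a raw request body into a ParsedRequest.
	Parse(body []byte) (*ParsedRequest, error)
}

// openAIParser handles OpenAI-style JSON completion bodies.
type openAIParser struct{}

func (openAIParser) Name() string { return "openai-http" }

func (openAIParser) Matches(ct string) bool {
	// Media types are case-insensitive, so compare in lower case.
	return strings.HasPrefix(strings.ToLower(ct), "application/json")
}

func (openAIParser) Parse(body []byte) (*ParsedRequest, error) {
	var req struct {
		Model  string `json:"model"`
		Prompt string `json:"prompt"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("openai-http: %w", err)
	}
	return &ParsedRequest{Model: req.Model, Prompt: req.Prompt}, nil
}

// selectParser returns the first registered parser that accepts contentType.
func selectParser(parsers []RequestParser, contentType string) (RequestParser, error) {
	for _, p := range parsers {
		if p.Matches(contentType) {
			return p, nil
		}
	}
	return nil, errors.New("no parser registered for content type " + contentType)
}

func main() {
	parsers := []RequestParser{openAIParser{}}
	p, err := selectParser(parsers, "application/json")
	if err != nil {
		panic(err)
	}
	req, err := p.Parse([]byte(`{"model":"llama","prompt":"hi"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name(), req.Model, req.Prompt) // openai-http llama hi
}
```

Under this design, gRPC support would be a second `RequestParser` implementation registered next to the first one, and the handler would pick a parser by content type instead of branching on the protocol inline.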
option go_package = "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/api/gen";
// Service definition for vLLM engine communication
We should ask the vLLM project to host the generated Go code in the vLLM repo so that we can import it instead of copying the whole thing. We don't copy the ext-proc protobufs, for example.
Until then, we can keep this, but I would like this support to be exposed and implemented as a plugin, so I would expect this proto definition to be stored with the plugin package.
Another thing: how do we keep the generated code up to date? I expected to see a Makefile rule or something similar (like the CRD generation rule).
pkg/epp/util/request/headers.go (outdated)
ContentTypeKey = "content-type"
PathKey = ":path"
JSONContentType = "application/json"
GRPCContentType = "application/grpc"
Are those headers standard? Is there a link we can reference here?
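For reference, these values come from the gRPC over HTTP/2 protocol specification: gRPC requests carry `content-type: application/grpc`, optionally followed by a subtype such as `+proto` or `+json`, and `:path` is the standard HTTP/2 pseudo-header, which for gRPC has the form `/{service}/{method}`. The sketch below shows hypothetical helpers (`isGRPCContentType`, `splitGRPCPath`) that check these headers; the service name used in `main` is a placeholder, not the actual vLLM service.

```go
package main

import (
	"fmt"
	"strings"
)

// Header keys and values mirroring the constants in headers.go.
const (
	ContentTypeKey  = "content-type"
	PathKey         = ":path"
	GRPCContentType = "application/grpc"
)

// isGRPCContentType reports whether ct is a gRPC content type: exactly
// "application/grpc", or that value followed by a "+subtype" such as
// "+proto". "application/grpc-web" is intentionally not matched.
func isGRPCContentType(ct string) bool {
	ct = strings.ToLower(strings.TrimSpace(ct))
	if !strings.HasPrefix(ct, GRPCContentType) {
		return false
	}
	rest := ct[len(GRPCContentType):]
	return rest == "" || rest[0] == '+'
}

// splitGRPCPath splits a gRPC ":path" value of the form
// "/{service}/{method}" into its service and method parts.
func splitGRPCPath(path string) (string, string, bool) {
	if !strings.HasPrefix(path, "/") {
		return "", "", false
	}
	service, method, found := strings.Cut(path[1:], "/")
	if !found || service == "" || method == "" || strings.Contains(method, "/") {
		return "", "", false
	}
	return service, method, true
}

func main() {
	headers := map[string]string{
		ContentTypeKey: "application/grpc+proto",
		PathKey:        "/example.Engine/Generate", // hypothetical service name
	}
	fmt.Println(isGRPCContentType(headers[ContentTypeKey])) // true

	svc, method, ok := splitGRPCPath(headers[PathKey])
	fmt.Println(svc, method, ok) // example.Engine Generate true

	fmt.Println(isGRPCContentType("application/grpc-web")) // false
}
```

Matching on the `application/grpc` prefix alone would also accept gRPC-Web traffic, which uses a different framing on the wire, so the sketch only accepts the bare type or a `+` subtype.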
/cc @RyanRosario
@RyanRosario: GitHub didn't allow me to request PR reviews from the following users: RyanRosario. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.
@zetxqx: A test failed on this PR.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Initial support for the gRPC-in-gRPC-out, non-streaming Generate API.
Key changes:
- pkg/epp/api folder
- epp/handlers/server.go flow for gRPC-in-gRPC-out; the key differences are:
  - reqCtx.SchedulingRequestBody
I tested that the whole flow works by using a custom GKE controller.
The following is a successful result using the test Go code here: https://github.com/zetxqx/gateway-api-inference-extension/blob/c311c6f3097ee3337ccbe39714d6f649a1c3cba0/pkg/epp/grpc/examples/client/main.go
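Assuming the EPP receives the request body as raw gRPC wire bytes (as an Envoy ext-proc server would for a gRPC route), handling a gRPC body involves stripping gRPC's length-prefixed message framing before unmarshaling the protobuf, and adding it back when returning a gRPC response. Per the gRPC over HTTP/2 specification, each message is a 1-byte compressed flag, a 4-byte big-endian length, and then the message bytes. This is a minimal sketch of that framing, not the PR's implementation; `encodeGRPCFrame` and `decodeGRPCFrame` are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// grpcHeaderLen is the 5-byte gRPC message prefix: a 1-byte compressed
// flag followed by a 4-byte big-endian message length.
const grpcHeaderLen = 5

// encodeGRPCFrame wraps an uncompressed serialized message in gRPC's
// length-prefixed framing.
func encodeGRPCFrame(msg []byte) []byte {
	frame := make([]byte, grpcHeaderLen+len(msg))
	frame[0] = 0 // compressed flag: 0 means not compressed
	binary.BigEndian.PutUint32(frame[1:grpcHeaderLen], uint32(len(msg)))
	copy(frame[grpcHeaderLen:], msg)
	return frame
}

// decodeGRPCFrame reads one length-prefixed message from buf and returns the
// message bytes, whether the compressed flag was set, and any leftover bytes
// (for example, the start of a following message).
func decodeGRPCFrame(buf []byte) (msg []byte, compressed bool, rest []byte, err error) {
	if len(buf) < grpcHeaderLen {
		return nil, false, buf, errors.New("incomplete gRPC frame header")
	}
	switch buf[0] {
	case 0:
		// Not compressed.
	case 1:
		compressed = true
	default:
		return nil, false, buf, fmt.Errorf("invalid compressed flag %d", buf[0])
	}
	n := binary.BigEndian.Uint32(buf[1:grpcHeaderLen])
	if uint64(len(buf)-grpcHeaderLen) < uint64(n) {
		return nil, false, buf, errors.New("incomplete gRPC message body")
	}
	end := grpcHeaderLen + int(n)
	return buf[grpcHeaderLen:end], compressed, buf[end:], nil
}

func main() {
	// Stand-in bytes; a real body would carry a serialized Generate request.
	payload := []byte("serialized-proto")
	frame := encodeGRPCFrame(payload)
	fmt.Println(len(frame)) // 21

	msg, compressed, rest, err := decodeGRPCFrame(frame)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg), compressed, len(rest)) // serialized-proto false 0
}
```

Because `decodeGRPCFrame` returns the leftover bytes, a caller could also use it on bodies that contain more than one message. A set compressed flag means the payload must first be decompressed using the negotiated `grpc-encoding` before it can be unmarshaled.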
Which issue(s) this PR fixes:
Partially fixes #2166
Does this PR introduce a user-facing change?: