
[gRPC](1/N) gRPC-in-gRPC-out support for Generate non-streaming #2211

Open
zetxqx wants to merge 2 commits into kubernetes-sigs:main from zetxqx:generategrpc

@zetxqx
Contributor

@zetxqx zetxqx commented Jan 26, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:
Initial support for the gRPC-in-gRPC-out non-streaming Generate API.

Key changes:

  1. Copy the vLLM proto definition and generate the Go client code into the pkg/epp/api folder.
  2. Modify the epp/handlers/server.go flow for gRPC-in-gRPC-out. The key differences are:
    • When receiving the gRPC request body, parse it with the new codec package and populate reqCtx.SchedulingRequestBody.
    • Determine the response end-of-stream from the trailers instead of relying on the EndOfStream flag.
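The two handler changes above can be sketched in Go. Every gRPC message on the wire is a length-prefixed frame (a 1-byte compressed flag, a 4-byte big-endian length, then the serialized protobuf), so a codec has to strip that prefix before handing the payload to proto.Unmarshal from google.golang.org/protobuf/proto. A gRPC response also always ends with trailers carrying grpc-status, and because the trailers follow the last body chunk, that chunk is presumably not the one marked end-of-stream. This is a minimal sketch, not the PR's code: decodeGRPCFrame, encodeGRPCFrame, and responseComplete are hypothetical names, compression is not handled, and the trailers are assumed to have been flattened into a map[string]string.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// grpcHeaderLen is the gRPC length-prefix size: a 1-byte compressed flag
// followed by a 4-byte big-endian message length.
const grpcHeaderLen = 5

// decodeGRPCFrame (hypothetical) returns the first length-prefixed message in
// a fully buffered gRPC body. Compressed messages are rejected for simplicity.
func decodeGRPCFrame(body []byte) ([]byte, error) {
	if len(body) < grpcHeaderLen {
		return nil, errors.New("grpc frame: body shorter than 5-byte prefix")
	}
	if body[0] != 0 {
		return nil, errors.New("grpc frame: compressed messages not handled")
	}
	end := uint64(grpcHeaderLen) + uint64(binary.BigEndian.Uint32(body[1:grpcHeaderLen]))
	if uint64(len(body)) < end {
		return nil, fmt.Errorf("grpc frame: need %d bytes, have %d", end, len(body))
	}
	return body[grpcHeaderLen:end], nil
}

// encodeGRPCFrame (hypothetical) wraps a serialized message in an
// uncompressed length prefix, e.g. when writing a rewritten body back.
func encodeGRPCFrame(msg []byte) []byte {
	out := make([]byte, grpcHeaderLen+len(msg))
	binary.BigEndian.PutUint32(out[1:grpcHeaderLen], uint32(len(msg)))
	copy(out[grpcHeaderLen:], msg)
	return out
}

// responseComplete (hypothetical) treats the arrival of trailers carrying
// grpc-status as the end of a gRPC response; a non-zero status is an error.
func responseComplete(trailers map[string]string) (bool, error) {
	status, ok := trailers["grpc-status"]
	if !ok {
		return false, nil // no trailers yet: more body chunks may follow
	}
	if status != "0" {
		return true, fmt.Errorf("upstream grpc-status %s: %s", status, trailers["grpc-message"])
	}
	return true, nil
}
```

Keying completion on trailers rather than on a per-chunk flag keeps the logic correct even when the upstream splits a response across several body chunks.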

I verified that the whole flow works using a custom GKE controller.
The following is a successful result from the Go test client here: https://github.com/zetxqx/gateway-api-inference-extension/blob/c311c6f3097ee3337ccbe39714d6f649a1c3cba0/pkg/epp/grpc/examples/client/main.go

❯ go run ./pkg/epp/grpc/examples/client --target-address "<ip>:443"

--- Generate (Non-Streaming) ---
Complete Response (IDs): [100 101 102 103 104]
Complete Response (Text): <100><101><102><103><104>
Finish Reason: stop

Which issue(s) this PR fixes:

Partially fixes #2166

Does this PR introduce a user-facing change?:

Add initial gRPC-in-gRPC-out support for the non-streaming Generate RPC.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 26, 2026
@netlify

netlify bot commented Jan 26, 2026

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 2b20cc1
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69afb001b32cb30008b66911
😎 Deploy Preview: https://deploy-preview-2211--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 26, 2026
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and elevran January 26, 2026 20:11
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zetxqx
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 26, 2026
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 26, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 26, 2026
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 29, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 3, 2026
Contributor

@ahg-g ahg-g left a comment


This is great, but I think we should take a step back and define a pluggable interface for request parsing. I created #2295.

I strongly suggest we first define the interface, then add the OpenAI HTTP format as the first plugin, and then add the gRPC one.
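To make the suggestion concrete, a pluggable request-parsing interface might look like the minimal sketch below. RequestParser, SchedulingRequest, openAIParser, and selectParser are hypothetical names and are not the interface proposed in #2295; the OpenAI plugin reads only model and prompt to keep it short. A gRPC plugin would implement the same two methods, matching application/grpc and decoding the protobuf body.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// SchedulingRequest is a hypothetical, format-neutral view of what the
// scheduler needs from a request; the real EPP type differs.
type SchedulingRequest struct {
	Model  string
	Prompt string
}

// RequestParser is a hypothetical plugin interface: each plugin says whether
// it handles a content type and converts the raw body into a SchedulingRequest.
type RequestParser interface {
	Matches(contentType string) bool
	Parse(body []byte) (*SchedulingRequest, error)
}

// openAIParser handles OpenAI-style JSON bodies (only two fields shown).
type openAIParser struct{}

func (openAIParser) Matches(ct string) bool {
	return strings.HasPrefix(strings.ToLower(ct), "application/json")
}

func (openAIParser) Parse(body []byte) (*SchedulingRequest, error) {
	var req struct {
		Model  string `json:"model"`
		Prompt string `json:"prompt"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	return &SchedulingRequest{Model: req.Model, Prompt: req.Prompt}, nil
}

// selectParser returns the first registered plugin that claims the content type.
func selectParser(parsers []RequestParser, contentType string) (RequestParser, error) {
	for _, p := range parsers {
		if p.Matches(contentType) {
			return p, nil
		}
	}
	return nil, errors.New("no request parser for content-type " + contentType)
}
```

With this shape, the handler only calls selectParser and Parse, so adding gRPC means registering one more plugin rather than branching inside server.go.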


option go_package = "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/api/gen";

// Service definition for vLLM engine communication
Contributor


We should ask the vLLM project to host the generated Go code in the vLLM repo so that we can import it instead of copying the whole thing. We don't copy the ext-proc protobufs, for example.

Contributor


Until then, we can keep this, but I would like this support to be exposed and implemented as a plugin, so I would expect this proto definition to be stored with the plugin package.

Contributor


Another thing: how do we keep the generated code up to date? I expected to see a Makefile rule or something similar (like the CRD generation rule).

ContentTypeKey = "content-type"
PathKey = ":path"
JSONContentType = "application/json"
GRPCContentType = "application/grpc"
Contributor


Are those headers standard? Is there a link we can reference here?
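For reference, the gRPC over HTTP/2 protocol description (PROTOCOL-HTTP2.md in the grpc/grpc repository) specifies a content-type of application/grpc, optionally followed by a subtype such as +proto or +json, and a gRPC request carries its method in the HTTP/2 :path pseudo-header (RFC 9113) as /Service/Method. The minimal sketch below shows how those constants might be used to classify a request. isGRPCRequest, grpcMethod, and the map[string]string header representation are assumptions, not code from this PR.

```go
package main

import "strings"

// Header names as they would appear in a lowercase header map; HTTP/2
// requires lowercase field names, and ":path" is an HTTP/2 pseudo-header.
const (
	contentTypeKey  = "content-type"
	pathKey         = ":path"
	grpcContentType = "application/grpc"
)

// isGRPCRequest (hypothetical) treats a request as gRPC when its content-type
// is "application/grpc" or a subtype such as "application/grpc+proto",
// ignoring case and, defensively, any ";"-parameters.
// Note that "application/grpc-web" is a different protocol and is rejected.
func isGRPCRequest(headers map[string]string) bool {
	ct := strings.ToLower(strings.TrimSpace(headers[contentTypeKey]))
	if i := strings.IndexByte(ct, ';'); i >= 0 {
		ct = strings.TrimSpace(ct[:i])
	}
	return ct == grpcContentType || strings.HasPrefix(ct, grpcContentType+"+")
}

// grpcMethod (hypothetical) splits a gRPC :path of the form
// "/<package>.<Service>/<Method>" into its service and method parts.
func grpcMethod(headers map[string]string) (service, method string, ok bool) {
	p := strings.TrimPrefix(headers[pathKey], "/")
	service, method, ok = strings.Cut(p, "/")
	return service, method, ok && service != "" && method != ""
}
```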

@RyanRosario
Contributor

/cc @RyanRosario

@k8s-ci-robot
Contributor

@RyanRosario: GitHub didn't allow me to request PR reviews from the following users: RyanRosario.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @RyanRosario

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 25, 2026
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 5, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2026
@k8s-ci-robot
Contributor

@zetxqx: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-gateway-api-inference-extension-verify-main
Commit: 2b20cc1
Required: true
Rerun command: /test pull-gateway-api-inference-extension-verify-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
kind/feature (Categorizes issue or PR as related to a new feature.)
size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)


4 participants