feat: add OpenShift single-namespace deployment for separate pods#490
feat: add OpenShift single-namespace deployment for separate pods#490szedan-rh wants to merge 5 commits intovllm-project:mainfrom
Conversation
This PR adds a complete single-namespace OpenShift deployment configuration for vLLM Semantic Router with separate pods architecture. Key Features: - Single namespace deployment (vllm-semantic-router) - Separate pods: router+envoy, model-a, model-b - External HTTPS access via OpenShift Route - Automated deployment script - Complete documentation Components: - BuildConfigs for router and llm-katan model - ConfigMaps for Envoy and Router configuration - Deployments for all components - PersistentVolumeClaims for model storage - Services for inter-pod communication - Route for external HTTPS access with TLS termination - Automated deployment script Tested and verified working deployment with: - Successful model routing (math → Model-A, business → Model-B) - External HTTPS access via Route - All pods running and healthy - Classification and routing working correctly Files Added: - deploy/openshift/single-namespace/ (14 files) - .dockerignore - Updated .github/workflows/docker-publish.yml Signed-off-by: szedan <senan.zedan@gmail.com> Signed-off-by: szedan <szedan@redhat.com>
✅ Deploy Preview for vllm-semantic-router ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
👥 vLLM Semantic Team NotificationThe following members have been identified for the changed files in this PR and have been automatically assigned: 📁
|
|
/assign @yossiovadia |
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a comprehensive single-namespace OpenShift deployment configuration for vLLM Semantic Router with a separate pods architecture. The deployment uses three pods within one namespace: a router+envoy pod for classification and routing, and two separate model pods (model-a and model-b) running on GPUs. The configuration includes automated build processes, persistent storage, HTTPS external access, and complete documentation.
Key changes:
- Complete OpenShift deployment manifests for single-namespace architecture with separate pods
- Automated deployment script with build orchestration and monitoring
- Comprehensive documentation with troubleshooting guides and testing instructions
Reviewed Changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| deploy/openshift/single-namespace/services.yaml | Defines ClusterIP services for envoy-proxy, semantic-router, model-a, and model-b |
| deploy/openshift/single-namespace/scripts/deploy-all.sh | Automated deployment script handling namespace creation, builds, and monitoring |
| deploy/openshift/single-namespace/route.yaml | OpenShift Route for external HTTPS access with TLS edge termination |
| deploy/openshift/single-namespace/pvcs.yaml | Persistent volume claims for router models/cache and model caches |
| deploy/openshift/single-namespace/namespace.yaml | Namespace definition for vllm-semantic-router |
| deploy/openshift/single-namespace/imagestreams.yaml | ImageStreams for semantic-router and llm-katan builds |
| deploy/openshift/single-namespace/deployment-router.yaml | Deployment for router pod with init container for model downloads |
| deploy/openshift/single-namespace/deployment-model-b.yaml | Deployment for model-b pod with GPU requirements |
| deploy/openshift/single-namespace/deployment-model-a.yaml | Deployment for model-a pod with GPU requirements |
| deploy/openshift/single-namespace/configmap-router.yaml | Router configuration with hardcoded ClusterIP addresses |
| deploy/openshift/single-namespace/configmap-envoy.yaml | Envoy proxy configuration for routing and external processing |
| deploy/openshift/single-namespace/buildconfig-router.yaml | BuildConfig for semantic-router image from binary source |
| deploy/openshift/single-namespace/buildconfig-llm-katan.yaml | BuildConfig for llm-katan image from GitHub repository |
| deploy/openshift/single-namespace/README.md | Comprehensive deployment guide with troubleshooting and testing instructions |
| .dockerignore | Docker build exclusions for models, cache, and build artifacts |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
@szedan-rh thanks for contribution, can you run |
|
@rootfs - Indeed. |
…ft deployment This commit enhances the OpenShift single-namespace deployment with two major improvements: 1. Automatic IP Configuration: - Modified deploy-all.sh to automatically detect and configure ClusterIP addresses - Services are created first, then ClusterIPs are retrieved dynamically - Router ConfigMap is automatically updated with correct model service IPs - Deployment now works on any OpenShift cluster without manual IP configuration - Tested and verified with different IP allocations 2. Comprehensive Uninstall Script: - Added uninstall.sh for complete or partial removal of deployment - Interactive confirmations for destructive operations (PVCs, namespace) - Proper deletion order: Route → Deployments → Services → ConfigMaps → Builds → ImageStreams → PVCs → Namespace - Option to preserve data (PVCs) or namespace - Cleanup of temporary files - Tested successfully with full uninstall 3. Build Configuration Fix: - Fixed buildconfig-llm-katan.yaml to use inline Dockerfile - Changed from Git source (private repo) to official vLLM package - Uses vllm==0.6.3.post1 with PyTorch 2.1.0 and CUDA 12.1 support - OpenShift compatible (non-root user 1001) 4. Documentation Updates: - Moved deployment guide from README.md to DEPLOYMENT_GUIDE.md - Added comprehensive automatic IP configuration documentation - Added uninstall documentation with examples - Documented manual and automated deployment options Testing Results: - ✅ Deployment tested with automatic IP detection (IPs: 172.30.128.210, 172.30.17.224) - ✅ Uninstall script tested with complete removal - ✅ All pre-commit checks passing (11/11) - ✅ Build configurations verified working This makes the deployment production-ready and portable across different OpenShift clusters. Signed-off-by: szedan <szedan@redhat.com>
…ion tools This commit improves the OpenShift single-namespace deployment with several key enhancements for better performance, usability, and maintainability. Changes: - Switch from heavy vLLM image to lightweight llm-katan image (~10GB → ~2-3GB) - Remove semantic-router build (now uses pre-built ghcr.io image) - Add GPU node status check and automated restart guidance - Consolidate uninstall script confirmation prompts (3 → 1) - Add route creation to deployment script for immediate external access New validation tools: - check-gpu-status.sh: Automated GPU node availability checking - validate-deployment.sh: Comprehensive endpoint and flow validation - test-queries-with-trace.sh: Detailed query routing visualization Documentation: - ADD-MODEL-C-GUIDE.md: Complete guide for adding additional models These changes reduce deployment time from 20-30 minutes to 5-10 minutes and provide comprehensive testing and troubleshooting capabilities. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: szedan <szedan@redhat.com>
775429f to
3cf639b
Compare
|
Will create one script that do the same things for openshift based on Yossi O. script. |

Summary
This PR adds a complete single-namespace OpenShift deployment configuration for vLLM Semantic Router with a separate pods architecture.
Key Features
vllm-semantic-router)Components Added
Testing
Tested and verified working deployment with:
Files Changed
deploy/openshift/single-namespace/.dockerignore.github/workflows/docker-publish.ymlDocumentation
Complete deployment guide included in
deploy/openshift/single-namespace/README.mdSigned-off-by: szedan szedan@redhat.com