Manifests for deploying Docker Model Runner on Kubernetes with ephemeral storage, GPU support, and model pre-pulling capabilities.
For Docker Desktop:

```bash
kubectl apply -f static/docker-model-runner-desktop.yaml
kubectl wait --for=condition=Available deployment/docker-model-runner --timeout=5m
MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest
```
For other clusters:

```bash
kubectl apply -f static/docker-model-runner.yaml
kubectl wait --for=condition=Available deployment/docker-model-runner --timeout=5m
kubectl port-forward deployment/docker-model-runner 31245:12434
```

Then:

```bash
MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest
```
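As an alternative to the `docker model` CLI, you can exercise the service's OpenAI-compatible API directly through the same port-forward. This is a sketch: the `/engines/v1` base path is the one used for the Open WebUI integration later on this page, and it assumes `ai/smollm2:latest` has already been pulled.

```shell
# Send a chat-completion request to the forwarded Model Runner endpoint.
# Requires the port-forward from the previous step to be running.
curl -s http://localhost:31245/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/smollm2:latest",
        "messages": [{"role": "user", "content": "Say hello in one word."}]
      }'
```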
Key configuration options in `values.yaml`:

```yaml
# Storage configuration
storage:
  size: 100Gi
  storageClass: "" # Set this to the storage class of your cloud provider.

# Model pre-pull configuration
modelInit:
  enabled: false
  models:
    - "ai/smollm2:latest"

# Image configuration
image:
  repository: docker/model-runner
  tag: "latest" # Use 'latest-cuda' for NVIDIA or 'latest-rocm' for AMD GPUs
  pullPolicy: IfNotPresent

# GPU configuration
gpu:
  enabled: false
  vendor: nvidia # or amd
  count: 1
  # For AMD GPUs, use the 'latest-rocm' image tag

# NodePort configuration
nodePort:
  enabled: false
  port: 31245
```

To enable GPU scheduling:
```yaml
gpu:
  enabled: true
  vendor: nvidia # or amd
  count: 1
```

This will add the appropriate resource requests/limits:
- NVIDIA: `nvidia.com/gpu`
- AMD: `amd.com/gpu`
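For reference, the stanza the chart injects into the container spec looks along these lines (an illustrative sketch of the rendered output, not a `values.yaml` setting; check the chart templates for the exact shape):

```yaml
resources:
  limits:
    nvidia.com/gpu: 1   # amd.com/gpu when gpu.vendor is amd
```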
Configure models to pre-pull during pod initialization:
```yaml
modelInit:
  enabled: true
  models:
    - "ai/smollm2:latest"
    - "ai/llama3.2:latest"
    - "ai/mistral:latest"
```

Once installed, set up a port-forward to access the service:
```bash
kubectl port-forward service/docker-model-runner-nodeport 31245:80
```

Then test the model runner:

```bash
MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest
```

To use Docker Model Runner with Open WebUI, install the Open WebUI Helm chart:
```bash
# Add the Open WebUI Helm repository
helm repo add open-webui https://helm.openwebui.com/
helm repo update

# Install Open WebUI with auth disabled.
# See the open-webui Helm chart for details on
# connecting to your auth provider.
helm upgrade --install --wait open-webui open-webui/open-webui \
  --set ollama.enabled=false \
  --set pipelines.enabled=false \
  --set extraEnvVars[0].name="WEBUI_AUTH" \
  --set-string extraEnvVars[0].value=false \
  --set openaiBaseApiUrl="http://docker-model-runner/engines/v1"
```

Access Open WebUI:

```bash
kubectl port-forward service/open-webui 8080:80
```

Then visit http://localhost:8080 in your browser.
| Parameter | Description | Default |
|---|---|---|
| `replicaCount` | Number of replicas | `1` |
| `image.repository` | Docker Model Runner image repository | `docker/model-runner` |
| `image.tag` | Docker Model Runner image tag | `latest` |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
| `storage.size` | Ephemeral volume size | `100Gi` |
| `storage.storageClass` | Storage class for ephemeral volume | `""` |
| `modelInit.enabled` | Enable model pre-pulling | `false` |
| `modelInit.models` | List of models to pre-pull | `["ai/smollm2:latest"]` |
| `gpu.enabled` | Enable GPU support | `false` |
| `gpu.vendor` | GPU vendor (`nvidia` or `amd`) | `nvidia` |
| `gpu.count` | Number of GPUs to request | `1` |
| `nodePort.enabled` | Enable NodePort service | `false` |
| `nodePort.port` | NodePort port number | `31245` |
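Pulling several of the parameters above together, a custom values file might look like this (illustrative; the file name and the `gp2` storage class are assumptions, substitute your provider's class):

```yaml
# my-values.yaml -- example overrides for a GPU node pool
image:
  tag: "latest-cuda"    # match the CUDA image to gpu.vendor: nvidia
storage:
  size: 200Gi
  storageClass: "gp2"   # assumption: an AWS EBS class; use your provider's
modelInit:
  enabled: true
  models:
    - "ai/smollm2:latest"
gpu:
  enabled: true
  vendor: nvidia
  count: 1
nodePort:
  enabled: true
  port: 31245
```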
Check the pod logs:

```bash
kubectl logs -f deployment/docker-model-runner
```

Check the init container logs:

```bash
kubectl logs -f deployment/docker-model-runner -c model-init
```

Your cluster must use a GPU scheduling plugin.
Ensure your cluster has GPU support and the appropriate device plugin installed:
- For NVIDIA GPUs: Install the NVIDIA device plugin
- For AMD GPUs: Install the AMD device plugin
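One way to confirm the device plugin has registered GPU capacity is to look for the vendor resource on your nodes (a sketch; the grep simply filters `kubectl describe` output for the resource names above):

```shell
# List the GPU capacity/allocatable entries reported by the device plugin.
kubectl describe nodes | grep -E 'nvidia.com/gpu|amd.com/gpu'
```

If this prints nothing, the device plugin is not advertising GPUs and pods requesting them will stay `Pending`.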