Commit 5ac840d

add dmrlet - lightweight node agent for Docker Model Runner

dmrlet is a lightweight node agent that runs inference containers directly with zero YAML overhead. It provides a simple CLI to serve models:

    dmrlet serve ai/smollm2  # Pulls model, starts inference container, exposes OpenAI API

Key features:

- Reuses existing pkg/distribution for model management
- containerd integration for container lifecycle
- GPU detection and passthrough (NVIDIA/AMD)
- Auto port allocation (30000-30999 range)
- Health checking with configurable timeout
- Backend auto-detection (llama-server for GGUF, vLLM for safetensors)

Commands: serve, stop, list, pull, version

Signed-off-by: Eric Curtin <eric.curtin@docker.com>

1 parent cd24a2a commit 5ac840d
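The backend auto-detection rule from the commit message (llama-server for GGUF, vLLM for safetensors) can be sketched as a file-extension check. This is a minimal illustration, not dmrlet's actual implementation; the function name, signature, and fallback choice are assumptions:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectBackend mirrors the rule stated in the commit message:
// llama-server for GGUF model files, vLLM for safetensors.
// The name, signature, and default are illustrative only.
func detectBackend(modelFile string) string {
	switch strings.ToLower(filepath.Ext(modelFile)) {
	case ".gguf":
		return "llama-server"
	case ".safetensors":
		return "vllm"
	default:
		return "llama-server" // assumed conservative fallback for unknown formats
	}
}

func main() {
	fmt.Println(detectBackend("smollm2.gguf"))      // llama-server
	fmt.Println(detectBackend("model.safetensors")) // vllm
}
```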

File tree

36 files changed: +7913 −275 lines


Makefile

Lines changed: 13 additions & 1 deletion

```diff
@@ -26,14 +26,24 @@ DOCKER_BUILD_ARGS := \
 BUILD_DMR ?= 1
 
 # Main targets
-.PHONY: build run clean test integration-tests test-docker-ce-installation docker-build docker-build-multiplatform docker-run docker-build-vllm docker-run-vllm docker-build-sglang docker-run-sglang docker-run-impl help validate lint docker-build-diffusers docker-run-diffusers vllm-metal-build vllm-metal-install vllm-metal-dev vllm-metal-clean
+.PHONY: build build-dmrlet run clean test integration-tests test-docker-ce-installation docker-build docker-build-multiplatform docker-run docker-build-vllm docker-run-vllm docker-build-sglang docker-run-sglang docker-run-impl help validate lint docker-build-diffusers docker-run-diffusers vllm-metal-build vllm-metal-install vllm-metal-dev vllm-metal-clean
 # Default target
 .DEFAULT_GOAL := build
 
 # Build the Go application
 build:
 	CGO_ENABLED=1 go build -ldflags="-s -w" -o $(APP_NAME) .
 
+# Build dmrlet binary
+build-dmrlet:
+	@echo "Building dmrlet..."
+	@VERSION=$$(git describe --tags --always --dirty 2>/dev/null || echo "dev"); \
+	GIT_COMMIT=$$(git rev-parse HEAD 2>/dev/null || echo "unknown"); \
+	BUILD_DATE=$$(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || echo "unknown"); \
+	cd cmd/dmrlet && CGO_ENABLED=0 go build -ldflags="-s -w -X 'main.Version=$${VERSION}' -X 'main.GitCommit=$${GIT_COMMIT}' -X 'main.BuildDate=$${BUILD_DATE}'" -o dmrlet .
+	mv cmd/dmrlet/dmrlet .
+	@echo "Built: dmrlet"
+
 # Run the application locally
 run: build
 	@LLAMACPP_BIN="llamacpp/install/bin"; \
@@ -46,6 +56,7 @@ run: build
 # Clean build artifacts
 clean:
 	rm -f $(APP_NAME)
+	rm -f dmrlet
 	rm -f model-runner.sock
 	rm -rf $(MODELS_PATH)
 
@@ -219,6 +230,7 @@ vllm-metal-clean:
 help:
 	@echo "Available targets:"
 	@echo "  build                    - Build the Go application"
+	@echo "  build-dmrlet             - Build dmrlet binary (lightweight node agent)"
 	@echo "  run                      - Run the application locally"
 	@echo "  clean                    - Clean build artifacts"
 	@echo "  test                     - Run tests"
```

README.md

Lines changed: 109 additions & 0 deletions

The new section below is inserted after the existing Kubernetes note ("If you are interested in a specific Kubernetes use-case, please start a discussion on the issue tracker.") and before the "## Community" section:

## dmrlet: Container Orchestrator for AI Inference

dmrlet is a purpose-built container orchestrator for AI inference workloads. Unlike Kubernetes, it focuses exclusively on running stateless inference containers with zero configuration overhead. Multi-GPU mapping "just works" without YAML, device plugins, or node selectors.

### Key Features

| Feature | Kubernetes | dmrlet |
|---------|------------|--------|
| Multi-GPU setup | Device plugins + node selectors + resource-limits YAML | `dmrlet serve llama3 --gpus all` |
| Config overhead | 50+ lines of YAML minimum | Zero YAML, CLI-only |
| Time to first inference | Minutes (pod scheduling, image pull) | Seconds (model already local) |
| Model management | External (mount PVCs, manage yourself) | Integrated with Docker Model Runner store |
### Building dmrlet

```bash
# Build the dmrlet binary
go build -o dmrlet ./cmd/dmrlet

# Verify it works
./dmrlet --help
```
### Usage

**Start the daemon:**
```bash
# Start in foreground
dmrlet daemon

# With custom socket path
dmrlet daemon --socket /tmp/dmrlet.sock
```

**Serve a model:**
```bash
# Auto-detect backend and GPUs
dmrlet serve llama3.2

# Specify backend
dmrlet serve llama3.2 --backend vllm

# Specify GPU allocation
dmrlet serve llama3.2 --gpus 0,1
dmrlet serve llama3.2 --gpus all

# Multiple replicas
dmrlet serve llama3.2 --replicas 2

# Backend-specific options
dmrlet serve llama3.2 --ctx-size 4096    # llama.cpp context size
dmrlet serve llama3.2 --gpu-memory 0.8   # vLLM GPU memory utilization
```
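The `--gpus` flag above accepts either `all` or a comma-separated index list. A minimal sketch of how such a value might be parsed into device indices (hypothetical helper; dmrlet's real flag handling may differ):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGPUs turns a --gpus flag value into GPU indices.
// "all" expands to every detected device; otherwise the value is a
// comma-separated index list. detected is the number of GPUs found
// on the node. Illustrative only, not dmrlet's actual code.
func parseGPUs(value string, detected int) ([]int, error) {
	if value == "all" {
		ids := make([]int, detected)
		for i := range ids {
			ids[i] = i
		}
		return ids, nil
	}
	var ids []int
	for _, part := range strings.Split(value, ",") {
		id, err := strconv.Atoi(strings.TrimSpace(part))
		if err != nil {
			return nil, fmt.Errorf("invalid GPU index %q: %w", part, err)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	ids, _ := parseGPUs("0,1", 4)
	fmt.Println(ids) // [0 1]
	all, _ := parseGPUs("all", 4)
	fmt.Println(all) // [0 1 2 3]
}
```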
**List running models:**
```bash
dmrlet ps
# MODEL     BACKEND    REPLICAS  GPUS       ENDPOINTS        STATUS
# llama3.2  llama.cpp  1         [0,1,2,3]  localhost:30000  healthy
```
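The `localhost:30000` endpoint above comes from dmrlet's auto port allocation in the 30000-30999 range (per the commit message). A sketch of such a probe, assuming a simple listen-and-release check on localhost:

```go
package main

import (
	"fmt"
	"net"
)

// findFreePort scans [lo, hi] and returns the first port that can be
// bound on localhost, releasing it immediately so the inference
// container can claim it. Inherently racy, but adequate for a
// single node agent. Hypothetical helper, not dmrlet's actual code.
func findFreePort(lo, hi int) (int, error) {
	for port := lo; port <= hi; port++ {
		l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
		if err != nil {
			continue // port in use, try the next one
		}
		l.Close()
		return port, nil
	}
	return 0, fmt.Errorf("no free port in %d-%d", lo, hi)
}

func main() {
	port, err := findFreePort(30000, 30999)
	if err != nil {
		panic(err)
	}
	fmt.Println("allocated port:", port)
}
```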
**View logs:**
```bash
dmrlet logs llama3.2      # Last 100 lines
dmrlet logs llama3.2 -f   # Follow logs
```

**Scale replicas:**
```bash
dmrlet scale llama3.2 4   # Scale to 4 replicas
```
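With multiple replicas, the Service Registry's "endpoint discovery with load balancing" could be as simple as a round-robin picker over the replica endpoints. This is a sketch under that assumption, not dmrlet's actual registry:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin hands out replica endpoints in rotation, spreading
// client requests evenly across the replicas started by `dmrlet scale`.
// The atomic counter makes pick safe for concurrent callers.
type roundRobin struct {
	endpoints []string
	next      atomic.Uint64
}

func (r *roundRobin) pick() string {
	n := r.next.Add(1) - 1
	return r.endpoints[n%uint64(len(r.endpoints))]
}

func main() {
	lb := &roundRobin{endpoints: []string{"localhost:30000", "localhost:30001"}}
	for i := 0; i < 4; i++ {
		fmt.Println(lb.pick())
	}
	// alternates: localhost:30000, localhost:30001, localhost:30000, localhost:30001
}
```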
**Stop a model:**
```bash
dmrlet stop llama3.2
dmrlet stop --all   # Stop all models
```

**Check status:**
```bash
dmrlet status
# DAEMON: running
# SOCKET: /var/run/dmrlet.sock
#
# GPUs:
#   GPU 0: NVIDIA A100 80GB  81920MB  (in use: llama3.2)
#   GPU 1: NVIDIA A100 80GB  81920MB  (available)
#
# MODELS: 1 running
```
### Supported Backends

- **llama.cpp** - Default backend for GGUF models
- **vLLM** - High-throughput serving for safetensors models
- **SGLang** - Fast serving with RadixAttention
### Architecture

```
dmrlet daemon
├── GPU Manager        - Auto-detect and allocate GPUs
├── Container Manager  - Docker-based container lifecycle
├── Service Registry   - Endpoint discovery with load balancing
├── Health Monitor     - Auto-restart unhealthy containers
├── Auto-scaler        - Scale based on QPS/latency/GPU utilization
└── Log Aggregator     - Centralized log collection
```
## Community

For general questions and discussion, please use [Docker Model Runner's Slack channel](https://dockercommunity.slack.com/archives/C09H9P5E57B).

cmd/dmrlet/commands/list.go

Lines changed: 88 additions & 0 deletions (new file)

```go
package commands

import (
	"fmt"
	"os"

	"github.com/olekukonko/tablewriter"
	"github.com/olekukonko/tablewriter/renderer"
	"github.com/olekukonko/tablewriter/tw"
	"github.com/spf13/cobra"
)

func newListCmd() *cobra.Command {
	cmd := &cobra.Command{
		Use:     "list",
		Aliases: []string{"ls"},
		Short:   "List running models",
		Long: `List all running inference models managed by dmrlet.

Examples:
  dmrlet list
  dmrlet ls`,
		Args: cobra.NoArgs,
		RunE: func(cmd *cobra.Command, args []string) error {
			return runList(cmd)
		},
	}

	return cmd
}

func runList(cmd *cobra.Command) error {
	ctx := cmd.Context()

	if err := initManager(ctx); err != nil {
		return fmt.Errorf("initializing manager: %w", err)
	}

	running, err := manager.List(ctx)
	if err != nil {
		return fmt.Errorf("listing models: %w", err)
	}

	if len(running) == 0 {
		cmd.Println("No running models")
		return nil
	}

	table := tablewriter.NewTable(os.Stdout,
		tablewriter.WithRenderer(renderer.NewBlueprint(tw.Rendition{
			Borders: tw.BorderNone,
			Settings: tw.Settings{
				Separators: tw.Separators{
					BetweenColumns: tw.Off,
				},
				Lines: tw.Lines{
					ShowHeaderLine: tw.Off,
				},
			},
		})),
		tablewriter.WithConfig(tablewriter.Config{
			Header: tw.CellConfig{
				Formatting: tw.CellFormatting{
					AutoFormat: tw.Off,
				},
				Alignment: tw.CellAlignment{Global: tw.AlignLeft},
				Padding:   tw.CellPadding{Global: tw.Padding{Left: "", Right: " "}},
			},
			Row: tw.CellConfig{
				Alignment: tw.CellAlignment{Global: tw.AlignLeft},
				Padding:   tw.CellPadding{Global: tw.Padding{Left: "", Right: " "}},
			},
		}),
	)
	table.Header([]string{"MODEL", "BACKEND", "PORT", "ENDPOINT"})

	for _, m := range running {
		table.Append([]string{
			m.ModelRef,
			string(m.Backend),
			fmt.Sprintf("%d", m.Port),
			m.Endpoint,
		})
	}

	table.Render()
	return nil
}
```

cmd/dmrlet/commands/pull.go

Lines changed: 44 additions & 0 deletions (new file)

```go
package commands

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func newPullCmd() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "pull MODEL",
		Short: "Pull a model without serving",
		Long: `Pull a model from Docker Hub or HuggingFace without starting an inference container.
This is useful for pre-downloading models.

Examples:
  dmrlet pull ai/smollm2
  dmrlet pull huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf`,
		Args: cobra.ExactArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			return runPull(cmd, args[0])
		},
	}

	return cmd
}

func runPull(cmd *cobra.Command, modelRef string) error {
	ctx := cmd.Context()

	if err := initStore(); err != nil {
		return fmt.Errorf("initializing store: %w", err)
	}

	cmd.Printf("Pulling model: %s\n", modelRef)

	if err := store.EnsureModel(ctx, modelRef, os.Stdout); err != nil {
		return fmt.Errorf("pulling model: %w", err)
	}

	cmd.Printf("\nModel pulled successfully: %s\n", modelRef)
	return nil
}
```
