feat: add per-model keep_alive configuration for idle eviction
Allow developers to control how long a model stays loaded in memory before being evicted, following Ollama API semantics. Supports duration strings (e.g. 5m, 1h), 0 for immediate unload, and -1 to keep the model loaded indefinitely.
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
cmd.Flags().Var(NewFloat64PtrValue(&f.GPUMemoryUtilization), "gpu-memory-utilization", "fraction of GPU memory to use for the model executor (0.0-1.0) - vLLM only")
cmd.Flags().Var(NewBoolPtrValue(&f.Think), "think", "enable reasoning mode for thinking models")
cmd.Flags().StringVar(&f.KeepAlive, "keep-alive", "", "duration to keep model loaded (e.g., '5m', '1h', '0' to unload immediately, '-1' to never unload)")
}

// BuildConfigureRequest builds a scheduling.ConfigureRequest from the flags.