Background
I am running llama-vscode on a CPU-only system. The extension itself is excellent, but it fires a completion request almost immediately on every keystroke.
The Problem
The extension initiates a request to the local LLM backend as soon as I start typing. On low-end hardware, this causes:
- Input Latency: typing becomes laggy as the CPU spikes to 100% to handle the inference request (and the fans get loud).
- Resource Waste: multiple requests are sent for "half-finished" thoughts, only to be cancelled or discarded immediately as I continue typing.
Proposed Solution
- Add an extension-specific setting, e.g., llama.autocomplete.debounceDelay (default 0ms).
- When set (e.g., to 3000ms), the extension should wait for a period of "idle" time after the last keystroke before calling the backend.
This would ensure that only "intentional" pauses in typing trigger the CPU-intensive local model.
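A minimal sketch of how this could look inside an inline completion provider, assuming the proposed setting name `llama.autocomplete.debounceDelay`; `requestCompletionFromBackend` is a hypothetical stand-in for the extension's existing backend call:

```typescript
import * as vscode from "vscode";

// Hypothetical stand-in for the extension's existing backend call;
// the real extension would talk to the llama.cpp server here.
async function requestCompletionFromBackend(
  _document: vscode.TextDocument,
  _position: vscode.Position
): Promise<vscode.InlineCompletionItem[]> {
  return []; // placeholder
}

// Resolves after `ms` milliseconds, or rejects early if the request
// is cancelled while waiting (e.g. because the user kept typing).
function idleDelay(ms: number, token: vscode.CancellationToken): Promise<void> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(resolve, ms);
    token.onCancellationRequested(() => {
      clearTimeout(timer);
      reject(new vscode.CancellationError());
    });
  });
}

const provider: vscode.InlineCompletionItemProvider = {
  async provideInlineCompletionItems(document, position, _context, token) {
    const delay = vscode.workspace
      .getConfiguration("llama.autocomplete")
      .get<number>("debounceDelay", 0); // proposed setting, default 0 ms

    if (delay > 0) {
      try {
        // Each new keystroke cancels the previous request via `token`,
        // so only an "intentional" pause survives the wait.
        await idleDelay(delay, token);
      } catch {
        return undefined; // user kept typing; skip the backend call
      }
    }

    return requestCompletionFromBackend(document, position);
  },
};
```

Since VS Code cancels the previous request's token whenever a new keystroke triggers another completion request, the debounce reduces to waiting on that token: any keystroke during the delay aborts the pending request before the backend is ever touched, and with the default of 0 ms the current behavior is unchanged.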
Alternatives Considered
- Manual Triggering: possible, but a debounce offers a much smoother "automatic" experience for local users and avoids dedicating an extra keybinding (which would conflict with an existing shortcut, a separate problem).
I am happy to contribute a PR for this if the maintainers agree with the approach.