Background
I am running llama-vscode on a CPU-only system. The extension itself is excellent, but it fires a completion request almost immediately on every keystroke.
The Problem
The extension initiates a request to the local LLM backend as soon as I start typing. On low-end hardware, this causes:
- Input Latency: typing becomes laggy as the CPU spikes to 100% to handle the inference request (and the fans get loud).
- Resource Waste: multiple requests are sent for "half-finished" thoughts, only to be cancelled or discarded immediately as I continue typing.
Proposed Solution
- Add an extension-specific setting, e.g., llama.autocomplete.debounceDelay (default 0ms).
- When set (e.g., to 3000ms), the extension should wait for a period of "idle" time after the last keystroke before calling the backend.
This would ensure that only "intentional" pauses in typing trigger the CPU-intensive local model.
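A minimal sketch of how this could look inside an inline completion provider, assuming the proposed setting name `llama.autocomplete.debounceDelay`; `requestCompletionFromBackend` is a hypothetical stand-in for the extension's existing backend call:

```typescript
import * as vscode from "vscode";

// Hypothetical stand-in for the extension's existing backend call;
// the real extension would talk to the llama.cpp server here.
async function requestCompletionFromBackend(
  _document: vscode.TextDocument,
  _position: vscode.Position
): Promise<vscode.InlineCompletionItem[]> {
  return []; // placeholder
}

// Resolves after `ms` milliseconds, or rejects early if the request
// is cancelled while waiting (e.g. because the user kept typing).
function idleDelay(ms: number, token: vscode.CancellationToken): Promise<void> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(resolve, ms);
    token.onCancellationRequested(() => {
      clearTimeout(timer);
      reject(new vscode.CancellationError());
    });
  });
}

const provider: vscode.InlineCompletionItemProvider = {
  async provideInlineCompletionItems(document, position, _context, token) {
    const delay = vscode.workspace
      .getConfiguration("llama.autocomplete")
      .get<number>("debounceDelay", 0); // proposed setting, default 0 ms

    if (delay > 0) {
      try {
        // Each new keystroke cancels the previous request via `token`,
        // so only an "intentional" pause survives the wait.
        await idleDelay(delay, token);
      } catch {
        return undefined; // user kept typing; skip the backend call
      }
    }

    return requestCompletionFromBackend(document, position);
  },
};
```

Since VS Code cancels the previous request's token whenever a new keystroke triggers another completion request, the debounce reduces to waiting on that token: any keystroke during the delay aborts the pending request before the backend is ever touched, and with the default of 0 ms the current behavior is unchanged.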
Alternatives Considered
- Manual Triggering: possible, but a debounce offers a much smoother "automatic" experience for local users and avoids dedicating an extra keybinding (which would conflict with an existing shortcut, a separate problem).
I am happy to contribute a PR for this if the maintainers agree with the approach.