
[Feature Request] Implement configurable trigger debounce for local inference #161

@h8f1z


Background
I am running llama-vscode on a CPU-only system. While the extension is excellent, it triggers a completion request almost immediately upon typing.

The Problem
The extension initiates a request to the local LLM backend on nearly every keystroke. On low-end hardware, this causes:

  1. Input latency: typing becomes laggy as the CPU spikes to handle each inference request; the CPU sits at 100% (and the fans get loud) whenever I type.
  2. Resource waste: multiple requests are sent for half-finished thoughts and are immediately cancelled or discarded as I keep typing.

Proposed Solution

  • Add an extension-specific setting, e.g., llama.autocomplete.debounceDelay (default: 0 ms, preserving current behaviour).
  • When set (e.g., to 3000 ms), the extension should wait for that period of idle time after the last keystroke before calling the backend.

This would ensure that only "intentional" pauses in typing trigger the CPU-intensive local model.
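A minimal sketch of what I have in mind, assuming a VS Code inline completion provider; `requestCompletionFromServer` is a placeholder for whatever the extension currently calls, and only the `llama.autocomplete.debounceDelay` setting name comes from this proposal:

```typescript
import * as vscode from 'vscode';

let debounceTimer: NodeJS.Timeout | undefined;

const provider: vscode.InlineCompletionItemProvider = {
  provideInlineCompletionItems(document, position, context, token) {
    // Proposed setting; default 0 ms keeps today's immediate triggering.
    const delay = vscode.workspace
      .getConfiguration('llama.autocomplete')
      .get<number>('debounceDelay', 0);

    return new Promise<vscode.InlineCompletionItem[] | undefined>((resolve) => {
      // Each keystroke clears the pending timer and restarts it, so the
      // backend is only contacted after `delay` ms of idle typing.
      if (debounceTimer) {
        clearTimeout(debounceTimer);
      }
      debounceTimer = setTimeout(async () => {
        if (token.isCancellationRequested) {
          resolve(undefined);
          return;
        }
        resolve(await requestCompletionFromServer(document, position));
      }, delay);

      // If VS Code cancels the request (e.g., more typing), drop the timer.
      token.onCancellationRequested(() => {
        if (debounceTimer) {
          clearTimeout(debounceTimer);
        }
        resolve(undefined);
      });
    });
  },
};

// Placeholder for the existing call into the llama.cpp server.
declare function requestCompletionFromServer(
  document: vscode.TextDocument,
  position: vscode.Position
): Promise<vscode.InlineCompletionItem[]>;
```

Users on slow machines could then set `"llama.autocomplete.debounceDelay": 3000` in settings.json, while the default keeps the current instant-trigger behaviour for everyone else.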

Alternatives Considered

  • Manual triggering: possible, but a debounce offers a much smoother "automatic" experience for local users and requires no extra keystroke (the obvious keybinding is already taken by another shortcut, which is a separate problem).

I am happy to contribute a PR for this if the maintainers agree with the approach.
