
feat: implement Voice Support Adapter #6196#6658

Open
jhawpetoss6-collab wants to merge 1 commit into elizaOS:develop from jhawpetoss6-collab:strike/voice-support-adapter

Conversation


@jhawpetoss6-collab jhawpetoss6-collab commented Mar 23, 2026

This PR introduces the VoiceProcessingAdapter to ElizaOS, enabling multi-modal voice interaction.

Changes:

  • Added core logic for voice processing and speech generation.
  • Provides a foundation for Whisper/ElevenLabs integration.
  • Essential for next-gen agentic presence in voice-first environments (Discord/Telegram).

/claim #6196

Greptile Summary

This PR adds a new VoiceProcessingAdapter class intended to bridge ElizaOS agents with voice services (Whisper for STT, ElevenLabs for TTS). Unfortunately, the current implementation is entirely placeholder/stub code and is not ready to merge.

Key issues:

  • Both methods are stubs: processVoice always returns a hardcoded { text: "Voice command recognized.", confidence: 0.98 } regardless of the actual audio input, and generateSpeech always returns an empty Buffer. No real service calls are made.
  • runtime parameter is unused in both methods — the entire point of accepting the runtime is to resolve service providers, which is not done.
  • Memory is imported but never referenced, which will trigger linter/TypeScript warnings.
  • console.log used instead of the project logger, which is inconsistent with the rest of the codebase.
  • No explicit return types declared on either method, leaving callers without a stable type contract.
  • The file is not exported from the core package index, so it cannot be consumed by any downstream code until that wiring is added.
  • No tests are included.
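
On the last point, even a single unit test against a fake runtime would pin down the intended contract. A sketch of what such a test could look like — the runtime shape, the `getService` method, and the "speech-to-text" service name are all assumptions for illustration, not the real ElizaOS API:

```typescript
// Sketch of a unit test for a future processVoice implementation.
// The runtime shape and "speech-to-text" service name are assumptions,
// not the real ElizaOS API.
interface SttService {
  transcribe(audio: Buffer): Promise<{ text: string; confidence: number }>;
}

interface FakeRuntime {
  getService(name: string): SttService | undefined;
}

// Stand-in for the eventual real implementation under test.
async function processVoiceUnderTest(runtime: FakeRuntime, audio: Buffer) {
  const stt = runtime.getService("speech-to-text");
  if (!stt) throw new Error("no STT service registered");
  return stt.transcribe(audio);
}

async function testProcessVoiceUsesRuntimeService(): Promise<void> {
  let received: Buffer | undefined;
  const fake: FakeRuntime = {
    getService: () => ({
      transcribe: async (audio) => {
        received = audio; // record what reached the service
        return { text: "hello world", confidence: 0.9 };
      },
    }),
  };
  const input = Buffer.from([1, 2, 3]);
  const result = await processVoiceUnderTest(fake, input);
  console.assert(received === input, "audio buffer must reach the STT service");
  console.assert(result.text === "hello world", "result must come from the service");
}

testProcessVoiceUsesRuntimeService();
```

A test like this would fail immediately against the current stubs, since the hardcoded return value never touches the fake service.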

Confidence Score: 1/5

  • Not safe to merge — both public methods are empty stubs that provide no real functionality.
  • The entire implementation is placeholder code: hardcoded return values, unused parameters, and no integration with any actual STT or TTS service. Merging this adds dead code to the core package without any functional benefit, and the class is not even exported from the package index. The PR needs a real implementation before it is merge-ready.
  • packages/core/src/adapters/voice/VoiceProcessingAdapter.ts requires full implementation of both methods before this PR can move forward.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/src/adapters/voice/VoiceProcessingAdapter.ts | New file introducing VoiceProcessingAdapter, but both methods are complete stubs: processVoice returns a hardcoded response with no actual STT logic, generateSpeech returns an empty buffer, Memory is imported but unused, and runtime is accepted but never called. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant VoiceProcessingAdapter
    participant STTService as STT Service (Whisper)
    participant TTSService as TTS Service (ElevenLabs)
    participant IAgentRuntime

    Note over VoiceProcessingAdapter,IAgentRuntime: Current state: runtime is never used

    Caller->>VoiceProcessingAdapter: processVoice(runtime, audioBuffer)
    VoiceProcessingAdapter-->>Caller: ⚠️ hardcoded { text: "Voice command recognized.", confidence: 0.98 }
    Note over STTService: Never called

    Caller->>VoiceProcessingAdapter: generateSpeech(runtime, text)
    VoiceProcessingAdapter-->>Caller: ⚠️ Buffer.from([]) (empty)
    Note over TTSService: Never called

Reviews (1): Last reviewed commit: "feat: implement VoiceProcessingAdapter f..."

Greptile also left 4 inline comments on this PR.


coderabbitai bot commented Mar 23, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 248deb3a-fb85-4858-b3fa-128816ddb46d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@@ -0,0 +1,27 @@
import { IAgentRuntime, Memory } from "../../types.ts";

P2 Unused import: Memory

Memory is imported but never referenced anywhere in this file. This will cause a TypeScript/linter warning and adds unnecessary noise.

Suggested change:
- import { IAgentRuntime, Memory } from "../../types.ts";
+ import { IAgentRuntime } from "../../types.ts";

Comment on lines +8 to +17
static async processVoice(
runtime: IAgentRuntime,
audioBuffer: Buffer
) {
// Logic to interface with Whisper or ElevenLabs
console.log("Processing audio buffer for agent...");
return {
text: "Voice command recognized.",
confidence: 0.98
};

P2 Hardcoded stub return value — no real processing occurs

processVoice always returns the same hardcoded object regardless of the input audioBuffer. There is no call to Whisper, ElevenLabs, or any other service, and the runtime parameter is entirely unused. As written, this method provides zero actual voice-to-text functionality and cannot be used as a real implementation.

Additionally, console.log should be replaced with the ElizaOS logger (e.g. elizaLogger or the runtime's logger) to stay consistent with the rest of the codebase and avoid leaking debug output in production.

A real implementation needs to:

  1. Use runtime to resolve the service/model provider.
  2. Invoke the actual STT service with audioBuffer.
  3. Return the transcription result from that service.
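
Those three steps could be sketched as follows. This is illustrative only: the service-lookup method (`getService`) and the "speech-to-text" service name are assumptions about the runtime, not the real ElizaOS interfaces.

```typescript
// Hypothetical service and runtime shapes; the real IAgentRuntime differs.
interface SpeechToTextService {
  transcribe(audio: Buffer): Promise<{ text: string; confidence: number }>;
}

interface MinimalRuntime {
  getService(name: string): SpeechToTextService | undefined;
}

interface VoiceRecognitionResult {
  text: string;
  confidence: number;
}

async function processVoice(
  runtime: MinimalRuntime,
  audioBuffer: Buffer
): Promise<VoiceRecognitionResult> {
  // Step 1: resolve the STT provider through the runtime.
  const stt = runtime.getService("speech-to-text");
  if (!stt) {
    throw new Error("No speech-to-text service registered on the runtime");
  }
  // Steps 2 and 3: invoke the service with the caller's audio
  // and return the transcription it produces.
  return stt.transcribe(audioBuffer);
}
```

Resolving the provider through the runtime (rather than importing a concrete client) is what makes the adapter swappable between Whisper, ElevenLabs, or a local model.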

Comment on lines +20 to +26
static async generateSpeech(
runtime: IAgentRuntime,
text: string
) {
// Logic to generate speech output
return Buffer.from([]);
}

P2 generateSpeech is a no-op stub

This method accepts a runtime and a text argument but unconditionally returns Buffer.from([]) — an empty buffer. The runtime and text parameters are completely unused. A caller has no way to distinguish "empty speech" from "generation succeeded" vs. "generation failed," and in practice any downstream consumer would receive silence.

This method must either implement real TTS logic (calling ElevenLabs or similar via runtime) or remain unmerged until that logic is present.
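
For illustration, real TTS wiring might look like the following sketch. The "text-to-speech" service name and `synthesize` signature are assumptions; the key design point is that failures surface as thrown errors rather than an empty buffer the caller cannot interpret.

```typescript
// Hypothetical TTS service and runtime shapes; the real ElizaOS API differs.
interface TextToSpeechService {
  synthesize(text: string): Promise<Buffer>;
}

interface MinimalTtsRuntime {
  getService(name: string): TextToSpeechService | undefined;
}

async function generateSpeech(
  runtime: MinimalTtsRuntime,
  text: string
): Promise<Buffer> {
  // Resolve the TTS provider via the runtime instead of hardcoding one.
  const tts = runtime.getService("text-to-speech");
  if (!tts) {
    throw new Error("No text-to-speech service registered on the runtime");
  }
  if (text.trim().length === 0) {
    // Fail loudly instead of returning silence the caller cannot distinguish
    // from successful generation.
    throw new Error("Cannot synthesize empty text");
  }
  return tts.synthesize(text);
}
```

Throwing on failure keeps "generation failed" distinguishable from "generated empty audio", which resolves the ambiguity of unconditionally returning Buffer.from([]).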

Comment on lines +8 to +26
static async processVoice(
runtime: IAgentRuntime,
audioBuffer: Buffer
) {
// Logic to interface with Whisper or ElevenLabs
console.log("Processing audio buffer for agent...");
return {
text: "Voice command recognized.",
confidence: 0.98
};
}

static async generateSpeech(
runtime: IAgentRuntime,
text: string
) {
// Logic to generate speech output
return Buffer.from([]);
}

P2 Missing explicit return type annotations

Neither processVoice nor generateSpeech declares an explicit TypeScript return type. This makes it impossible for callers to rely on a stable contract and prevents TypeScript from catching type mismatches at compile time. Concrete return types (or at minimum a shared interface) should be defined and exported.

Example:

interface VoiceRecognitionResult {
  text: string;
  confidence: number;
}

static async processVoice(
  runtime: IAgentRuntime,
  audioBuffer: Buffer
): Promise<VoiceRecognitionResult> { ... }

static async generateSpeech(
  runtime: IAgentRuntime,
  text: string
): Promise<Buffer> { ... }

