Feature Request: Add language parameter to TTS API for cross-lingual CJK voice cloning #1263

@nariakiiwatani

Problem

When a voice model cloned from Japanese audio is used to synthesize Chinese or Korean text via the /v1/tts endpoint (S2-Pro), shared CJK characters (漢字) and digits are pronounced with their Japanese readings instead of the target language's.

Examples

  • Chinese text 十二月十五日 is read as Japanese "じゅうにがつじゅうごにち" instead of Chinese "shí'èr yuè shíwǔ rì"
  • Korean text with 2년 has 2 read as Japanese "に" instead of Korean "이"

This appears to happen because automatic language detection is biased by the reference voice's language. Japanese, Chinese, and Korean share many characters (漢字/汉字/한자), so the TTS engine cannot reliably determine the intended language from the text alone when the reference voice is in a different CJK language.
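To illustrate why the text alone gives so little to go on, here is a minimal sketch that labels the scripts present in a string using only the Python standard library. The helper name `scripts_in` and the coarse labels are made up for this illustration; this is not how the Fish Audio language detector works, which is not documented here.

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Return coarse script labels for the characters in `text` (illustrative only)."""
    labels = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            labels.add("han")      # shared by Japanese, Chinese, and Korean
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            labels.add("kana")     # Japanese-only cue
        elif name.startswith("HANGUL"):
            labels.add("hangul")   # Korean-only cue
        elif ch.isdigit():
            labels.add("digit")    # no language cue at all
    return labels


date_scripts = scripts_in("十二月十五日")
year_scripts = scripts_in("2년")
```

The date string contains only Han characters, so it carries no script-level cue that would separate Chinese from Japanese. In `2년` the Hangul is a cue, but the digit itself is language-neutral, which is consistent with it being read in the reference voice's language.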

Workarounds Attempted

  • Phoneme annotations for every character: fixes pronunciation but makes speech very unnatural
  • Partial phoneme annotations (first few characters only): annotated parts sound unnatural, and the rest still reverts to Japanese pronunciation
  • Rewriting text (e.g., spelling out numbers in target language): does not help for shared characters like 月, 日, 年

Proposed Solution

Add an optional language parameter to the TTS request body:

{
  "text": "十二月十五日",
  "reference_id": "model_id",
  "language": "zh"
}

This would let callers explicitly set the pronunciation language for the given text, regardless of the language of the reference voice.
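On the client side, the proposal could look like the sketch below. Note that `language` is the field being requested in this issue and does not exist in the API today; `build_tts_payload` is a hypothetical helper, and the sketch only builds the JSON body without sending it.

```python
import json
from typing import Optional


def build_tts_payload(text: str, reference_id: str,
                      language: Optional[str] = None) -> dict:
    """Build a /v1/tts request body; `language` is the PROPOSED optional field."""
    payload = {"text": text, "reference_id": reference_id}
    if language is not None:
        # Only include the field when set, so existing callers are unaffected.
        payload["language"] = language
    return payload


# ensure_ascii=False keeps the CJK text readable in the serialized body.
body = json.dumps(
    build_tts_payload("十二月十五日", "model_id", language="zh"),
    ensure_ascii=False,
)
```

Omitting the field when it is not set would keep the current auto-detection behavior as the default, so the change stays backward compatible.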

Use Case

We are building a bilingual learning audio service that:

  1. Transcribes audio (e.g., Japanese podcast) with speaker diarization
  2. Clones each speaker's voice using Fish Audio
  3. Translates segments to a target language (e.g., Chinese, Korean)
  4. Synthesizes the translated text using the cloned voice

This cross-lingual voice cloning workflow works well for non-CJK target languages (e.g., Japanese to English), but breaks down for CJK to CJK translation due to shared characters.
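The last step of the workflow above could be sketched as follows, assuming transcription, cloning, and translation have already produced per-speaker segments and voice IDs. `Segment`, `plan_synthesis`, and the `language` field are all hypothetical names for illustration, not part of our actual service or the Fish Audio API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # diarization label, e.g. "spk0"
    text: str     # text already translated into the target language


def plan_synthesis(segments: list[Segment], voice_ids: dict[str, str],
                   target_language: str) -> list[dict]:
    """Pair each translated segment with its speaker's cloned voice."""
    jobs = []
    for seg in segments:
        jobs.append({
            "text": seg.text,
            "reference_id": voice_ids[seg.speaker],
            "language": target_language,  # proposed field, see above
        })
    return jobs


jobs = plan_synthesis(
    [Segment("spk0", "十二月十五日")],
    {"spk0": "model_id"},
    "zh",
)
```

Because the target language is already known at translation time, passing it through to synthesis would cost the caller nothing extra.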

Environment

  • Model: S2-Pro
  • API endpoint: /v1/tts
  • Source language: Japanese
  • Target languages affected: Chinese (zh), Korean (ko)
