Problem
When using a voice model cloned from Japanese audio to synthesize Chinese or Korean text via the /v1/tts endpoint (S2-Pro), shared CJK characters (漢字) are incorrectly pronounced in Japanese instead of the target language.
Examples
- Chinese text: 十二月十五日 is read as Japanese "じゅうにがつじゅうごにち" instead of Chinese "shí'èr yuè shíwǔ rì"
- Korean text: in 2년, the digit 2 is read as Japanese "に" instead of Korean "이"
This happens because the automatic language detection appears to be biased by the reference voice's language. Since Japanese, Chinese, and Korean share many characters (漢字/汉字/한자), the TTS engine cannot reliably determine the intended language from text alone when the reference voice is in a different CJK language.
Workarounds Attempted
- Phoneme annotations for every character: fixes pronunciation but makes speech very unnatural
- Partial phoneme annotations (first few characters only): annotated parts sound unnatural, and the rest still reverts to Japanese pronunciation
- Rewriting text (e.g., spelling out numbers in target language): does not help for shared characters like 月, 日, 年
Proposed Solution
Add an optional language parameter to the TTS request body:
{
  "text": "十二月十五日",
  "reference_id": "model_id",
  "language": "zh"
}
This would let the caller explicitly specify the pronunciation language for the given text, regardless of the reference voice's original language.
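A minimal sketch of how a request with the proposed parameter might be built. The `language` field and its accepted values ("zh", "ko", etc.) are the proposal, not an existing API feature, and the helper function below is hypothetical:

```python
import json

def build_tts_request(text, reference_id, language=None):
    """Build the /v1/tts request body; `language` is the proposed optional field."""
    body = {"text": text, "reference_id": reference_id}
    if language is not None:
        # Proposed: force pronunciation language, e.g. "zh" or "ko".
        body["language"] = language
    return body

payload = build_tts_request("十二月十五日", "model_id", language="zh")
print(json.dumps(payload, ensure_ascii=False))
```

Because the field is optional, omitting it would preserve today's behavior (automatic detection), so existing clients would be unaffected.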
Use Case
We are building a bilingual learning audio service that:
- Transcribes audio (e.g., Japanese podcast) with speaker diarization
- Clones each speaker's voice using Fish Audio
- Translates segments to a target language (e.g., Chinese, Korean)
- Synthesizes the translated text using the cloned voice
This cross-lingual voice cloning workflow works well for non-CJK target languages (e.g., Japanese to English), but breaks down for CJK to CJK translation due to shared characters.
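To make the pipeline concrete, here is a high-level sketch of the workflow described above. Every function name is a hypothetical placeholder standing in for the real service (transcription, cloning, translation, synthesis), not an actual Fish Audio API call; the point is where the proposed `language` value would flow in:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str
    text: str

# Hypothetical stand-ins for the real services in the pipeline.
def transcribe_with_diarization(audio_path):
    # e.g. an ASR model plus speaker diarization; returns per-speaker segments
    return [Segment("A", "十二月十五日です")]

def clone_voice(audio_path, speaker):
    # voice cloning step; returns a reference_id for that speaker
    return f"model_{speaker}"

def translate(text, target_lang):
    # any MT service; shared CJK characters survive translation unchanged
    return "十二月十五日"

def synthesize(text, reference_id, language):
    # the proposed call: pass `language` so 月/日 are read in the target language
    return {"text": text, "reference_id": reference_id, "language": language}

def build_lesson(audio_path, target_lang):
    requests = []
    for seg in transcribe_with_diarization(audio_path):
        ref = clone_voice(audio_path, seg.speaker)
        translated = translate(seg.text, target_lang)
        requests.append(synthesize(translated, ref, target_lang))
    return requests
```

The target language is already known at the translation step, so threading it through to synthesis costs the client nothing.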
Environment
- Model: S2-Pro
- API endpoint:
/v1/tts
- Source language: Japanese
- Target languages affected: Chinese (zh), Korean (ko)