Move diarization

Nicolas Fournier · Nicolas Fournier · commit 956498571d0a · 2025-01-20T18:00:55.000+01:00
diff --git a/chapters/pre-recorded-stt/speaker-diarization.mdx b/chapters/pre-recorded-stt/speaker-diarization.mdx
diff --git a/chapters/settings/diarization.mdx b/chapters/settings/diarization.mdx
@@ -0,0 +1,95 @@
+---
+title: Diarization
+description: "Detect the speakers in a conversation and identify who said what."
+---
+
+<Icon icon="check" iconType="solid" color="green" size="20" /> **Asynchronous STT** &nbsp; &nbsp; &nbsp;
+<Icon icon="xmark" iconType="solid" color="red" size="20" /> **Real-Time STT**
+
+Speaker diarization is the process of detecting multiple speakers in an audio file and determining which parts of the transcription each speaker said.
+
+## Configuration
+
+### Activation
+
+We offer two diarization models: the **default** version and the **enhanced** version. Despite its name, the enhanced version does not guarantee better results in all cases. However, it may perform better for specific use cases. We encourage you to try both versions.
+
+<Tabs>
+
+<Tab title="Default">
+
+Diarization is enabled by sending the `diarization` parameter in the transcription request.
+
+```json
+{
+  "audio_url": "<your audio URL>",
+  "diarization": true
+}
+```
+
+</Tab>
+
+<Tab title="Enhanced">
+
+Enhanced diarization is enabled in the `diarization_config` parameter.
+
+```json
+{
+  "audio_url": "<your audio URL>",
+  "diarization": true,
+  "diarization_config" : {
+    "enhanced": true
+  }
+}
+```
+
+</Tab>
+
+</Tabs>
+
+<Note>
+Diarization has the following limitations:
+* Default diarization is supported for audio files of up to 135 minutes. <br/>
+* Enhanced diarization can handle longer audio files but does not support video files.
+</Note>
+
+### Improving diarization accuracy
+
+You can improve the accuracy of diarization by providing hints about the expected number of speakers or specifying lower and upper bounds. **These parameters serve as hints, not strict constraints.** The actual number of speakers detected by the model may differ from the values provided.
+
+See the [API reference](https://docs.gladia.io/api-reference/v2/pre-recorded/init#body-diarization-config) for the full list of `diarization_config` options.
+
+| Key | Type | Description |
+| --- | --- | --- |
+| `diarization_config.number_of_speakers` | number | Instruct the model to detect an exact number of speakers in the audio. |
+| `diarization_config.min_speakers` | number | Instruct the model to detect no fewer than this number of speakers in the audio. |
+| `diarization_config.max_speakers` | number | Instruct the model to detect no more than this number of speakers in the audio. |
+
+<Note>
+Enhanced diarization only supports the `number_of_speakers` parameter, which can be set to a value of either 1 or 2.
+</Note>
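+
+The configuration rules above can be sketched as a small request-body builder. This is a minimal Python sketch, not part of any official SDK: `build_diarization_body` is a hypothetical helper, and rejecting a `min_speakers` value larger than `max_speakers` is an assumption rather than documented API behavior. It only uses the keys described on this page.
+
+```python
+def build_diarization_body(audio_url, enhanced=False, number_of_speakers=None,
+                           min_speakers=None, max_speakers=None):
+    """Build a transcription request body with diarization enabled (hypothetical helper)."""
+    config = {}
+    if enhanced:
+        # Per the note above: enhanced mode only accepts number_of_speakers (1 or 2).
+        if min_speakers is not None or max_speakers is not None:
+            raise ValueError("enhanced diarization only supports number_of_speakers")
+        if number_of_speakers not in (None, 1, 2):
+            raise ValueError("enhanced diarization expects number_of_speakers of 1 or 2")
+        config["enhanced"] = True
+    # Assumption: a lower bound above the upper bound is a caller mistake.
+    if (min_speakers is not None and max_speakers is not None
+            and min_speakers > max_speakers):
+        raise ValueError("min_speakers must not exceed max_speakers")
+    hints = {
+        "number_of_speakers": number_of_speakers,
+        "min_speakers": min_speakers,
+        "max_speakers": max_speakers,
+    }
+    # Only send the hints that were actually provided.
+    config.update({key: value for key, value in hints.items() if value is not None})
+    body = {"audio_url": audio_url, "diarization": True}
+    if config:
+        body["diarization_config"] = config
+    return body
+```
+
+For example, `build_diarization_body("<your audio URL>", min_speakers=2, max_speakers=4)` yields a default-model request carrying both bounds as hints.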
+
+## Results
+
+When diarization is enabled, each utterance includes a `speaker` field, which contains an index representing the speaker. <br/>
+Speakers are assigned indexes based on their order of appearance (e.g., the first speaker is speaker 0, the second is speaker 1, ...).
+
+```json
+{
+  "transcription": {
+    "utterances": [
+      {
+        "words": [...],
+        "text": "it says you are trained in technology.",
+        "language": "en",
+        "start": 0.7334100000000001,
+        "end": 2.364,
+        "confidence": 0.8914285714285715,
+        "channel": 0,
+        "speaker": 0
+      },
+      ...
+    ]
+  }
+}
+```
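+
+As an illustration, the `speaker` index can be used to collect what each speaker said. The Python sketch below assumes the response has already been parsed into a dictionary shaped like the example above; `group_text_by_speaker` is a hypothetical helper, and the second utterance in `sample` is made up for the demonstration.
+
+```python
+from collections import defaultdict
+
+
+def group_text_by_speaker(result):
+    """Return {speaker index: [utterance texts]}, keeping the order of appearance."""
+    grouped = defaultdict(list)
+    for utterance in result["transcription"]["utterances"]:
+        grouped[utterance["speaker"]].append(utterance["text"])
+    return dict(grouped)
+
+
+# Trimmed-down sample: only the fields this helper reads are included.
+sample = {
+    "transcription": {
+        "utterances": [
+            {"text": "it says you are trained in technology.", "speaker": 0},
+            {"text": "that is correct.", "speaker": 1},  # invented for the example
+        ]
+    }
+}
+
+for speaker, texts in group_text_by_speaker(sample).items():
+    print(f"Speaker {speaker}: {' '.join(texts)}")
+```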
diff --git a/mint.json b/mint.json
@@ -168,7 +168,6 @@
       "pages": [
         "chapters/pre-recorded-stt/getting-started",
         "chapters/pre-recorded-stt/features",
-        "chapters/pre-recorded-stt/speaker-diarization",
         "chapters/pre-recorded-stt/migration-from-v1"
       ]
     },
@@ -186,6 +185,7 @@
         "chapters/settings/language-options",
         "chapters/settings/vocabulary-spelling",
         "chapters/settings/formatting",
+        "chapters/settings/diarization",
         "chapters/settings/word-timestamps",
         "chapters/settings/custom-metadata"
       ]