Commit 9564985

Move diarization

Author: Nicolas Fournier
1 parent d1b182c commit 9564985

3 files changed, +96 -81 lines changed

chapters/pre-recorded-stt/speaker-diarization.mdx

Lines changed: 0 additions & 80 deletions
This file was deleted.

chapters/settings/diarization.mdx

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
---
title: Diarization
description: "Detect speakers and identify who said what during the conversation."
---

<Icon icon="check" iconType="solid" color="green" size="20" /> **Asynchronous STT** &nbsp; &nbsp; &nbsp;
<Icon icon="xmark" iconType="solid" color="red" size="20" /> **Real-Time STT**

Speaker diarization is the process of detecting multiple speakers in an audio recording and determining which parts of the transcription each speaker said.

## Configuration

### Activation

We offer two diarization models: the **default** version and the **enhanced** version. Despite its name, the enhanced version does not guarantee better results in all cases, although it may perform better for specific use cases. We encourage you to try both versions.

<Tabs>

<Tab title="Default">

Enable diarization by setting the `diarization` parameter to `true` in the transcription request.

```json
{
  "audio_url": "<your audio URL>",
  "diarization": true
}
```

</Tab>

<Tab title="Enhanced">

Enable enhanced diarization by setting `enhanced` to `true` in the `diarization_config` parameter, alongside `diarization`.

```json
{
  "audio_url": "<your audio URL>",
  "diarization": true,
  "diarization_config": {
    "enhanced": true
  }
}
```

</Tab>

</Tabs>

<Note>
Diarization has the following limitations:
* Default diarization is supported for audio files of up to 135 minutes. <br/>
* Enhanced diarization can handle longer audio files but does not support video files.
</Note>
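As a rough illustration of the two activation modes and the limitations listed in the note, the following Python sketch builds the request body and checks an input against the documented limits. The function names `build_diarization_request` and `check_limits` are hypothetical helpers, not part of any SDK, and the sketch does not send the request.

```python
import json

DEFAULT_MAX_MINUTES = 135  # documented limit for default diarization


def build_diarization_request(audio_url, enhanced=False):
    """Return a request body that enables default or enhanced diarization."""
    body = {"audio_url": audio_url, "diarization": True}
    if enhanced:
        body["diarization_config"] = {"enhanced": True}
    return body


def check_limits(duration_minutes, is_video, enhanced):
    """Return the documented limitations that the given input would violate."""
    problems = []
    if enhanced and is_video:
        problems.append("enhanced diarization does not support video files")
    if not enhanced and duration_minutes > DEFAULT_MAX_MINUTES:
        problems.append("default diarization supports audio of up to 135 minutes")
    return problems


if __name__ == "__main__":
    print(json.dumps(build_diarization_request("<your audio URL>", enhanced=True), indent=2))
    print(check_limits(duration_minutes=180, is_video=False, enhanced=False))
```

Checking the limits before sending lets a caller fall back to the other model, for example switching a three-hour audio file to enhanced diarization.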

### Improving diarization accuracy

You can improve diarization accuracy by providing the expected number of speakers, or lower and upper bounds on that number. **These parameters serve as hints, not strict constraints.** The actual number of speakers detected by the model may differ from the values provided.

See the [API reference](https://docs.gladia.io/api-reference/v2/pre-recorded/init#body-diarization-config) for details.

| Key | Type | Description |
| --- | --- | --- |
| `diarization_config.number_of_speakers` | number | Hint for the exact number of speakers in the audio. |
| `diarization_config.min_speakers` | number | Hint for the minimum number of speakers in the audio. |
| `diarization_config.max_speakers` | number | Hint for the maximum number of speakers in the audio. |

<Note>
Enhanced diarization only supports the `number_of_speakers` parameter, which can be set to either 1 or 2.
</Note>
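The speaker-count hints and the enhanced-mode restriction can be combined in a small builder, sketched below in Python. `build_diarization_config` is a hypothetical helper, and the check that `min_speakers` does not exceed `max_speakers` is an assumption added for safety rather than a documented rule.

```python
def build_diarization_config(number_of_speakers=None, min_speakers=None,
                             max_speakers=None, enhanced=False):
    """Build a `diarization_config` object from optional speaker-count hints."""
    if enhanced:
        # Documented: enhanced diarization only accepts number_of_speakers (1 or 2).
        if min_speakers is not None or max_speakers is not None:
            raise ValueError("enhanced diarization only supports number_of_speakers")
        if number_of_speakers is not None and number_of_speakers not in (1, 2):
            raise ValueError("enhanced diarization accepts number_of_speakers of 1 or 2")
    if (min_speakers is not None and max_speakers is not None
            and min_speakers > max_speakers):
        # Assumption: not stated in the docs; a local sanity check only.
        raise ValueError("min_speakers must not exceed max_speakers")

    config = {"enhanced": True} if enhanced else {}
    hints = {
        "number_of_speakers": number_of_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    # Send only the hints that were actually provided.
    config.update({key: value for key, value in hints.items() if value is not None})
    return config


request_body = {
    "audio_url": "<your audio URL>",
    "diarization": True,
    "diarization_config": build_diarization_config(min_speakers=2, max_speakers=4),
}
print(request_body)
```

Because the parameters are hints, code that consumes the result should still tolerate a detected speaker count outside the requested range.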

## Results

When diarization is enabled, each utterance includes a `speaker` field containing an index that identifies the speaker. <br/>
Speakers are indexed in order of appearance: the first speaker is speaker 0, the second is speaker 1, and so on.

```json
{
  "transcription": {
    "utterances": [
      {
        "words": [...],
        "text": "it says you are trained in technology.",
        "language": "en",
        "start": 0.7334100000000001,
        "end": 2.364,
        "confidence": 0.8914285714285715,
        "channel": 0,
        "speaker": 0
      },
      ...
    ]
  }
}
```
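One way to consume the `speaker` field is to group utterances by speaker index. The Python sketch below assumes a result object with the shape shown above. The helper names are hypothetical, and the second utterance in the sample is made-up data added only to show two speakers.

```python
from collections import defaultdict


def speaker_turns(result):
    """Map each speaker index to the texts that speaker said, in order."""
    turns = defaultdict(list)
    for utterance in result["transcription"]["utterances"]:
        turns[utterance["speaker"]].append(utterance["text"])
    return dict(turns)


def speaking_time(result):
    """Map each speaker index to their total speaking time in seconds."""
    totals = defaultdict(float)
    for utterance in result["transcription"]["utterances"]:
        totals[utterance["speaker"]] += utterance["end"] - utterance["start"]
    return dict(totals)


sample = {
    "transcription": {
        "utterances": [
            {"text": "it says you are trained in technology.",
             "start": 0.7334100000000001, "end": 2.364, "speaker": 0},
            # Hypothetical second utterance, not taken from a real response.
            {"text": "yes, that is right.", "start": 2.5, "end": 3.5, "speaker": 1},
        ]
    }
}

for speaker, texts in speaker_turns(sample).items():
    print(f"Speaker {speaker}: {' '.join(texts)}")
print(speaking_time(sample))
```

Since indexes follow order of appearance, speaker 0 is always the first voice heard in the recording, not a stable identity across files.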

mint.json

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,6 @@
   "pages": [
     "chapters/pre-recorded-stt/getting-started",
     "chapters/pre-recorded-stt/features",
-    "chapters/pre-recorded-stt/speaker-diarization",
     "chapters/pre-recorded-stt/migration-from-v1"
   ]
 },
@@ -186,6 +185,7 @@
     "chapters/settings/language-options",
     "chapters/settings/vocabulary-spelling",
     "chapters/settings/formatting",
+    "chapters/settings/diarization",
     "chapters/settings/word-timestamps",
     "chapters/settings/custom-metadata"
   ]
