Commit b50174e

author
Nicolas Fournier
committed
Multichannels section
1 parent 9564985 commit b50174e

4 files changed

+97
-41
lines changed

chapters/live-stt/features.mdx

Lines changed: 3 additions & 27 deletions
Original file line number | Diff line number | Diff line change
@@ -5,7 +5,8 @@ description: "Features overview of Gladia's Real-Time speech-to-text (STT) API."
55

66
| **Setting** | **Description** |
77
|-----------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
8-
| [Language(s)](/chapters/settings/language-options) | Configure the model languages and/or enable multi-languages transcription. |
8+
| [Media Input](/chapters/settings/media) | Define audio media's parameters sent to the API. |
9+
| [Language(s)](/chapters/settings/language-options) | Configure the model language and/or enable multi-language transcription. |
910
| [Custom Vocabulary](/chapters/settings/vocabulary-spelling) | Enhance the transcription precision of words you know.|
1011
| [Word-level Timestamps](/chapters/settings/word-timestamps) | Know the exact timestamp for each word, giving you a more precise transcription. |
1112
| [Custom Metadata](/chapters/settings/custom-metadata) | Add metadata to track and filter your requests. |
@@ -15,29 +16,4 @@ description: "Features overview of Gladia's Real-Time speech-to-text (STT) API."
1516
|-----------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
1617
| [Sentiment and Emotion Analysis](/chapters/audio-intelligence/pages/sentiment-analysis) | Extract sentiments and emotions from the audio, like confusion or interest. |
1718
| [Name Entity Recognition](/chapters/audio-intelligence/pages/named-entity-recognition) | Automatically identifies and categorizes key information in the audio, like phone number or email address. |
18-
| [Summarization](/chapters/audio-intelligence/pages/summarization) | Get important information from your conversation. This analysis is performed after the real-time transcription is stopped. |
19-
20-
21-
22-
## IN PROGRESS bellow this part
23-
24-
25-
<Note>All the configuration properties described below are defined in the [POST /v2/live endpoint](/api-reference/v2/live/init).</Note>
26-
27-
28-
## Multiple channels
29-
30-
If you have multiple channels in your audio stream, specify the count in the configuration:
31-
32-
```json
33-
{
34-
"channels": 2
35-
}
36-
```
37-
38-
Gladia's real-time API will automatically split the channels and transcribe them separately.
39-
For each utterance, you'll get a `channel` key corresponding to the channel the utterance came from.
40-
41-
<Warning>
42-
Transcribing an audio stream with multiple channels will be billed exponentially. For example, an audio stream with 2 channels will be billed as double the audio duration, even if the channels are identical.
43-
</Warning>
19+
| [Summarization](/chapters/audio-intelligence/pages/summarization) | Get important information from your conversation. This analysis is performed after the real-time transcription is stopped. |

chapters/pre-recorded-stt/features.mdx

Lines changed: 0 additions & 14 deletions
@@ -60,17 +60,3 @@ With the given example, `subtitles` will contains 2 items of shape:
6060
"subtitles": "1\n00:00:00,210 --> 00:00:04,711....." // subtitles
6161
}
6262
```
63-
64-
65-
## Dual-channel or Multiple channels transcription
66-
67-
If you have multiples channels in your audio file with different content each, Gladia API automatically transcribe them.
68-
In the transcription result, you will get for each utterances a `channel` key corresponding to the channels the transcription
69-
came from.
70-
71-
<Warning>
72-
Sending an audio with 2 different channels (that does not contains the same audio data), will be billed twice as 2 different audios.
73-
If your audio has multiple channels but has the same audio content on each channels, it will only billed once.
74-
75-
**TLDR**: We charge every unique channel in an audio file, we do not charge if channels are duplicates.
76-
</Warning>

chapters/settings/media.mdx

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
1+
---
2+
title: Media Input
3+
description: "Define audio media's parameters sent to the API."
4+
---
5+
6+
## Multi-channels
7+
8+
<Icon icon="check" iconType="solid" color="green" size="20" /> **Asynchronous STT** &nbsp; &nbsp; &nbsp;
9+
<Icon icon="check" iconType="solid" color="green" size="20" /> **Real-Time STT**
10+
11+
12+
<Tabs>
13+
14+
<Tab title='Asynchronous STT'>
15+
16+
### Configuration
17+
18+
You have nothing to configure. If you have multiple channels in your audio file with different content, the API will automatically transcribe each of them.
19+
20+
### Results
21+
22+
Each utterance will include a `channel` key indicating the channel from which the utterance came.
23+
24+
```json
25+
{
26+
"utterances": [
27+
{
28+
"text": "Vi tester en ganske kort melding.",
29+
"language": "no",
30+
"start": 1.02077,
31+
"end": 3.9749399999999997,
32+
"confidence": 0.86,
33+
"channel": 0,
34+
"speaker": 0,
35+
"words": [...]
36+
},
37+
{...}
38+
],
39+
}
40+
```
41+
42+
<Note>
43+
The cost of transcribing an audio file with multiple channels increases proportionally to the number of channels. For instance, a 2-channel audio file will be billed as double the audio duration, unless both channels are identical.
44+
</Note>
45+
46+
</Tab>
47+
48+
<Tab title='Real-Time STT'>
49+
50+
### Configuration
51+
52+
If you have multiple channels in your audio stream, specify the number of channels in the `channels` parameter.
53+
54+
```json
55+
{
56+
"channels": 2
57+
}
58+
```
59+
60+
### Results
61+
62+
Each utterance will include a `channel` key indicating the channel from which the utterance came.
63+
64+
```json Real-Time
65+
{
66+
{
67+
"type": "transcript",
68+
"session_id": "...",
69+
"created_at": "2025-01-17T09:01:30.197Z",
70+
"data": {
71+
"id": "00_00000006",
72+
"is_final": true,
73+
"utterance": {
74+
"text": "Bye.",
75+
"start": 22.539534999999994,
76+
"end": 22.729984999999996,
77+
"language": "en",
78+
"confidence": 1,
79+
"channel": 0,
80+
"words": [...]
81+
}
82+
}
83+
},
84+
}
85+
```
86+
87+
<Note>
88+
The cost of transcribing an audio stream with multiple channels increases proportionally to the number of channels. For instance, a 2-channel audio stream will be billed as double the audio duration, even if both channels contain identical audio.
89+
</Note>
90+
91+
</Tab>
92+
93+
</Tabs>
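
Review note: the two Results sections in `chapters/settings/media.mdx` both rely on the `channel` key, so a short client-side sketch may help readers see how it is meant to be consumed. The Python below is only an illustration, assuming the payload shapes shown in the diff's JSON examples. The function names `group_by_channel` and `channel_text_from_message` are hypothetical helpers, not part of any Gladia SDK, and the second sample utterance is invented for the demo.

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def group_by_channel(result: dict) -> Dict[int, List[str]]:
    """Collect utterance texts per channel from an asynchronous result.

    Assumes the shape shown in the Asynchronous STT example:
    {"utterances": [{"text": ..., "channel": ...}, ...]}
    """
    grouped: Dict[int, List[str]] = defaultdict(list)
    for utterance in result.get("utterances", []):
        grouped[utterance["channel"]].append(utterance["text"])
    return dict(grouped)


def channel_text_from_message(raw: str) -> Optional[Tuple[int, str]]:
    """Return (channel, text) for a final real-time transcript message.

    Assumes the shape shown in the Real-Time STT example. Returns None
    for any other message type or for non-final transcripts.
    """
    message = json.loads(raw)
    if message.get("type") != "transcript":
        return None
    data = message.get("data", {})
    if not data.get("is_final"):
        return None
    utterance = data["utterance"]
    return utterance["channel"], utterance["text"]


# Sample data modelled on the examples in the diff.
async_result = {
    "utterances": [
        {"text": "Vi tester en ganske kort melding.", "channel": 0},
        {"text": "Hello.", "channel": 1},  # hypothetical second channel
    ]
}

realtime_message = json.dumps(
    {
        "type": "transcript",
        "session_id": "...",
        "data": {
            "id": "00_00000006",
            "is_final": True,
            "utterance": {"text": "Bye.", "channel": 0},
        },
    }
)

per_channel = group_by_channel(async_result)
final_piece = channel_text_from_message(realtime_message)
```

With the sample data, `per_channel` maps channel `0` to the Norwegian utterance and channel `1` to the invented one, and `final_piece` is the pair `(0, "Bye.")`. Filtering on `is_final` is a design choice of this sketch, not something the documentation requires.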

mint.json

Lines changed: 1 addition & 0 deletions
@@ -182,6 +182,7 @@
182182
{
183183
"group": "Settings",
184184
"pages": [
185+
"chapters/settings/media",
185186
"chapters/settings/language-options",
186187
"chapters/settings/vocabulary-spelling",
187188
"chapters/settings/formatting",
