Multichannels section

Nicolas Fournier · commit b50174e9a3b7 · 2025-01-24T18:05:40.000+01:00
diff --git a/chapters/live-stt/features.mdx b/chapters/live-stt/features.mdx
@@ -5,7 +5,8 @@ description: "Features overview of Gladia's Real-Time speech-to-text (STT) API."
 
 | **Setting**                                                                       | **Description**                                                                                            |
 |-----------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
-| [Language(s)](/chapters/settings/language-options) | Configure the model languages and/or enable multi-languages transcription. |
+| [Media Input](/chapters/settings/media) | Define the parameters of the audio media sent to the API. |
+| [Language(s)](/chapters/settings/language-options) | Configure the model language and/or enable multilingual transcription. |
 | [Custom Vocabulary](/chapters/settings/vocabulary-spelling) | Enhance the transcription precision of words you know.|
 | [Word-level Timestamps](/chapters/settings/word-timestamps) | Know the exact timestamp for each word, giving you a more precise transcription. |
 | [Custom Metadata](/chapters/settings/custom-metadata) | Add metadata to track and filter your requests. |
@@ -15,29 +16,4 @@ description: "Features overview of Gladia's Real-Time speech-to-text (STT) API."
 |-----------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
 | [Sentiment and Emotion Analysis](/chapters/audio-intelligence/pages/sentiment-analysis) | Extract sentiments and emotions from the audio, like confusion or interest.                                |
 | [Name Entity Recognition](/chapters/audio-intelligence/pages/named-entity-recognition)  | Automatically identifies and categorizes key information in the audio, like phone number or email address. |
-| [Summarization](/chapters/audio-intelligence/pages/summarization)  | Get important information from your conversation. This analysis is performed after the real-time transcription is stopped. |
-
-
-
-## IN PROGRESS bellow this part
-
-
-<Note>All the configuration properties described below are defined in the [POST /v2/live endpoint](/api-reference/v2/live/init).</Note>
-
-
-## Multiple channels
-
-If you have multiple channels in your audio stream, specify the count in the configuration:
-
-```json
-{
-  "channels": 2
-}
-```
-
-Gladia's real-time API will automatically split the channels and transcribe them separately.
-For each utterance, you'll get a `channel` key corresponding to the channel the utterance came from.
-
-<Warning>
-  Transcribing an audio stream with multiple channels will be billed exponentially. For example, an audio stream with 2 channels will be billed as double the audio duration, even if the channels are identical.
-</Warning>
+| [Summarization](/chapters/audio-intelligence/pages/summarization)  | Get important information from your conversation. This analysis is performed after the real-time transcription is stopped. |
diff --git a/chapters/pre-recorded-stt/features.mdx b/chapters/pre-recorded-stt/features.mdx
@@ -60,17 +60,3 @@ With the given example, `subtitles` will contains 2 items of shape:
   "subtitles": "1\n00:00:00,210 --> 00:00:04,711....." // subtitles
 }
 ```
-
-
-## Dual-channel or Multiple channels transcription
-
-If you have multiples channels in your audio file with different content each, Gladia API automatically transcribe them.
-In the transcription result, you will get for each utterances a `channel` key corresponding to the channels the transcription
- came from.
-
-<Warning>
-Sending an audio with 2 different channels (that does not contains the same audio data), will be billed twice as 2 different audios.
-If your audio has multiple channels but has the same audio content on each channels, it will only billed once.
-
-**TLDR**: We charge every unique channel in an audio file, we do not charge if channels are duplicates.
-</Warning>
diff --git a/chapters/settings/media.mdx b/chapters/settings/media.mdx
@@ -0,0 +1,93 @@
+---
+title: Media Input
+description: "Define the parameters of the audio media sent to the API."
+---
+
+## Multi-channels
+
+<Icon icon="check" iconType="solid" color="green" size="20" /> **Asynchronous STT** &nbsp; &nbsp; &nbsp;
+<Icon icon="check" iconType="solid" color="green" size="20" /> **Real-Time STT**
+
+
+<Tabs>
+
+<Tab title='Asynchronous STT'>
+
+### Configuration
+
+No configuration is needed. If your audio file has multiple channels with different content, the API automatically transcribes each channel separately.
+
+### Results
+
+Each utterance includes a `channel` key indicating the channel it came from.
+
+```json
+{
+  "utterances": [
+    {
+      "text": "Vi tester en ganske kort melding.",
+      "language": "no",
+      "start": 1.02077,
+      "end": 3.9749399999999997,
+      "confidence": 0.86,
+      "channel": 0,
+      "speaker": 0,
+      "words": [...]
+    },
+    {...}
+  ],
+}
+```
+
+<Note>
+The cost of transcribing an audio file with multiple channels increases proportionally to the number of channels. For instance, a 2-channel audio file is billed as double its duration, unless both channels are identical, in which case it is billed only once.
+</Note>
+
+</Tab>
+
+<Tab title='Real-Time STT'>
+
+### Configuration
+
+If your audio stream has multiple channels, specify the channel count in the `channels` parameter when initiating the session with [POST /v2/live](/api-reference/v2/live/init).
+
+```json
+{
+  "channels": 2
+}
+```
+
+### Results
+
+Each utterance includes a `channel` key indicating the channel it came from.
+
+```json Real-Time
+{
+  "type": "transcript",
+  "session_id": "...",
+  "created_at": "2025-01-17T09:01:30.197Z",
+  "data": {
+    "id": "00_00000006",
+    "is_final": true,
+    "utterance": {
+      "text": "Bye.",
+      "start": 22.539534999999994,
+      "end": 22.729984999999996,
+      "language": "en",
+      "confidence": 1,
+      "channel": 0,
+      "words": [...]
+    }
+  }
+}
+```
+
+<Note>
+The cost of transcribing an audio stream with multiple channels increases proportionally to the number of channels. For instance, a 2-channel audio stream is billed as double its duration, even if both channels contain identical audio.
+</Note>
+
+</Tab>
+
+</Tabs>
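The `channel` key described in the Media Input page can be used to split results per channel on the client side. Below is a minimal Python sketch, assuming only the fields shown in the examples above (`channel` and `text` on each utterance, and `type`, `data.is_final` and `data.utterance` on real-time messages). All helper names are hypothetical, and the HTTP and WebSocket plumbing is deliberately left out.

```python
import json
from collections import defaultdict


def live_init_body(channel_count):
    """Hypothetical helper: real-time session configuration carrying the
    documented `channels` parameter. Other configuration fields are left out."""
    return {"channels": channel_count}


def group_utterances_by_channel(utterances):
    """Hypothetical helper: map each channel index to its utterance texts,
    preserving the order in which they appear in the result."""
    by_channel = defaultdict(list)
    for utterance in utterances:
        by_channel[utterance["channel"]].append(utterance["text"])
    return dict(by_channel)


def final_transcript_channel(raw_message):
    """Hypothetical helper: return (channel, text) for a final real-time
    `transcript` message, or None for non-final results and other message types."""
    message = json.loads(raw_message)
    if message.get("type") != "transcript":
        return None
    data = message.get("data", {})
    if not data.get("is_final"):
        return None
    utterance = data["utterance"]
    return utterance["channel"], utterance["text"]


if __name__ == "__main__":
    print(json.dumps(live_init_body(2)))  # {"channels": 2}

    # Asynchronous result, trimmed to the fields this sketch reads.
    utterances = [
        {"text": "Hello.", "channel": 0},
        {"text": "Hi there.", "channel": 1},
        {"text": "How are you?", "channel": 0},
    ]
    print(group_utterances_by_channel(utterances))
    # {0: ['Hello.', 'How are you?'], 1: ['Hi there.']}

    # Real-time message, trimmed to the fields this sketch reads.
    raw = json.dumps({
        "type": "transcript",
        "data": {"is_final": True, "utterance": {"text": "Bye.", "channel": 0}},
    })
    print(final_transcript_channel(raw))  # (0, 'Bye.')
```

Grouping by the integer `channel` value keeps the sketch independent of how many channels the audio has, so the same helpers cover both the asynchronous and the real-time cases.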
diff --git a/mint.json b/mint.json
@@ -182,6 +182,7 @@
     {
       "group": "Settings",
       "pages": [
+        "chapters/settings/media",
         "chapters/settings/language-options",
         "chapters/settings/vocabulary-spelling",
         "chapters/settings/formatting",