Commit 302884d

author
Nicolas Fournier
committed
Word level timestamp
1 parent 8466922 commit 302884d

4 files changed

+62
-73
lines changed

chapters/live-stt/features.mdx

Lines changed: 1 addition & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ description: "Features overview of Gladia's Real-Time speech-to-text (STT) API."
77
|-----------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
88
| [Language(s)](/chapters/settings/language-options) | Configure the model languages and/or enable multi-languages transcription. |
99
| [Custom Vocabulary](/chapters/settings/vocabulary-spelling) | Enhance the transcription precision of words you know.|
10+
| [Word-level timestamps](/chapters/settings/word-timestamps) | Know the exact timestamp for each word, giving you a more precise transcription. |
1011

1112

1213
| **Audio Intelligence** | **Description** |
@@ -23,43 +24,6 @@ description: "Features overview of Gladia's Real-Time speech-to-text (STT) API."
2324
<Note>All the configuration properties described below are defined in the [POST /v2/live endpoint](/api-reference/v2/live/init).</Note>
2425

2526

26-
27-
## Word-level timestamps
28-
29-
Instead of just getting timestamps for when utterances begin and end, Gladia's real-time API provides **word-level timestamps**. This lets you know the exact timestamp for each word, giving you a more precise transcription, facilitating detailed analysis and more accurate synchronization with audio and video files.
30-
31-
To enable it, pass the following configuration:
32-
33-
```json
34-
{
35-
"realtime_processing": {
36-
"words_accurate_timestamps": true
37-
}
38-
}
39-
```
40-
41-
Under each utterance, you'll find a `words` property, like this:
42-
43-
```json
44-
{
45-
// ... other utterance properties
46-
"words": [
47-
{
48-
"word": "Split",
49-
"start": 0.21001999999999998,
50-
"end": 0.69015,
51-
"confidence": 1
52-
},
53-
{
54-
"word": " infinity",
55-
"start": 0.91021,
56-
"end": 1.55038,
57-
"confidence": 0.95
58-
},
59-
]
60-
}
61-
```
62-
6327
## Multiple channels
6428

6529
If you have multiple channels in your audio stream, specify the count in the configuration:

chapters/pre-recorded-stt/features.mdx

Lines changed: 0 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -16,41 +16,6 @@ Discover our state-of-the-art ASR model [ Whisper Zero now.](https://www.gladia.
1616
into the transcription process by including extra parameters in the transcription request.
1717

1818

19-
20-
## Word-level timestamps
21-
22-
Instead of just getting utterances start and end timestamps, **Gladia** Speech-to-text API provides by **default** the
23-
**Word-level timestamps** feature. It lets you know the exact timestamp for each word and give you a more precise transcription.
24-
This feature is particularly useful for detailed analysis, as it allows you to pinpoint the exact moment each word is spoken, facilitating
25-
a more accurate synchronization with audio or video files.
26-
27-
Under each utterance, you'll find a `words` property like this:
28-
29-
```json
30-
// other properties...
31-
"utterances": [
32-
{
33-
"words": [
34-
{
35-
"word": "Split",
36-
"start": 0.21001999999999998,
37-
"end": 0.69015,
38-
"confidence": 1
39-
},
40-
{
41-
"word": " infinity",
42-
"start": 0.91021,
43-
"end": 1.55038,
44-
"confidence": 0.95
45-
},
46-
...
47-
]
48-
}
49-
]
50-
```
51-
52-
53-
5419
## Export SRT or VTT caption files
5520

5621
You can export completed transcripts in both SRT and VTT format, which can be used for subtitles and captions in videos.

chapters/settings/word-timestamps.mdx

Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,59 @@
1+
---
2+
title: Word-level timestamps
3+
description: "Get the exact timestamp for each word in your audio file."
4+
---
5+
6+
<Icon icon="check" iconType="solid" color="green" size="20" /> **Asynchronous STT** &nbsp; &nbsp; &nbsp;
7+
<Icon icon="check" iconType="solid" color="green" size="20" /> **Real-Time STT**
8+
9+
Instead of providing only the start and end timestamps of an utterance, the Gladia API delivers precise timestamps for each individual word. This is useful for detailed analysis, since you can pinpoint the exact moment each word is spoken, and it makes synchronization with audio or video files more accurate.
10+
11+
## Configuration
12+
13+
<Tabs>
14+
15+
<Tab title='Asynchronous STT'>
16+
17+
Word-level timestamps are always enabled for asynchronous STT.
18+
19+
</Tab>
20+
21+
<Tab title='Real-Time STT'>
22+
23+
Word-level timestamps are configured within the `realtime_processing` object in your transcription request. See the [API reference](https://docs.gladia.io/api-reference/v2/live/init#body-realtime-processing-words-accurate-timestamps) for details.
24+
25+
```json
26+
{
27+
"realtime_processing": {
28+
"words_accurate_timestamps": true
29+
}
30+
}
31+
```
32+
33+
</Tab>
34+
35+
</Tabs>
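
As a minimal sketch of the real-time case, the Python snippet below builds the request body shown in the Real-Time STT tab. The helper name `build_live_config` is hypothetical; only the `realtime_processing.words_accurate_timestamps` field comes from the configuration above, and actually sending the body to the live endpoint is left out.

```python
import json


def build_live_config(word_timestamps: bool = True) -> dict:
    """Return a request body that toggles word-level timestamps.

    Hypothetical helper: only the field names are taken from the docs.
    """
    return {
        "realtime_processing": {
            "words_accurate_timestamps": word_timestamps,
        }
    }


if __name__ == "__main__":
    # Serialize the body as JSON, ready to be merged into a larger request.
    print(json.dumps(build_live_config(), indent=2))
```

In practice this dictionary would be merged with the other session options before the request is made.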
36+
37+
## Results
38+
39+
Each utterance contains a `words` property:
40+
41+
```json
42+
{
43+
// ... other utterance properties
44+
"words": [
45+
{
46+
"word": "Split",
47+
"start": 0.21001999999999998,
48+
"end": 0.69015,
49+
"confidence": 1
50+
},
51+
{
52+
"word": " infinity",
53+
"start": 0.91021,
54+
"end": 1.55038,
55+
"confidence": 0.95
56+
},
57+
]
58+
}
59+
```
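
To illustrate how the `words` array might be consumed, here is a small Python sketch that works on the sample values above. The function names and the confidence threshold are hypothetical, and the sketch assumes `start` and `end` are offsets in seconds, which the example values suggest but this page does not state.

```python
from typing import List, TypedDict


class Word(TypedDict):
    word: str
    start: float
    end: float
    confidence: float


# Sample data copied from the result example above.
SAMPLE_WORDS: List[Word] = [
    {"word": "Split", "start": 0.21001999999999998, "end": 0.69015, "confidence": 1},
    {"word": " infinity", "start": 0.91021, "end": 1.55038, "confidence": 0.95},
]


def format_words(words: List[Word]) -> List[str]:
    """Render each word as 'start-end text', rounding times to 2 decimals."""
    return [
        f"{w['start']:.2f}-{w['end']:.2f} {w['word'].strip()}"
        for w in words
    ]


def low_confidence_words(words: List[Word], threshold: float = 0.98) -> List[Word]:
    """Return words whose confidence falls below the (hypothetical) threshold."""
    return [w for w in words if w["confidence"] < threshold]


if __name__ == "__main__":
    for line in format_words(SAMPLE_WORDS):
        print(line)
```

Note that words can carry a leading space, as in `" infinity"`, so the sketch strips whitespace before display.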

mint.json

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -185,7 +185,8 @@
185185
"pages": [
186186
"chapters/settings/language-options",
187187
"chapters/settings/vocabulary-spelling",
188-
"chapters/settings/formatting"
188+
"chapters/settings/formatting",
189+
"chapters/settings/word-timestamps"
189190
]
190191
},
191192
{
