The final segments are just the result of heuristically splitting the transcription (a very long string) into segments (short strings):

```python
result.clamp_max().merge_all_segments().split_by_punctuation([('.', ' '), '。', '?', '？'])
```

Sometimes the model will produce no punctuation with the default settings. If that occurs, you can try this: openai/whisper#194 (comment)
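As a rough illustration of that heuristic (a plain-Python sketch, not the stable-ts internals), splitting a long transcript string into short segments on sentence-ending punctuation could look like this:

```python
import re

def split_by_punctuation(text, punctuation=(".", "。", "?", "？")):
    """Heuristically split a long transcript into short sentence-like
    segments, keeping each punctuation mark attached to its segment."""
    # Split after any of the given marks, consuming trailing whitespace.
    pattern = "(?<=[" + re.escape("".join(punctuation)) + "])\\s*"
    return [seg.strip() for seg in re.split(pattern, text) if seg.strip()]

segments = split_by_punctuation("Hello there. How are you? Fine.")
# → ["Hello there.", "How are you?", "Fine."]
```

stable-ts applies this kind of splitting to word-level timestamps rather than a raw string, so the segment boundaries carry start/end times as well.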
Hi,
I tried to create a Python script to do the following:
The file format is described here: https://huggingface.co/datasets/flexthink/ljspeech
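For context, a minimal sketch of writing such a metadata file, assuming the pipe-delimited `id|transcription` convention used by LJSpeech-style datasets (the linked dataset card is the authority; the clip IDs and text here are illustrative):

```python
import csv

# Illustrative rows: (clip id, transcription) pairs for each extracted segment.
rows = [
    ("clip_0001", "Hello there."),
    ("clip_0002", "How are you?"),
]

# LJSpeech-style metadata uses "|" as the field separator.
with open("metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="|")
    for clip_id, text in rows:
        writer.writerow([clip_id, text])
```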
I tested different approaches, but none was a perfect solution.
Using pydub to detect silences fails to capture complete sentences, while using the stable-ts start and end values does not cut the audio at perfectly silent points.
Is it possible to add a transcribe parameter that lets stable-ts detect a full sentence as a segment and store the start and end values as clean audio timestamps (i.e., detect silence and cut before the sentence starts and after it ends)?
Optional: Detect different speakers and add them to the metadata.csv
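The requested behavior could be approximated outside the library as well. Below is a hedged, pure-Python sketch: `snap_to_silence` is a hypothetical helper (not part of stable-ts or pydub) that widens a segment's timestamps outward until each boundary lands on a frame whose amplitude is below a silence threshold:

```python
def snap_to_silence(start, end, envelope, frame_s=0.02, threshold=0.01):
    """Widen [start, end] outward to the nearest silent frame on each side,
    so the cut lands in silence rather than mid-word.

    envelope: per-frame amplitude values, frame_s seconds per frame.
    Hypothetical helper sketching the requested behavior.
    """
    n = len(envelope)
    i = max(0, min(int(start / frame_s), n - 1))
    j = max(0, min(int(end / frame_s), n - 1))
    while i > 0 and envelope[i] >= threshold:      # walk left into silence
        i -= 1
    while j < n - 1 and envelope[j] >= threshold:  # walk right into silence
        j += 1
    return i * frame_s, j * frame_s

# Synthetic envelope: 0.2 s silence, 0.4 s speech, 0.2 s silence.
env = [0.0] * 10 + [0.5] * 20 + [0.0] * 10
new_start, new_end = snap_to_silence(0.25, 0.55, env)
```

In practice the envelope could come from pydub or raw PCM samples; the threshold and frame size would need tuning per recording.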
All the best,
Flo