- fixed repetition of failed alignment word (e46981e)
- added support for MLX-Whisper (c125472)
- added support for `faster_whisper.BatchedInferencePipeline` (97b4bcd)
- updated README.md (c125472, 97b4bcd)
- updated Whisper to 20250625 (52533aa)
- updated HuggingFace Transformers to <=4.47.1 (d4334d5)
- fixed `fill_in_gaps()` (650cc4b)
- fixed `align()` failing to skip audio segment without speech (cc2d6bf)
- fixed `denoiser="demucs"` + `suppress_silence=False` causing PyTorch to NumPy conversion error (fefaf46)
- fixed `refine()` failing due to missing word tokens/probabilities in input `result` (78a223f)
- added alignment and refinement support for Hugging Face models (751b041)
- updated README.md (a44ebf3, 78a223f)
- updated Hugging Face models transcription to always return language (78a223f, 4232dd4)
- updated Hugging Face model loading process to warn BetterTransformer conversion failure (4232dd4)
- improved compatibility of Hugging Face models with certain old versions of PyTorch (4232dd4)
- fixed alignment skipping sections of speech (fc0d0da)
- fixed `'last_ts' referenced before assignment` error for alignment (fc0d0da)
- fixed `align_words()` throwing `TypeError` for `language` when `text` is not a `WhisperResult` (fe15241)
- fixed `failure_threshold` for `Aligner` and `align()` (ee64fd3)
- fixed `locate()` compatibility with Whisper 20240930+ (7338f85)
- changed `align_words()` to no longer require `language` for EN models (fe15241)
- updated `language` to ignore case and accept language labels and codes (cd9232e)
- fixed `align()` throwing `TypeError` for `language` when `text` is not a `WhisperResult` (8e12ee4)
- fixed `original_split=True` not working (c446d43)
- fixed alignment incorrectly encoding `text` result from another model (c446d43)
- changed `align()` to no longer require `language` for EN models (8e12ee4)
- deprecated `transcribe_stable()` for Faster-Whisper models and replaced it with `transcribe()` (de0b42e)
- removed redundant copying of encoder output in `DecodingTaskStable._get_audio_features()` (9fefdb8)
- refactored `non_whisper.py` into sub-package `non_whisper` (e6c44f9)
- refactored contents of `non_whisper.py` to `transcribe.py` in `non_whisper` (e6c44f9)
- refactored `align()` and `align_words()` to use `non_whisper.alignment.Aligner` (2696a8b)
- refactored `refine()` to use `non_whisper.refinement.Refiner` (2696a8b)
- added `adjust_gaps()` to `result.WhisperResult` (4f7cff2)
- added `convert_to_segment_level()` to `result.WhisperResult` (4f7cff2)
- added `custom_operation()` to `result.WhisperResult` (08421e2)
- added `record` to `remove_segment()` and `remove_word()` (08421e2)
- added `reassign_ids` to `WordTiming.add()`, `WordTiming.split()` (08421e2)
- added `clip_timestamps` to `transcribe()` (9fefdb8)
- added `align_words()` (c176ecd)
- added `ignore_special_periods` to `split_by_gap()`, `split_by_punctuation()`, `split_by_length()`, `split_by_duration()` (1b00156)
- added `ignore_special_periods()` to `result.WhisperResult` (1b00156)
- added `record` to `merge_all_segments()` (1b00156)
- added `alignment.py` and `refinement.py` to `non_whisper` (e6c44f9)
- added `non_whisper.alignment.Aligner` (e6c44f9)
- added `non_whisper.refinement.Refiner` (e6c44f9)
- added `options.py` (e6c44f9)
- added `refine()` to Faster-Whisper model instances (de0b42e)
- fixed `vad=True` memory leak (4711a01)
- fixed `AudioLoader` AttributeError for `_denoised_save_path` (9fefdb8)
- fixed `align()` compatibility issue with new Faster-Whisper versions (fe00aaf, c176ecd)
- fixed `merge_all_segments()` redundant records in `regroup_history` (1b00156)
- fixed `denoiser='noisereduce'` producing audio tensor with NaN values (852b39c)
- fixed `ignore_special_periods` to filter with the actual word instead of `WordTiming.__repr__` (39e503d)
- fixed `align()` unable to initialize tokenizer when `text` is result from faster-whisper model and `language` is unspecified (25d9e13)
- fixed `transcribe()` causing `TqdmWarning: clamping frac to range [0, 1]` (10c26b1)
- changed `progress_callback` for `transcribe()` and `align()` to pass positional arguments (9fefdb8, c176ecd)
- changed default regroup algorithm (1b00156, 39e503d)
- changed `even_split=True` to not ignore locked words (1b00156)
- changed `clamp_max()` to only ignore segments with fewer than 2 words (39e503d)
- updated `WhisperResult.lock()` to accept empty string `startswith` and `endswith` (4f7cff2)
- updated README.md (4f7cff2, 7dab171, 9fe1bf5, c176ecd, de0b42e)
- updated non-regrouping chainable methods to always save to `regroup_history` (08421e2)
- updated `AudioLoader` with ability to load only specific portions of the audio source (9fefdb8)
- updated `AudioLoader` to provide more informative error messages when yt-dlp fails to load a URL (9fefdb8)
- updated `refine()` to automatically add missing word-timestamps (c176ecd)
- updated `refine()` to automatically encode text with missing tokens in `result` (c176ecd)
- updated `ignore_special_periods` to not consider words that end with ".." as special (39e503d)
- updated `align()` to support `presplit` for Faster-Whisper models (2696a8b)
- updated `refine()` to support Faster-Whisper models; note that `refine()` will be slower on Faster-Whisper models (2696a8b)
- added `dynamic_heads` to `transcribe()` and `align()` (32235fa)
- added `pipeline_kwargs` to `load_hf_whisper()` (024d7dc)
- added `"large-v3-turbo"` and `"turbo"` to `HF_MODELS` (024d7dc)
- updated Whisper requirement to >=20230314,<=20240930 (453013c, df8dace)
- updated Whisper compatibility warning message (453013c, df8dace)
- updated compatibility with Whisper v20240930 (df8dace)
- updated `align()` and `transcribe_stable()` compatibility with latest Faster-Whisper commit (024d7dc)
- deprecated `vad_onnx` (b309530)
- added optional dependencies for Faster Whisper and Hugging Face (c541169)
- added `nonspeech_skip` (888181f)
- fixed #393 (1ee47ce)
- fixed `stabilization.utils.mask2timing()` to handle edge cases (e0e7183)
- fixed `suppress_silence=False` performing unnecessary compute when `vad=True` (888181f)
- fixed typos in docstrings (e0e7183)
- updated `refine()` docstring in `README` (3bc76b9)
- updated `vad` to accept a `dict` of keyword arguments for loading VAD (b309530)
- added `pad()` to `result.WhisperResult` (689fe5e)
- added `newline` to `merge_by_gap()` and `merge_by_punctuation()` (689fe5e)
- fixed `verbose` for `adjust_by_silence()` (f53f2ee)
- fixed adjustment progress bar in `non_whisper.transcribe_any()` (48d70a8)
- fixed error from using `tag`/`--tag` when output format is VTT and `word_level=True` (3997ef1)
- fixed segment merging methods not working when the result contains only segment-level timestamps (689fe5e)
- updated `merge_by_gap()` and `merge_by_punctuation()` docstrings with `newline` (3ab74e7)
- changed SRT to start from index 1 (9f8db52)
- changed `reset()` to be consistent for results produced by all `transcribe()` variants (864b76c)
- fixed #357 (98923ea)
- fixed `refine()` not working when `verbose` is not `True` (864b76c)
- fixed progress bar warning for `refine()` (864b76c)
- fixed #353 (66f8d13)
- fixed `align()` error when audio segment contains no detectable nonspeech/silent sections (6d9a1ef)
- fixed `gap_padding` causing unpredictable gaps or delays in the final timestamps for `align()` (6d9a1ef)
- updated `align()` (6d9a1ef)
- added `min_silence_dur` to `align()` and all variants of `transcribe()` (e2f9458)
- added `pad_or_trim()` to `whisper_compatibility` (c4d42f2)
- changed `align()` to ignore compatibility issues for Faster-Whisper models (c4d42f2)
- changed `align()` to prioritize new timestamps within rounding error (5ca7ca5)
- changed `align()` to prioritize timestamps that least overlap nonspeech timings (e2f9458)
- changed silence suppression to be less aggressive (e2f9458)
- changed silence suppression to treat nonspeech sections that overlap a word as individual sections (5ca7ca5)
- dropped Whisper dependency for `stable-ts-whisperless` (c4d42f2)
- fixed `result.WordTiming.suppress_silence()` by undoing changes in e2f9458 (0546d76)
- fixed discrepancy between `text` and output for `align()` (e2f9458)
- changed default of `align()` to `presplit=False` on faster-whisper models (850a19f)
- updated `README.md` with setup instructions for `stable-ts-whisperless` (c4d42f2)
- updated `use_word_position=True` to also take into account the index of each word (5ca7ca5)
- deprecated `suppress_attention` (5513609)
- deprecated `ts_num` and `ts_noise` (5513609)
- added noisereduce as a supported denoiser (03bb83b)
- added `engine` to `load_model()` (5513609)
- added `extra_models` to `align()` and `transcribe()` (5513609)
- added `presplit` and `gap_padding` to `align()` (5513609)
- fixed docstring of `adjust_by_silence()` (5513609)
- fixed `dfnet` denoiser model to use specified `device` (5513609)
- fixed error from `progress=True` when `denoiser='noisereduce'` (5513609)
- fixed incorrect titles when downloading audio with yt-dlp (5513609)
- changed `'demucs'` and `'dfnet'` denoisers to denoise in 2 channels when `stream=False` (5513609)
- improved word timing by making `gap_padding` more effective (5513609)
- fixed inaccurate progress bar in `result.WhisperResult.suppress_silence()` (ad013d7)
- replaced `update_all_segs_with_words()` in `refine()` with `reassign_ids()` (ad013d7)
- updated `--align` to treat the argument as plain-text if the argument starts with `'text='` (ad013d7)
- added `--persist`/`-p` to CLI (177bcc4)
- added `suppress_attention` to `transcribe()` and `align()` for original Whisper (177bcc4)
- fixed `align()` failing to predict nonspeech timings after skipping a nonspeech section (424f484)
- fixed typo (#324) (dbee5c5)
- changed `WhisperResult` to allow initialization without data (00ad4b4)
- fixed `Segment.copy()` failing to initialize `WordTiming` when `new_words=None` and `copy_words=False` (00ad4b4)
- fixed `WhisperResult.duration` to return `0.0` if result contains no segments (00ad4b4)
- fixed `WhisperResult.has_words` to return `False` if result contains no segments (00ad4b4)
- fixed `Whisper.fill_in_gaps()` (cbbad76)
- removed `end>=start` requirement for `Segment` (cbbad76)
- updated warning message for out of order timestamps (cbbad76)
- deprecated `Segment.update_seg_with_words()` and `WhisperResult.update_all_segs_with_words()` (ff89e53)
- changed `start`, `end`, `text`, `tokens` of `Segment` to properties (ff89e53)
- deprecated and replaced `WordTiming.round_all_timestamps()` with `round_ts=True` at initialization (ff89e53)
- added progress bar for timestamp adjustments (ff89e53)
- speed up splitting and merging of segments (ff89e53)
- removed redundant parts of the default regrouping algorithm (ff89e53)
- added `pipeline` to `stable_whisper.load_hf_whisper()` (c356491)
- changed `language`, `task`, `batch_size` to optional parameters for `WhisperHF.transcribe()` (c356491)
- fixed English models not working for `WhisperHF` (c356491)
- fixed `get_device()` for `'mps'` (53272cb)
- `WhisperHF.transcribe()` can now take generation parameters supported by `Transformers` (133f323)
- added logic to replace `None` timestamps returned by Hugging Face Whisper models (8bbe0c5)
- changed `whisper_word_level.hf_whisper.load_hf_pipe()` model loading method (a684fb4)
- added DeepFilterNet (https://github.com/Rikorose/DeepFilterNet) as supported denoiser (3fafd04)
- added Whisper on Hugging Face Transformers to CLI (3fafd04)
- fixed CLI throwing OSError when input is a URL and --output is not specified (3fafd04)
- fixed `WhisperHF.transcribe()` unable to load when audio is URL or certain formats (3fafd04)
- added support for Whisper on Hugging Face Transformers (9197b5c)
- fixed non-speech suppression not working properly for `transcribe_any()` (9197b5c)
- changed default to `dtype=numpy.int32` for all NumPy int arrays (3886bc6)
- removed `shell=True` in `.audio.utils.get_metadata()` (e8f72a3)
- added "「" to `prepend_punctuations` and "」" to `append_punctuations` (9968a45)
- added `AudioLoader` class for handling general audio loading (9968a45)
- added `NonSpeechPredictor` class for handling non-speech detection (9968a45)
- added `default.py` to hold global default states (9968a45)
- added `failure_threshold` to `align()` (9968a45)
- added `stream` to functions that use `AudioLoader` internally (9968a45)
- added progress bars for VAD and Demucs operations (9968a45)
- changed text normalization for `align()` (6d0746c)
- changed `WhisperResult` to ignore segments with no words (6d0746c)
- changed `nonspeech_error` default from 0.3 to 0.1 for all functions (9968a45)
- changed `nonspeech_skip` default from 3.0 to 5.0 for `align()` (9968a45)
- changed `use_word_position` behavior (9968a45)
- changed to load Demucs into cache for reuse by default (9968a45)
- deprecated and replaced `demucs` and `demucs_options` with `denoiser` and `denoiser_options` (9968a45)
- dropped `ffmpeg-python` dependency (9968a45)
- dropped dependencies: more-itertools, transformers (9968a45)
- fixed `align()` producing empty word slices (6d0746c)
- fixed `refine()` exceeding the max token count (#297) (f6d61c2)
- fixed issues in `transcribe_any()` caused by unspecified samplerate (9968a45)
- fixed `vad=True` causing first word of segment to be grouped with previous segment (9968a45)
- refactored `audio.py`, `stabilization.py`, `whisper_word_level.py` into subpackages (9968a45)
- removed `demucs_output` (9968a45)
- added `output_demo.mp4` (395c8a9)
- fixed `align()` throwing `UnsortedException` (f9ca03b)
- fixed `original_split=True` failing when there is more than one consecutive newline (f9ca03b)
- fixed `align()` IndexError (#292 (comment)) (f9ca03b)
- added `trust_repo=True` for loading Silero-VAD (a6b2b05)
- added `'master'` to the branch for loading Silero-VAD (a6b2b05)
- fixed `align()` failing for faster whisper with certain languages (677f233)
- fixed `result.WhisperResult.apply_min_dur()` and `result.Segment.apply_min_dur()` to work as intended (be2985e)
- removed `resampling_method="kaiser_window"` for all calls of `torchaudio.functional.resample()` (a6b2b05)
- updated `align()` logic (738fd98)
- added `nonspeech_skip` to `align()` (738fd98)
- added `show_unsorted` to `result.WhisperResult.__init__()` and `result.WhisperResult.raise_for_unsorted()` (738fd98)
- added `use_word_position` to methods that support non-speech/silence suppression (738fd98)
- fixed `result.WhisperResult.force_order()` to handle data with multiple consecutive unsorted timestamps (738fd98)
- fixed empty segment removal to work as intended for `result.WhisperResult` (ef0a87e)
- updated `README.md` to directly include the docstrings instead of hyperlinks (738fd98)
- updated `result.save_as_json()` to include `ensure_ascii=False` as default (738fd98)
- added `kwargs` to `result.save_as_json()` (738fd98)
- updated demo videos (3524aa2)
- added `nonspeech_sections` property to `result.WhisperResult` (191674b)
- added `nonspeech_error` for silence suppression (191674b)
- changed `min_word_dur` behavior for silence suppression (191674b)
- changed silence suppression behavior (191674b)
- updated `README.md` (191674b)
- fixed `result.WhisperResult.split_by_punctuation()` not working if `min_words`/`min_chars`/`min_dur` are unspecified (d51edb6)
- added `show_regroup_history()` to `result.WhisperResult` (df4a199)
- added new attribute, `regroup_history`, to `.result.WhisperResult` (df4a199)
- added `min_words`, `min_chars`, `min_dur` to `result.WhisperResult.split_by_punctuation()` (df4a199)
- updated `README.md` (e86c571)
- added `get_content_by_time()` to `result.WhisperResult` (900797a)
- added `get_result()` to `result.Segment` (900797a)
- added `get_segment()` to `result.WordTiming` (900797a)
- added `text_output.result_to_txt()`/`result.WhisperResult.to_txt()` (900797a)
- added editing methods to `result.WhisperResult`: `remove_word()`, `remove_segment()`, `remove_repetition()`, `remove_words_by_str()`, `fill_in_gaps()` (900797a)
- added editing methods to list of 'method keys' in `result.WhisperResult.regroup()` (900797a)
- changed `result.Segment.to_display_str()` to enclose segment text in double quotes (900797a)
- implemented `__getitem__` and `__delitem__` for `result.Segment` and `result.WhisperResult` (900797a)
- updated docstrings of `whisper_word_level.load_model()` and `whisper_word_level.load_faster_whisper()` (900797a)
- added `result.WhisperResult.split_by_duration()` (71b9f1f)
- fixed `newline=True` for `result.WhisperResult._split_segments()` (71b9f1f)
- fixed docstring of `result.WhisperResult.split_by_length()` (71b9f1f)
- updated Whisper to v20231117 (71b9f1f)
- added `--faster_whisper`, `-fw` to CLI (a038ad1)
- added `--locate`, `-lc` to CLI (a038ad1)
- changed `alignment.align()` to be compatible with faster-whisper (a038ad1)
- changed `verbose` behavior for `alignment.locate()` (a038ad1)
- fixed inconsistent syntax and typo in docstrings (a038ad1)
- removed assertions for checking timestamp order when using `__add__()` with `result.Segment` or `result.WordTiming` (a038ad1)
- added `newline` to `split_by_gap()`, `split_by_punctuation()`, `split_by_length()` (b336735)
- added `progress_callback` to `whisper_word_level.load_faster_whisper.faster_transcribe()` (b336735)
- fixed #241 (5c512a1)
- refactored `_COMPATIBLE_WHISPER_VERSIONS`, `_required_whisper_ver`, `warn_compatibility_issues()` (b336735)
- updated `README.md` (3dfbd72)
- updated `--model` for CLI to be compatible with checkpoint paths (b336735)
- updated `merge_all_segments()` with faster logic (b336735)
- updated `verbose` for `.whisper_word_level.load_faster_whisper.faster_transcribe()` (b336735)
- updated Whisper version to `v20231106` (b336735)
- added `avg_prob_threshold` to `whisper_word_level.transcribe_stable()` (58ece35)
- added `fast_mode` to `alignment.align()` (58ece35)
- added `utils.UnsortedException` (eb00d29)
- added `word_dur_factor` and `max_word_dur` to `alignment.align()` (58ece35)
- changed `check_sorted` for `result.WhisperResult` to also accept a path (eb00d29)
- changed `clip_start` default to `None` for `result.WhisperResult.clamp_max()` (58ece35)
- corrected docstrings of `suppress_silence` and `suppress_word_ts` (58ece35)
- fixed `timing.find_alignment_stable()` returning negative timestamps (58ece35)
- added `alignment.locate()` (a777206)
- added `utils.format_timestamp()` and `utils.make_safe()` (a777206)
- added `utils.safe_print()` (a777206)
- added `demucs`, `demucs_options`, `only_voice_freq` to `alignment.refine()` (a777206)
- added `to_display_str()` to `result.Segment` (a777206)
- added `demucs_options` to `whisper_word_level.load_faster_whisper.faster_transcribe()` (a777206)
- updated `--output`/`-o` (a777206)
- changed `audio` to always be expected to be 16 kHz for `torch.Tensor` or `numpy.ndarray` (a777206)
- fixed `alignment.align()` failing if `text` is a `result.WhisperResult` without tokens (a777206)
- fixed `original_split=True` by replacing line breaks with space (97a316d)
- fixed `result_to_ass()` failing to return to base color when using `tag` (83ae509)
- improved efficiency of segment splitting for `alignment.align()` when `original_split=True` (a777206)
- refactored the audio preprocessing into `audio.prep_audio()` (a777206)
- removed `_is_whisper_repo_version` from `utils.py` (a777206)
- renamed `original_spit` to `original_split` for `alignment.align()` (a777206)
- set `action="extend"` for all CLI keyword arguments that take multiple values (a777206)
- changed `demucs` to also accept a Demucs model instance (a777206)
- deprecated `time_scale`, `input_sr`, `demucs_output`, `demucs_device` (a777206)
- updated docstrings (a777206)
- updated `alignment.align()` to raise warning on failure (b9ac041)
- changed `language` into a required parameter (b9ac041)
- fixed `alignment.align()` endlessly looping (b9ac041)
- changed `abs_dur_change` default to `None` (dd1452e)
- changed `abs_prob_decrease` default to `0.5` (dd1452e)
- changed `alignment.refine()` to allow durations to increase (dd1452e)
- changed `rel_prob_decrease` default to `0.3` (dd1452e)
- changed `rel_rel_prob_decrease` to optional (dd1452e)
- changed the usage of original probability in `alignment.refine()` (dd1452e)
- fixed CLI not using `decode_options` (9aba3dc)
- fixed `adjust_by_silence()` throwing `TypeError` (92d51b9)
- updated `README.md` (3643092)
- added `--align` to CLI (c90ff06)
- added `alignment.refine()` for refining timestamps (138cb6b)
- added `--refine` and `--refine_option` to CLI (138cb6b)
- added `segment_id` and `id` to `result.WordTiming` (138cb6b)
- added description to transcription progress bar (138cb6b)
- fixed `align()` not working when `text` is a `result.WhisperResult` (138cb6b)
- fixed `transcribe()` throwing error if `suppress_silence=False` (138cb6b)
- updated `README.md` (c90ff06)
- fixed `--debug` not showing the first option (857df9a)
- fixed `demucs` and `only_voice_freq` for `transcribe_stable()` (7f62a9d)
- fixed `demucs` for `transcribe_minimal()` (857df9a)
- fixed `only_voice_freq` for `transcribe_minimal()` (7f62a9d)
- fixed progress bar for faster-whisper (7f62a9d)
- updated `transcribe_minimal()` to accept more options (857df9a)
- updated `transcribe_stable()` for faster-whisper models to accept more options (7f62a9d)
- added `'us'` as method key to `WhisperResult.regroup()` (da33bf5)
- added `--demucs_option`, `--model_option`, `--transcribe_option`, `--save_option` to CLI (da33bf5)
- added `--transcribe_method` to CLI (da33bf5)
- added `Segment.words_by_lock()`, `WhisperResult.all_words_by_lock()` (da33bf5)
- added `strip` to `WhisperResult.lock()` (e98c3d6)
- fixed docstring of `WhisperResult.lock()` (05bba74)
- improved `--debug` for CLI (da33bf5)
- improved `even_split=True` for `WhisperResult.split_by_length()` (da33bf5)
- updated docstring of `WhisperResult.split_by_length()` (da33bf5)
- added `lock()` to `WhisperResult` (384fc3c)
- added `'l'` as method key to `WhisperResult.regroup()` (384fc3c)
- added progress bar to transcription with faster-whisper (5ac6f5e)
- updated `--output_format` to accept multiple formats (384fc3c)
- updated `WhisperResult.reset()` to match its initialization (384fc3c)
- updated `regroup()` to parse `regroup_algo` into dict (384fc3c)
- added `check_sorted` to `WhisperResult` (4054ca1)
- added `check_sorted` to `transcribe_any()` (07eaf9e)
- added `round_all_timestamps()` to `result.Segment` and `result.WordTiming` (4a7e52b)
- changed default to `word_timestamps=True` for `faster_transcribe()` (4a7e52b)
- changed `raise_for_unsorted()` logic (4a7e52b)
- fixed `WhisperResult.force_order()` to work as intended (4a7e52b)
- added `token_step` to `align()` (ac3b38c)
- deleted `_demo` directory (b592731)
- fixed #205 (ac3b38c)
- updated `README.md` (d0340ef, ffa05a4)
- added `Whisper.adjust_by_result()` (6da3dd8)
- added `alignment.align()` (6da3dd8)
- added `load_faster_whisper()` (6da3dd8)
- fixed `encode_video_comparison()` unable to encode more than two subtitle files (6da3dd8)
- fixed `verbose` not working for `transcribe_minimal()` (6da3dd8)
- refactored compatibility warning into `warn_compatibility_issues()` in `utils.py` (6da3dd8)
- refactored post-inference silence suppression into `WhisperResult.adjust_by_silence()` (6da3dd8)
- added `demucs_options` to `transcribe()` (91cf2b1)
- added `ignore_compatibility` to `transcribe()` (91cf2b1)
- changed compatibility warning to distinguish between mismatched version number and repo version (91cf2b1)
- changed heuristic for identifying Whisper version number to avoid false positives (91cf2b1)
- added `transcribe_minimal()` (ef8a7f1)
- added `force_order` to `result.WhisperResult` (ef8a7f1)
- added `max_instant_words` to `transcribe()` (ef8a7f1)
- added `progress_callback` to `transcribe()` (ef8a7f1)
- changed default to `clip_start=True` for `WhisperResult.clamp_max()` (ef8a7f1)
- added logic to check if the installed Whisper version is compatible (e53f4be)
- fixed `tag` for `result_to_ass()` to work as intended (ea8cac8)
- added logic to ensure ascending timestamps in `result.WhisperResult` (fd78cd7)
- updated default regroup algorithm (fd78cd7, 77dcfdf)
- updated long form transcription logic (fd78cd7)
- fixed skipping words (77dcfdf)
- avoid computing higher temperatures on `no_speech` segments (fd78cd7)
- removed any segments that contain only punctuation (fd78cd7)
- removed segments with 50%+ instantaneous words (fd78cd7)
- updated `README.md` (f5b4c22)
- allow `regroup_algo` to be bool for `regroup()` (4984163)
- added `even_split` to `split_by_length()` (7b867d6)
- changed default behavior of `split_by_length()` (7b867d6)
- changed default to `verbose=False` for `clamp_max()` (7b867d6)
- ignore `min_word_dur` when word timestamps are missing (e93c280)
- fixed `min_word_dur` not working for word timestamps (e93c280)
- added `clamp_max()` to `WhisperResult` and `WordTiming` (bfe93ab)
- added `cm` as method key for `clamp_max()` (bfe93ab)
- added `non_whisper.transcribe_any()` (789bb54)
- changed default to `suppress_ts_tokens=False` (789bb54)
- fixed hyperlinks in `README.md` not linking to the latest commit (87636ef)
- fixed incorrect line numbers for docstring hyperlinks (52b8b7a)
- fixed `--regroup` default (af5579e)
- added string form custom regrouping algorithm (cc352cd)
- fixed #153 (9e3ba72)
- removed max limit on audio threshold (9e3ba72)
- updated `non-whisper.ipynb` (da3721b, 7866462)
- changed `result.WhisperResult` to only require necessary data to initialize (cdf3ea9)
- added `--karaoke` to CLI (cdf3ea9)
- updated `README.md` (0635e15, 2f094f8, fb23c27)
- added support for TSV output format (d30d0d1)
- changed default VTT and ASS output to use more efficient formats (d30d0d1)
- fixed non-VAD suppression not working properly (d30d0d1)
- improved language detection (d30d0d1)
- added logic for loading audio with yt-dlp (8960922)
- added `only_ffmpeg` to `transcribe()` and CLI (8960922)
- added `shell=True` to subprocess call (a8df3b5)
- added classes: `SegmentMatch` and `WhisperResultMatches` (1eabb37)
- added fallback logic to word alignment (1eabb37)
- added `find()` to `result.WhisperResult` (1eabb37)
- added `suppress_ts_tokens` and `gap_padding` to `transcribe()` and CLI (1eabb37)
- added `shell=True` to `is_ytdlp_available()` (d2b7f3f)
- fixed `NaN` values in the logits (1eabb37)
- added `offset_time()` to `WhisperResult`, `Segment`, `WordTiming` (1447a66)
- added support for audio as URLs (1447a66)
- fixed `language` detection for English models (1447a66)
- added `split_callback` (44af5c4)
- changed parameters of `split_callback` (c003ce4)
- corrected the docstring for `rtl` (169e014)
- fixed punctuation split/merge to work as intended (a84a346)
- added regrouping list (a0021bd)
- added `--max_chars` and `--max_words` to CLI (f913d6f)
- added `rtl` (#116) (f913d6f)
- corrected VAD PyTorch requirement (60f668d)
- fixed `visualize_suppression()` error when `max_width=-1` (918e3ba)
- fixed out of range error (918e3ba)
- added `merge_all_segments()` to `result.WhisperResult` (7c69535)
- added `split_by_length()` to `result.WhisperResult` (7c69535)
- fixed transcription logic (d44d287)
- added Tips to `README.md` (c21e198)
- added new token splitting method (fa813fe)
- fixed #112 (3985791)
- fixed #117 (3985791)
- added instructions for installing demucs via error (de3c812)
- added `encoding='utf-8'` to `read_me()` in `setup.py` (ff34b27)
- updated `README.md` (dfb147e)
- added `mel_first` (8fa5670)
- fixed to not apply `min_dur` on words if segment contains no words (8fa5670)
- updated regroup demo video (e9932fe)
- fixed timestamps jumping backwards (26918d5)
- changed default to `strip=True` for `result_to_srt_vtt()` (ce4c7b3)
- keep segment if it has no words from the start (6ccfa17)
- improved `stabilization.audio2loudness()` efficiency (db99d6b)
- fixed `regroup=True` when `word_timestamps=False` (6ccfa17)
- fixed `word_level=False` failing output when `word_timestamps=False` (ce4c7b3)
- fixed ASS output formatting (ce4c7b3)
- updated `README.md` (f9f7c51)
- added segment-level and word-level support to SRT/VTT/ASS outputs (2248087)
- added `result.WhisperResult` (2248087)
- added Silero VAD support (2248087)
- added `visualize_suppression()` (2248087)
- added regrouping methods (2248087)
- changed python requirement from 3.7+ to 3.8+ (2248087)
- improved non-VAD suppression (2248087)
- improved word-level timestamp reliability (2248087)
- updated `README.md` (eb5e68c)