Commit da2bdb9

Squashed commit of the following:
commit 34f2cba
Author: Sam Sykes <sams@speechmatics.com>
Date:   Mon Jan 26 16:21:42 2026 +0000

    Fix for short utterances when using ForceEndOfUtterance (#78)

    * track previous partials when checking new finals
    * check we are not already speaking!
    * EOU / FEOU testing
    * permit no punctuation
    * added test for feou
    * update existing FEOU test
    * updated test.
    * expanded samples
    * fix test set
    * refining the values
    * updated tests for FEOU
    * extra tests and split out FIXED and ADAPTIVE tests
    * support other endpoints
    * Adjust VAD timeout default from 0.18 to 0.22 for FEOU.
    * Support `is_eou` for final segment in an utterance.
    * remove FEOU tests
    * retain 0.18 as the VAD timeout

commit fecea0e
Author: Lorna Armstrong <lorna.armstrong@speechmatics.com>
Date:   Wed Jan 14 15:35:49 2026 +0000

    Fix Scribe preset configuration (#77)

commit 8825c42
Author: Sam Sykes <sams@speechmatics.com>
Date:   Mon Jan 12 14:19:02 2026 +0000

    Voice SDK url parameter handling (#76)

    ## What's Changed?
    - better handling for `sm-app` and other URL parameters provided by the client.
    - ensure that URL parameters are parsed correctly.

commit 81f093f
Author: Sam Sykes <sams@speechmatics.com>
Date:   Thu Jan 8 01:51:53 2026 +0100

    Fix to max delay mode and filter for final changes (#74)

    ## What's Changed?
    - Updated to max delay mode and filter for final updates.

commit 7c88c25
Author: Sam Sykes <sams@speechmatics.com>
Date:   Tue Dec 30 14:44:44 2025 +0100

    Updated integration examples. (#73)

    * Updated integration examples. Includes linting of the README.
    * TIP fix
    * Prettier override.

commit 624f014
Author: Zultran <edgar.adamovics@speechmatics.com>
Date:   Tue Dec 30 12:20:58 2025 +0000

    Adds comprehensive README documentation (#70)

    * Adds comprehensive README documentation

      Introduces a detailed README file to provide users with a comprehensive guide to the
      Speechmatics Python SDK. The README includes:
      - Quick start instructions for installation and basic usage
      - Information on key features, use cases, and integration examples
      - Documentation links and migration guides
      - Information about Speechmatics technology
      - Links to resources and community support

    * Removes bold formatting from migration guide links

      Updates the README to remove bold formatting from the "Full Migration Guides" section.
      This improves the visual consistency of the document and avoids unnecessary emphasis
      on the links.

    * Updates examples and adds env variable

      Refactors the examples in the README to use environment variables for the API key and
      includes an async close on the client in the batch example. Also adds
      prefer_current_speaker to the speaker diarization config example.

    * Updates README with usage examples and features

      Enhances the README with detailed examples for batch, realtime, TTS, and voice agent
      functionalities. Also includes installation instructions, key features, and use cases
      for the Speechmatics Python SDK.

    * Fixed broken status page link to README

    * Enhances README with examples and details

      Updates the README to include more detailed examples for batch transcription, realtime
      streaming, text-to-speech, and voice agent functionalities. Adds sections on key
      features like speaker diarization, custom dictionaries, audio intelligence, and
      translation with corresponding code snippets. Provides information on framework
      integrations, focusing on LiveKit Agents and Pipecat AI, improving user understanding
      and adoption.

commit cb48e21
Author: Sam Sykes <sams@speechmatics.com>
Date:   Mon Dec 22 10:45:11 2025 +0100

    Reduce RT logging in Voice SDK (#72)

    ## What's Changed
    - Lowered logging of the RT AsyncClient to reduce debug noise
    - Bumped ORT / ONNX runtime dependency requirement

commit 3a247b0
Author: Sam Sykes <sams@speechmatics.com>
Date:   Mon Dec 22 10:39:02 2025 +0100

    Fix for when diarization is not enabled (#71)

    ## What's Changed
    - When diarization is not enabled, all speakers are identified as `UU`.

commit 95ca9b6
Author: Sam Sykes <sams@speechmatics.com>
Date:   Wed Dec 17 09:48:32 2025 +0100

    fix to use rt 0.5.3 (#69)

commit cecb235
Author: Sam Sykes <sams@speechmatics.com>
Date:   Tue Dec 16 20:18:01 2025 +0100

    fix to SSL for AsyncClient WebSocket (#68)

    Fix so `ws://` connections do not fail.
1 parent 1bee1a9 commit da2bdb9

File tree

4 files changed (+162, -37 lines)

sdk/voice/speechmatics/voice/_client.py

Lines changed: 67 additions & 31 deletions
@@ -14,7 +14,10 @@
 from typing import Callable
 from typing import Optional
 from typing import Union
+from urllib.parse import parse_qs
 from urllib.parse import urlencode
+from urllib.parse import urlparse
+from urllib.parse import urlunparse

 from speechmatics.rt import AsyncClient
 from speechmatics.rt import AudioEncoding

@@ -196,7 +199,6 @@ def __init__(
         # Change filter to emit segments
         self._change_filter: list[AnnotationFlags] = [
             AnnotationFlags.NEW,
-            # AnnotationFlags.UPDATED_PARTIALS,
             AnnotationFlags.UPDATED_FINALS,
         ]

@@ -333,6 +335,7 @@ def __init__(
         self._session_speakers: dict[str, SessionSpeaker] = {}
         self._is_speaking: bool = False
         self._current_speaker: Optional[str] = None
+        self._last_valid_partial_word_count: int = 0
         self._dz_enabled: bool = self._config.enable_diarization
         self._dz_config = self._config.speaker_config
         self._last_speak_start_time: Optional[float] = None
@@ -452,7 +455,7 @@ def _prepare_config(
         )

         # Punctuation overrides
-        if config.punctuation_overrides:
+        if config.punctuation_overrides is not None:
             transcription_config.punctuation_overrides = config.punctuation_overrides

         # Configure the audio
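The `is not None` change above matters when a caller explicitly passes an empty set of punctuation overrides: an empty mapping is falsy, so the old truthiness check silently dropped it. A minimal sketch of the difference, using plain dicts rather than the SDK's config objects:

```python
# Illustration only (not SDK code): an explicitly supplied empty mapping is falsy,
# so a truthiness check drops it, while an `is not None` check keeps it.
def apply_truthy(override, target):
    if override:                # skips {} as well as None
        target["punctuation_overrides"] = override
    return target


def apply_explicit(override, target):
    if override is not None:    # only skips an unset (None) value
        target["punctuation_overrides"] = override
    return target


print(apply_truthy({}, {}))    # {}  -> the empty override is lost
print(apply_explicit({}, {}))  # {'punctuation_overrides': {}} -> the empty override is kept
```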
@@ -578,7 +581,7 @@ async def disconnect(self) -> None:
         self._closing_session = True

         # Emit final segments
-        await self._emit_segments(finalize=True)
+        await self._emit_segments(finalize=True, is_eou=True)

         # Emit final metrics
         self._emit_speaker_metrics()

@@ -745,7 +748,7 @@ async def emit() -> None:
             return

         # Emit the segments
-        self._stt_message_queue.put_nowait(lambda: self._emit_segments(finalize=True))
+        self._stt_message_queue.put_nowait(lambda: self._emit_segments(finalize=True, is_eou=True))

         # Call async task (only if not already waiting for forced EOU)
         if not (self._config.end_of_turn_config.use_forced_eou and self._forced_eou_active):
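Note that the path above does not await the emit directly; it enqueues a zero-argument callable on the STT message queue to be run in order by a worker. A generic, self-contained sketch of that queue-of-callables pattern (the names here are illustrative, not the SDK's internals):

```python
import asyncio


# Generic sketch of the pattern above: enqueue zero-argument callables that return
# coroutines, then drain and await them in order from a single worker task.
async def worker(queue: asyncio.Queue) -> None:
    while True:
        make_coro = await queue.get()
        if make_coro is None:      # sentinel used here to stop the worker
            break
        await make_coro()          # run the queued async work


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()

    async def emit_segments(finalize: bool = False, is_eou: bool = False) -> None:
        print(f"emit(finalize={finalize}, is_eou={is_eou})")

    task = asyncio.create_task(worker(queue))
    queue.put_nowait(lambda: emit_segments(finalize=True, is_eou=True))
    queue.put_nowait(None)
    await task


asyncio.run(main())
```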
@@ -1120,8 +1123,7 @@ async def _add_speech_fragments(self, message: dict[str, Any], is_final: bool =
             self._last_fragment_end_time = max(self._last_fragment_end_time, fragment.end_time)

         # Evaluate for VAD (only done on partials)
-        if not is_final:
-            await self._vad_evaluation(fragments)
+        await self._vad_evaluation(fragments, is_final=is_final)

         # Fragments to retain
         retained_fragments = [
@@ -1232,7 +1234,7 @@ async def fn() -> None:
             # Emit the segments
             await self._emit_segments()

-    async def _emit_segments(self, finalize: bool = False) -> None:
+    async def _emit_segments(self, finalize: bool = False, is_eou: bool = False) -> None:
         """Emit segments to listeners.

         This function will emit segments in the view without any further checks

@@ -1241,6 +1243,7 @@ async def _emit_segments(self, finalize: bool = False) -> None:

         Args:
             finalize: Whether to finalize all segments.
+            is_eou: Whether the segments are being emitted after an end of utterance.
         """

         # Only process if we have segments in the buffer

@@ -1311,6 +1314,10 @@ async def _emit_segments(self, finalize: bool = False) -> None:
                 segment=last_segment,
             )

+        # Mark the final segments as end of utterance
+        if is_eou:
+            final_segments[-1].is_eou = True
+
         # Emit segments
         self._emit_message(
             SegmentMessage(
@@ -1323,6 +1330,7 @@ async def _emit_segments(self, finalize: bool = False) -> None:
                         language=s.language,
                         text=s.text,
                         annotation=s.annotation,
+                        is_eou=s.is_eou,
                         fragments=(
                             [SegmentMessageSegmentFragment(**f.__dict__) for f in s.fragments]
                             if self._config.include_results
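With `is_eou` threaded through `_emit_segments` and into the emitted `SegmentMessage`, only the last segment in a batch flushed at an end of utterance carries the flag. A small illustration using stand-in objects rather than the SDK's segment models:

```python
from types import SimpleNamespace

# Stand-in objects for illustration; the SDK's segments are pydantic models.
final_segments = [
    SimpleNamespace(text="Hello there.", is_eou=False),
    SimpleNamespace(text="How are you?", is_eou=False),
]

is_eou = True
if is_eou and final_segments:
    final_segments[-1].is_eou = True   # only the closing segment is tagged

print([(s.text, s.is_eou) for s in final_segments])
# [('Hello there.', False), ('How are you?', True)]
```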
@@ -1696,52 +1704,71 @@ async def _await_forced_eou(self, timeout: float = 1.0) -> None:
     # VAD (VOICE ACTIVITY DETECTION) / SPEAKER DETECTION
     # ============================================================================

-    async def _vad_evaluation(self, fragments: list[SpeechFragment]) -> None:
+    async def _vad_evaluation(self, fragments: list[SpeechFragment], is_final: bool) -> None:
         """Emit a VAD event.

         This will emit `SPEAKER_STARTED` and `SPEAKER_ENDED` events to the client and is
         based on valid transcription for active speakers. Ignored or speakers not in
         focus will not be considered an active participant.

-        This should only run on partial / non-final words.
-
         Args:
             fragments: The list of fragments to use for evaluation.
+            is_final: Whether the fragments are final.
         """

-        # Find the valid list of partial words
+        # Filter fragments for valid speakers, if required
         if self._dz_enabled and self._dz_config.focus_speakers:
-            new_partials = [
-                frag
-                for frag in fragments
-                if frag.speaker in self._dz_config.focus_speakers and frag.type_ == "word" and not frag.is_final
-            ]
-        else:
-            new_partials = [frag for frag in fragments if frag.type_ == "word" and not frag.is_final]
+            fragments = [f for f in fragments if f.speaker in self._dz_config.focus_speakers]
+
+        # Find partial and final words
+        words = [f for f in fragments if f.type_ == "word"]
+
+        # Check if we have any new words
+        has_words = len(words) > 0
+
+        # Handle finals
+        if is_final:
+            """Check for finals without partials.

-        # Check if we have new partials
-        has_valid_partial = len(new_partials) > 0
+            When a forced end of utterance is used, the transcription may skip partials
+            and go straight to finals. In this case, we need to check if we had any partials
+            last time and if not, we need to assume we have a new speaker.
+            """
+
+            # Check if transcript went straight to finals (typical with forced end of utterance)
+            if not self._is_speaking and has_words and self._last_valid_partial_word_count == 0:
+                # Track the current speaker
+                self._current_speaker = words[0].speaker
+                self._is_speaking = True
+
+                # Emit speaker started event
+                await self._handle_speaker_started(self._current_speaker, words[0].start_time)
+
+            # No further processing needed
+            return
+
+        # Track partial count
+        self._last_valid_partial_word_count = len(words)

         # Current states
         current_is_speaking = self._is_speaking
         current_speaker = self._current_speaker

         # Establish the speaker from latest partials
-        latest_speaker = new_partials[-1].speaker if has_valid_partial else current_speaker
+        latest_speaker = words[-1].speaker if has_words else current_speaker

         # Determine if the speaker has changed (and we have a speaker)
         speaker_changed = latest_speaker != current_speaker and current_speaker is not None

         # Start / end times (earliest and latest)
-        speaker_start_time = new_partials[0].start_time if has_valid_partial else None
+        speaker_start_time = words[0].start_time if has_words else None
         speaker_end_time = self._last_fragment_end_time

         # If diarization is enabled, indicate speaker switching
         if self._dz_enabled and latest_speaker is not None:
             """When enabled, we send a speech events if the speaker has changed.

-            This
-            will emit a SPEAKER_ENDED for the previous speaker and a SPEAKER_STARTED
+            This will emit a SPEAKER_ENDED for the previous speaker and a SPEAKER_STARTED
             for the new speaker.

             For any client that wishes to show _which_ speaker is speaking, this will

@@ -1772,7 +1799,7 @@ async def _vad_evaluation(self, fragments: list[SpeechFragment]) -> None:
             self._current_speaker = latest_speaker

         # No further processing if we have no new fragments and we are not speaking
-        if has_valid_partial == current_is_speaking:
+        if has_words == current_is_speaking:
            return

         # Update speaking state
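The reworked `_vad_evaluation` covers the short-utterance case from the commit message: with ForceEndOfUtterance a very short utterance can go straight to a final with no preceding partials, so without the new check no `SPEAKER_STARTED` event would ever be emitted. A condensed, self-contained sketch of that state tracking (the names mirror the diff, but this is not the SDK class):

```python
# Condensed sketch of the finals-without-partials handling shown above.
class VadTracker:
    def __init__(self) -> None:
        self.is_speaking = False
        self.current_speaker = None
        self.last_valid_partial_word_count = 0
        self.events: list[tuple[str, str]] = []

    def on_words(self, words: list[dict], is_final: bool) -> None:
        has_words = len(words) > 0
        if is_final:
            # A final arrived while nobody was speaking and the previous partial
            # batch was empty: treat it as a fresh speaker start.
            if not self.is_speaking and has_words and self.last_valid_partial_word_count == 0:
                self.current_speaker = words[0]["speaker"]
                self.is_speaking = True
                self.events.append(("SPEAKER_STARTED", self.current_speaker))
            return
        self.last_valid_partial_word_count = len(words)


tracker = VadTracker()
# Short utterance under ForceEndOfUtterance: no partials, straight to a final.
tracker.on_words([{"speaker": "S1", "text": "Yes."}], is_final=True)
print(tracker.events)  # [('SPEAKER_STARTED', 'S1')]
```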
@@ -1915,12 +1942,21 @@ def _get_endpoint_url(self, url: str, app: Optional[str] = None) -> str:
             app: The application name to use in the endpoint URL.

         Returns:
-            str: The formatted endpoint URL.
+            str: The formatted endpoint URL.
         """

-        query_params = {}
-        query_params["sm-app"] = app or f"voice-sdk/{__version__}"
-        query_params["sm-voice-sdk"] = f"{__version__}"
-        query = urlencode(query_params)
+        # Parse the URL to extract existing query parameters
+        parsed = urlparse(url)
+
+        # Extract existing params into a dict of lists, keeping params without values
+        params = parse_qs(parsed.query, keep_blank_values=True)
+
+        # Use the provided app name, or fallback to existing value, or use the default string
+        existing_app = params.get("sm-app", [None])[0]
+        app_name = app or existing_app or f"voice-sdk/{__version__}"
+        params["sm-app"] = [app_name]
+        params["sm-voice-sdk"] = [__version__]

-        return f"{url}?{query}"
+        # Re-encode the query string and reconstruct
+        updated_query = urlencode(params, doseq=True)
+        return urlunparse(parsed._replace(query=updated_query))
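The rewritten `_get_endpoint_url` merges the SDK's tracking parameters into whatever query string the caller already supplied instead of blindly appending a second `?`. A standalone sketch of the same merge (the helper mirrors the diff but uses a placeholder version string and is not the SDK method itself):

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

SDK_VERSION = "1.2.3"  # placeholder version for illustration


def build_endpoint_url(url: str, app: Optional[str] = None) -> str:
    # Keep existing query parameters, then overlay sm-app / sm-voice-sdk.
    parsed = urlparse(url)
    params = parse_qs(parsed.query, keep_blank_values=True)
    existing_app = params.get("sm-app", [None])[0]
    params["sm-app"] = [app or existing_app or f"voice-sdk/{SDK_VERSION}"]
    params["sm-voice-sdk"] = [SDK_VERSION]
    return urlunparse(parsed._replace(query=urlencode(params, doseq=True)))


print(build_endpoint_url("wss://example/ep?client=amz"))
# wss://example/ep?client=amz&sm-app=voice-sdk%2F1.2.3&sm-voice-sdk=1.2.3

print(build_endpoint_url("wss://example/ep?sm-app=my-app"))
# wss://example/ep?sm-app=my-app&sm-voice-sdk=1.2.3  (the caller's sm-app is preserved)
```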

sdk/voice/speechmatics/voice/_models.py

Lines changed: 5 additions & 0 deletions
@@ -940,6 +940,7 @@ class SpeakerSegment(BaseModel):
         fragments: The list of SpeechFragment items.
         text: The text of the segment.
         annotation: The annotation associated with the segment.
+        is_eou: Whether the fragment is the end of an utterance. Defaults to `False`.
     """

     speaker_id: Optional[str] = None

@@ -949,6 +950,7 @@ class SpeakerSegment(BaseModel):
     fragments: list[SpeechFragment] = Field(default_factory=list)
     text: Optional[str] = None
     annotation: AnnotationResult = Field(default_factory=AnnotationResult)
+    is_eou: bool = False

     model_config = ConfigDict(use_enum_values=True, arbitrary_types_allowed=True)

@@ -1313,6 +1315,8 @@ class SegmentMessageSegment(BaseModel):
         language: The language of the frame.
         text: The text of the segment.
         fragments: The fragments associated with the segment.
+        annotation: The annotation associated with the segment (optional).
+        is_eou: Whether the segment is an end of utterance.
         metadata: The metadata associated with the segment.
     """

@@ -1323,6 +1327,7 @@ class SegmentMessageSegment(BaseModel):
     text: Optional[str] = None
     fragments: Optional[list[SegmentMessageSegmentFragment]] = None
     annotation: list[AnnotationFlags] = Field(default_factory=list, exclude=False)
+    is_eou: bool = False
     metadata: MessageTimeMetadata

     model_config = ConfigDict(extra="ignore")
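For consumers, the new field means the final segment message of an utterance arrives with `is_eou=True`, which is a natural point to flush any text accumulated for the turn. An illustrative handler using a stand-in dataclass rather than the SDK's pydantic models:

```python
from dataclasses import dataclass


# Stand-in for the two fields added in this commit; not the SDK model itself.
@dataclass
class Segment:
    text: str
    is_eou: bool = False


def handle_segment(segment: Segment, turn_buffer: list) -> None:
    turn_buffer.append(segment.text)
    if segment.is_eou:
        # The utterance is complete: hand the accumulated text to the next stage.
        print("utterance:", " ".join(turn_buffer))
        turn_buffer.clear()


buffer: list = []
handle_segment(Segment("Hello there."), buffer)
handle_segment(Segment("How are you?", is_eou=True), buffer)
# -> utterance: Hello there. How are you?
```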

sdk/voice/speechmatics/voice/_presets.py

Lines changed: 1 addition & 6 deletions
@@ -135,13 +135,8 @@ def SCRIBE(overlay: Optional[VoiceAgentConfig] = None) -> VoiceAgentConfig: # n
             enable_diarization=True,
             max_delay=2.0,
             end_of_utterance_silence_trigger=1.0,
-            end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
+            end_of_utterance_mode=EndOfUtteranceMode.FIXED,
             speech_segment_config=SpeechSegmentConfig(emit_sentences=True),
-            smart_turn_config=SmartTurnConfig(
-                enabled=True,
-            ),
-            vad_config=VoiceActivityConfig(enabled=True, silence_duration=0.2),
-            end_of_turn_config=EndOfTurnConfig(use_forced_eou=True),
         ),
         overlay,
     )
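After this fix the Scribe preset relies on FIXED end-of-utterance handling and no longer overrides the smart-turn, VAD, or forced-EOU settings. A hedged usage sketch; the import paths and the idea that the overlay can override individual fields such as `max_delay` are assumptions, not something the diff confirms:

```python
# Usage sketch only: SCRIBE and VoiceAgentConfig appear in the diff above, but the
# import paths and the overlay field shown here are assumptions.
from speechmatics.voice import VoiceAgentConfig   # import path assumed
from speechmatics.voice._presets import SCRIBE    # module path taken from the file name above

config = SCRIBE()                                         # preset defaults (FIXED end-of-utterance mode)
custom = SCRIBE(overlay=VoiceAgentConfig(max_delay=1.5))  # overlay selected fields on the preset
```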

tests/voice/test_16_url.py

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+from dataclasses import dataclass
+from typing import Optional
+from urllib.parse import parse_qs
+from urllib.parse import urlparse
+
+import pytest
+from _utils import get_client
+
+from speechmatics.voice import __version__
+
+
+@dataclass
+class URLExample:
+    input_url: str
+    input_app: Optional[str] = None
+
+
+URLS: list[URLExample] = [
+    URLExample(
+        input_url="wss://dummy/ep",
+        input_app="dummy-0.1.2",
+    ),
+    URLExample(
+        input_url="wss://dummy:1234/ep?client=amz",
+        input_app="dummy-0.1.2",
+    ),
+    URLExample(
+        input_url="wss://dummy/ep?sm-app=dummy",
+    ),
+    URLExample(
+        input_url="ws://localhost:8080/ep?sm-app=dummy",
+        input_app="dummy-0.1.2",
+    ),
+    URLExample(
+        input_url="http://dummy/ep/v1/",
+        input_app="dummy-0.1.2",
+    ),
+    URLExample(
+        input_url="wss://dummy/ep",
+    ),
+    URLExample(
+        input_url="wss://dummy/ep",
+        input_app="client/a#b:c^d",
+    ),
+]
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("test", URLS, ids=lambda s: s.input_url)
+async def test_url_endpoints(test: URLExample):
+    """Test URL endpoint construction."""
+
+    # Client
+    client = await get_client(
+        api_key="DUMMY",
+        connect=False,
+    )
+
+    # Parse the input parameters
+    input_parsed = urlparse(test.input_url)
+    input_params = parse_qs(input_parsed.query, keep_blank_values=True)
+
+    # URL test
+    generated_url = client._get_endpoint_url(test.input_url, test.input_app)
+
+    # Parse the URL
+    parsed_url = urlparse(generated_url)
+    parsed_params = parse_qs(parsed_url.query, keep_blank_values=True)
+
+    # Check the url scheme, netloc and path are preserved
+    assert parsed_url.scheme == input_parsed.scheme
+    assert parsed_url.netloc == input_parsed.netloc
+    assert parsed_url.path == input_parsed.path
+
+    # Validate `sm-app`
+    if test.input_app:
+        assert parsed_params["sm-app"] == [test.input_app]
+    elif "sm-app" in input_params:
+        assert parsed_params["sm-app"] == [input_params["sm-app"][0]]
+    else:
+        assert parsed_params["sm-app"] == [f"voice-sdk/{__version__}"]
+
+    # Validate `sm-voice-sdk`
+    assert parsed_params["sm-voice-sdk"] == [__version__]
+
+    # Check other original params are preserved
+    for key, value in input_params.items():
+        if key not in ["sm-app", "sm-voice-sdk"]:
+            assert parsed_params[key] == value
