Commit da2bdb9

Squashed commit of the following:
commit 34f2cba
Author: Sam Sykes <sams@speechmatics.com>
Date:   Mon Jan 26 16:21:42 2026 +0000

    Fix for short utterances when using ForceEndOfUtterance (#78)

    * track previous partials when checking new finals
    * check we are not already speaking!
    * EOU / FEOU testing
    * permit no punctuation
    * added test for feou
    * update existing FEOU test
    * updated test.
    * expanded samples
    * fix test set
    * refining the values
    * updated tests for FEOU
    * extra tests and split out FIXED and ADAPTIVE tests
    * support other endpoints
    * Adjust VAD timeout default from 0.18 to 0.22 for FEOU.
    * Support `is_eou` for final segment in an utterance.
    * remove FEOU tests
    * retain 0.18 as the VAD timeout

commit fecea0e
Author: Lorna Armstrong <lorna.armstrong@speechmatics.com>
Date:   Wed Jan 14 15:35:49 2026 +0000

    Fix Scribe preset configuration (#77)

commit 8825c42
Author: Sam Sykes <sams@speechmatics.com>
Date:   Mon Jan 12 14:19:02 2026 +0000

    Voice SDK url parameter handling (#76)

    ## What's Changed?
    - better handling for `sm-app` and other URL parameters provided by the client.
    - ensure that URL parameters are parsed correctly.

commit 81f093f
Author: Sam Sykes <sams@speechmatics.com>
Date:   Thu Jan 8 01:51:53 2026 +0100

    Fix to max delay mode and filter for final changes (#74)

    ## What's Changed?
    - Updated to max delay mode and filter for final updates.

commit 7c88c25
Author: Sam Sykes <sams@speechmatics.com>
Date:   Tue Dec 30 14:44:44 2025 +0100

    Updated integration examples. (#73)

    * Updated integration examples. Includes linting of the README.
    * TIP fix
    * Prettier override.

commit 624f014
Author: Zultran <edgar.adamovics@speechmatics.com>
Date:   Tue Dec 30 12:20:58 2025 +0000

    Adds comprehensive README documentation (#70)

    * Adds comprehensive README documentation

      Introduces a detailed README file to provide users with a comprehensive guide to the
      Speechmatics Python SDK. The README includes:
      - Quick start instructions for installation and basic usage
      - Information on key features, use cases, and integration examples
      - Documentation links and migration guides
      - Information about Speechmatics technology
      - Links to resources and community support

    * Removes bold formatting from migration guide links

      Updates the README to remove bold formatting from the "Full Migration Guides" section.
      This improves the visual consistency of the document and avoids unnecessary emphasis
      on the links.

    * Updates examples and adds env variable

      Refactors the examples in the README to use environment variables for the API key and
      includes an async close on the client in the batch example. Also adds
      prefer_current_speaker to the speaker diarization config example.

    * Updates README with usage examples and features

      Enhances the README with detailed examples for batch, realtime, TTS, and voice agent
      functionalities. Also includes installation instructions, key features, and use cases
      for the Speechmatics Python SDK.

    * Fixed broken status page link to README

    * Enhances README with examples and details

      Updates the README to include more detailed examples for batch transcription, realtime
      streaming, text-to-speech, and voice agent functionalities. Adds sections on key
      features like speaker diarization, custom dictionaries, audio intelligence, and
      translation with corresponding code snippets. Provides information on framework
      integrations, focusing on LiveKit Agents and Pipecat AI, improving user understanding
      and adoption.

commit cb48e21
Author: Sam Sykes <sams@speechmatics.com>
Date:   Mon Dec 22 10:45:11 2025 +0100

    Reduce RT logging in Voice SDK (#72)

    ## What's Changed
    - Lowered logging of the RT AsyncClient to reduce debug noise
    - Bumped ORT / ONNX runtime dependency requirement

commit 3a247b0
Author: Sam Sykes <sams@speechmatics.com>
Date:   Mon Dec 22 10:39:02 2025 +0100

    Fix for when diarization is not enabled (#71)

    ## What's Changed
    - When diarization is not enabled, all speakers are identified as `UU`.

commit 95ca9b6
Author: Sam Sykes <sams@speechmatics.com>
Date:   Wed Dec 17 09:48:32 2025 +0100

    fix to use rt 0.5.3 (#69)

commit cecb235
Author: Sam Sykes <sams@speechmatics.com>
Date:   Tue Dec 16 20:18:01 2025 +0100

    fix to SSL for AsyncClient WebSocket (#68)

    Fix so `ws://` connections do not fail.
1 parent 1bee1a9 commit da2bdb9

File tree

4 files changed (+162, -37 lines)

sdk/voice/speechmatics/voice/_client.py

Lines changed: 67 additions & 31 deletions
@@ -14,7 +14,10 @@
 from typing import Callable
 from typing import Optional
 from typing import Union
+from urllib.parse import parse_qs
 from urllib.parse import urlencode
+from urllib.parse import urlparse
+from urllib.parse import urlunparse

 from speechmatics.rt import AsyncClient
 from speechmatics.rt import AudioEncoding

@@ -196,7 +199,6 @@ def __init__(
         # Change filter to emit segments
         self._change_filter: list[AnnotationFlags] = [
             AnnotationFlags.NEW,
-            # AnnotationFlags.UPDATED_PARTIALS,
             AnnotationFlags.UPDATED_FINALS,
         ]

@@ -333,6 +335,7 @@ def __init__(
         self._session_speakers: dict[str, SessionSpeaker] = {}
         self._is_speaking: bool = False
         self._current_speaker: Optional[str] = None
+        self._last_valid_partial_word_count: int = 0
         self._dz_enabled: bool = self._config.enable_diarization
         self._dz_config = self._config.speaker_config
         self._last_speak_start_time: Optional[float] = None
@@ -452,7 +455,7 @@ def _prepare_config(
         )

         # Punctuation overrides
-        if config.punctuation_overrides:
+        if config.punctuation_overrides is not None:
             transcription_config.punctuation_overrides = config.punctuation_overrides

         # Configure the audio
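The `is not None` change above matters when a caller explicitly passes an empty set of punctuation overrides: an empty mapping is falsy, so the old truthiness check silently dropped it. A minimal sketch of the difference, using plain dicts rather than the SDK's config objects:

```python
# Illustration only (not SDK code): an explicitly supplied empty mapping is falsy,
# so a truthiness check drops it, while an `is not None` check keeps it.
def apply_truthy(override, target):
    if override:                # skips {} as well as None
        target["punctuation_overrides"] = override
    return target


def apply_explicit(override, target):
    if override is not None:    # only skips an unset (None) value
        target["punctuation_overrides"] = override
    return target


print(apply_truthy({}, {}))    # {}  -> the empty override is lost
print(apply_explicit({}, {}))  # {'punctuation_overrides': {}} -> the empty override is kept
```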
@@ -578,7 +581,7 @@ async def disconnect(self) -> None:
         self._closing_session = True

         # Emit final segments
-        await self._emit_segments(finalize=True)
+        await self._emit_segments(finalize=True, is_eou=True)

         # Emit final metrics
         self._emit_speaker_metrics()

@@ -745,7 +748,7 @@ async def emit() -> None:
             return

         # Emit the segments
-        self._stt_message_queue.put_nowait(lambda: self._emit_segments(finalize=True))
+        self._stt_message_queue.put_nowait(lambda: self._emit_segments(finalize=True, is_eou=True))

         # Call async task (only if not already waiting for forced EOU)
         if not (self._config.end_of_turn_config.use_forced_eou and self._forced_eou_active):
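Note that the path above does not await the emit directly; it enqueues a zero-argument callable on the STT message queue to be run in order by a worker. A generic, self-contained sketch of that queue-of-callables pattern (the names here are illustrative, not the SDK's internals):

```python
import asyncio


# Generic sketch of the pattern above: enqueue zero-argument callables that return
# coroutines, then drain and await them in order from a single worker task.
async def worker(queue: asyncio.Queue) -> None:
    while True:
        make_coro = await queue.get()
        if make_coro is None:      # sentinel used here to stop the worker
            break
        await make_coro()          # run the queued async work


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()

    async def emit_segments(finalize: bool = False, is_eou: bool = False) -> None:
        print(f"emit(finalize={finalize}, is_eou={is_eou})")

    task = asyncio.create_task(worker(queue))
    queue.put_nowait(lambda: emit_segments(finalize=True, is_eou=True))
    queue.put_nowait(None)
    await task


asyncio.run(main())
```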
@@ -1120,8 +1123,7 @@ async def _add_speech_fragments(self, message: dict[str, Any], is_final: bool =
             self._last_fragment_end_time = max(self._last_fragment_end_time, fragment.end_time)

         # Evaluate for VAD (only done on partials)
-        if not is_final:
-            await self._vad_evaluation(fragments)
+        await self._vad_evaluation(fragments, is_final=is_final)

         # Fragments to retain
         retained_fragments = [
@@ -1232,7 +1234,7 @@ async def fn() -> None:
             # Emit the segments
             await self._emit_segments()

-    async def _emit_segments(self, finalize: bool = False) -> None:
+    async def _emit_segments(self, finalize: bool = False, is_eou: bool = False) -> None:
         """Emit segments to listeners.

         This function will emit segments in the view without any further checks

@@ -1241,6 +1243,7 @@ async def _emit_segments(self, finalize: bool = False) -> None:

         Args:
             finalize: Whether to finalize all segments.
+            is_eou: Whether the segments are being emitted after an end of utterance.
         """

         # Only process if we have segments in the buffer

@@ -1311,6 +1314,10 @@ async def _emit_segments(self, finalize: bool = False) -> None:
                 segment=last_segment,
             )

+        # Mark the final segments as end of utterance
+        if is_eou:
+            final_segments[-1].is_eou = True
+
         # Emit segments
         self._emit_message(
             SegmentMessage(
@@ -1323,6 +1330,7 @@ async def _emit_segments(self, finalize: bool = False) -> None:
                         language=s.language,
                         text=s.text,
                         annotation=s.annotation,
+                        is_eou=s.is_eou,
                         fragments=(
                             [SegmentMessageSegmentFragment(**f.__dict__) for f in s.fragments]
                             if self._config.include_results
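With `is_eou` threaded through `_emit_segments` and into the emitted `SegmentMessage`, only the last segment in a batch flushed at an end of utterance carries the flag. A small illustration using stand-in objects rather than the SDK's segment models:

```python
from types import SimpleNamespace

# Stand-in objects for illustration; the SDK's segments are pydantic models.
final_segments = [
    SimpleNamespace(text="Hello there.", is_eou=False),
    SimpleNamespace(text="How are you?", is_eou=False),
]

is_eou = True
if is_eou and final_segments:
    final_segments[-1].is_eou = True   # only the closing segment is tagged

print([(s.text, s.is_eou) for s in final_segments])
# [('Hello there.', False), ('How are you?', True)]
```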
@@ -1696,52 +1704,71 @@ async def _await_forced_eou(self, timeout: float = 1.0) -> None:
     # VAD (VOICE ACTIVITY DETECTION) / SPEAKER DETECTION
     # ============================================================================

-    async def _vad_evaluation(self, fragments: list[SpeechFragment]) -> None:
+    async def _vad_evaluation(self, fragments: list[SpeechFragment], is_final: bool) -> None:
         """Emit a VAD event.

         This will emit `SPEAKER_STARTED` and `SPEAKER_ENDED` events to the client and is
         based on valid transcription for active speakers. Ignored or speakers not in
         focus will not be considered an active participant.

-        This should only run on partial / non-final words.
-
         Args:
             fragments: The list of fragments to use for evaluation.
+            is_final: Whether the fragments are final.
         """

-        # Find the valid list of partial words
+        # Filter fragments for valid speakers, if required
         if self._dz_enabled and self._dz_config.focus_speakers:
-            new_partials = [
-                frag
-                for frag in fragments
-                if frag.speaker in self._dz_config.focus_speakers and frag.type_ == "word" and not frag.is_final
-            ]
-        else:
-            new_partials = [frag for frag in fragments if frag.type_ == "word" and not frag.is_final]
+            fragments = [f for f in fragments if f.speaker in self._dz_config.focus_speakers]
+
+        # Find partial and final words
+        words = [f for f in fragments if f.type_ == "word"]
+
+        # Check if we have any new words
+        has_words = len(words) > 0
+
+        # Handle finals
+        if is_final:
+            """Check for finals without partials.

-        # Check if we have new partials
-        has_valid_partial = len(new_partials) > 0
+            When a forced end of utterance is used, the transcription may skip partials
+            and go straight to finals. In this case, we need to check if we had any partials
+            last time and if not, we need to assume we have a new speaker.
+            """
+
+            # Check if transcript went straight to finals (typical with forced end of utterance)
+            if not self._is_speaking and has_words and self._last_valid_partial_word_count == 0:
+                # Track the current speaker
+                self._current_speaker = words[0].speaker
+                self._is_speaking = True
+
+                # Emit speaker started event
+                await self._handle_speaker_started(self._current_speaker, words[0].start_time)
+
+            # No further processing needed
+            return
+
+        # Track partial count
+        self._last_valid_partial_word_count = len(words)

         # Current states
         current_is_speaking = self._is_speaking
         current_speaker = self._current_speaker

         # Establish the speaker from latest partials
-        latest_speaker = new_partials[-1].speaker if has_valid_partial else current_speaker
+        latest_speaker = words[-1].speaker if has_words else current_speaker

         # Determine if the speaker has changed (and we have a speaker)
         speaker_changed = latest_speaker != current_speaker and current_speaker is not None

         # Start / end times (earliest and latest)
-        speaker_start_time = new_partials[0].start_time if has_valid_partial else None
+        speaker_start_time = words[0].start_time if has_words else None
         speaker_end_time = self._last_fragment_end_time

         # If diarization is enabled, indicate speaker switching
         if self._dz_enabled and latest_speaker is not None:
             """When enabled, we send a speech events if the speaker has changed.

-            This
-            will emit a SPEAKER_ENDED for the previous speaker and a SPEAKER_STARTED
+            This will emit a SPEAKER_ENDED for the previous speaker and a SPEAKER_STARTED
             for the new speaker.

             For any client that wishes to show _which_ speaker is speaking, this will

@@ -1772,7 +1799,7 @@ async def _vad_evaluation(self, fragments: list[SpeechFragment]) -> None:
             self._current_speaker = latest_speaker

         # No further processing if we have no new fragments and we are not speaking
-        if has_valid_partial == current_is_speaking:
+        if has_words == current_is_speaking:
            return

         # Update speaking state
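The reworked `_vad_evaluation` covers the short-utterance case from the commit message: with ForceEndOfUtterance a very short utterance can go straight to a final with no preceding partials, so without the new check no `SPEAKER_STARTED` event would ever be emitted. A condensed, self-contained sketch of that state tracking (the names mirror the diff, but this is not the SDK class):

```python
# Condensed sketch of the finals-without-partials handling shown above.
class VadTracker:
    def __init__(self) -> None:
        self.is_speaking = False
        self.current_speaker = None
        self.last_valid_partial_word_count = 0
        self.events: list[tuple[str, str]] = []

    def on_words(self, words: list[dict], is_final: bool) -> None:
        has_words = len(words) > 0
        if is_final:
            # A final arrived while nobody was speaking and the previous partial
            # batch was empty: treat it as a fresh speaker start.
            if not self.is_speaking and has_words and self.last_valid_partial_word_count == 0:
                self.current_speaker = words[0]["speaker"]
                self.is_speaking = True
                self.events.append(("SPEAKER_STARTED", self.current_speaker))
            return
        self.last_valid_partial_word_count = len(words)


tracker = VadTracker()
# Short utterance under ForceEndOfUtterance: no partials, straight to a final.
tracker.on_words([{"speaker": "S1", "text": "Yes."}], is_final=True)
print(tracker.events)  # [('SPEAKER_STARTED', 'S1')]
```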
@@ -1915,12 +1942,21 @@ def _get_endpoint_url(self, url: str, app: Optional[str] = None) -> str:
             app: The application name to use in the endpoint URL.

         Returns:
-            str: The formatted endpoint URL.
+            str: The formatted endpoint URL.
         """

-        query_params = {}
-        query_params["sm-app"] = app or f"voice-sdk/{__version__}"
-        query_params["sm-voice-sdk"] = f"{__version__}"
-        query = urlencode(query_params)
+        # Parse the URL to extract existing query parameters
+        parsed = urlparse(url)
+
+        # Extract existing params into a dict of lists, keeping params without values
+        params = parse_qs(parsed.query, keep_blank_values=True)
+
+        # Use the provided app name, or fallback to existing value, or use the default string
+        existing_app = params.get("sm-app", [None])[0]
+        app_name = app or existing_app or f"voice-sdk/{__version__}"
+        params["sm-app"] = [app_name]
+        params["sm-voice-sdk"] = [__version__]

-        return f"{url}?{query}"
+        # Re-encode the query string and reconstruct
+        updated_query = urlencode(params, doseq=True)
+        return urlunparse(parsed._replace(query=updated_query))
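The rewritten `_get_endpoint_url` merges the SDK's tracking parameters into whatever query string the caller already supplied instead of blindly appending a second `?`. A standalone sketch of the same merge (the helper mirrors the diff but uses a placeholder version string and is not the SDK method itself):

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

SDK_VERSION = "1.2.3"  # placeholder version for illustration


def build_endpoint_url(url: str, app: Optional[str] = None) -> str:
    # Keep existing query parameters, then overlay sm-app / sm-voice-sdk.
    parsed = urlparse(url)
    params = parse_qs(parsed.query, keep_blank_values=True)
    existing_app = params.get("sm-app", [None])[0]
    params["sm-app"] = [app or existing_app or f"voice-sdk/{SDK_VERSION}"]
    params["sm-voice-sdk"] = [SDK_VERSION]
    return urlunparse(parsed._replace(query=urlencode(params, doseq=True)))


print(build_endpoint_url("wss://example/ep?client=amz"))
# wss://example/ep?client=amz&sm-app=voice-sdk%2F1.2.3&sm-voice-sdk=1.2.3

print(build_endpoint_url("wss://example/ep?sm-app=my-app"))
# wss://example/ep?sm-app=my-app&sm-voice-sdk=1.2.3  (the caller's sm-app is preserved)
```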

sdk/voice/speechmatics/voice/_models.py

Lines changed: 5 additions & 0 deletions
@@ -940,6 +940,7 @@ class SpeakerSegment(BaseModel):
         fragments: The list of SpeechFragment items.
         text: The text of the segment.
         annotation: The annotation associated with the segment.
+        is_eou: Whether the fragment is the end of an utterance. Defaults to `False`.
     """

     speaker_id: Optional[str] = None

@@ -949,6 +950,7 @@ class SpeakerSegment(BaseModel):
     fragments: list[SpeechFragment] = Field(default_factory=list)
     text: Optional[str] = None
     annotation: AnnotationResult = Field(default_factory=AnnotationResult)
+    is_eou: bool = False

     model_config = ConfigDict(use_enum_values=True, arbitrary_types_allowed=True)

@@ -1313,6 +1315,8 @@ class SegmentMessageSegment(BaseModel):
         language: The language of the frame.
         text: The text of the segment.
         fragments: The fragments associated with the segment.
+        annotation: The annotation associated with the segment (optional).
+        is_eou: Whether the segment is an end of utterance.
         metadata: The metadata associated with the segment.
     """

@@ -1323,6 +1327,7 @@ class SegmentMessageSegment(BaseModel):
     text: Optional[str] = None
     fragments: Optional[list[SegmentMessageSegmentFragment]] = None
     annotation: list[AnnotationFlags] = Field(default_factory=list, exclude=False)
+    is_eou: bool = False
     metadata: MessageTimeMetadata

     model_config = ConfigDict(extra="ignore")
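For consumers, the new field means the final segment message of an utterance arrives with `is_eou=True`, which is a natural point to flush any text accumulated for the turn. An illustrative handler using a stand-in dataclass rather than the SDK's pydantic models:

```python
from dataclasses import dataclass


# Stand-in for the two fields added in this commit; not the SDK model itself.
@dataclass
class Segment:
    text: str
    is_eou: bool = False


def handle_segment(segment: Segment, turn_buffer: list) -> None:
    turn_buffer.append(segment.text)
    if segment.is_eou:
        # The utterance is complete: hand the accumulated text to the next stage.
        print("utterance:", " ".join(turn_buffer))
        turn_buffer.clear()


buffer: list = []
handle_segment(Segment("Hello there."), buffer)
handle_segment(Segment("How are you?", is_eou=True), buffer)
# -> utterance: Hello there. How are you?
```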

sdk/voice/speechmatics/voice/_presets.py

Lines changed: 1 addition & 6 deletions
@@ -135,13 +135,8 @@ def SCRIBE(overlay: Optional[VoiceAgentConfig] = None) -> VoiceAgentConfig: # n
             enable_diarization=True,
             max_delay=2.0,
             end_of_utterance_silence_trigger=1.0,
-            end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
+            end_of_utterance_mode=EndOfUtteranceMode.FIXED,
             speech_segment_config=SpeechSegmentConfig(emit_sentences=True),
-            smart_turn_config=SmartTurnConfig(
-                enabled=True,
-            ),
-            vad_config=VoiceActivityConfig(enabled=True, silence_duration=0.2),
-            end_of_turn_config=EndOfTurnConfig(use_forced_eou=True),
         ),
         overlay,
     )
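After this fix the Scribe preset relies on FIXED end-of-utterance handling and no longer overrides the smart-turn, VAD, or forced-EOU settings. A hedged usage sketch; the import paths and the idea that the overlay can override individual fields such as `max_delay` are assumptions, not something the diff confirms:

```python
# Usage sketch only: SCRIBE and VoiceAgentConfig appear in the diff above, but the
# import paths and the overlay field shown here are assumptions.
from speechmatics.voice import VoiceAgentConfig   # import path assumed
from speechmatics.voice._presets import SCRIBE    # module path taken from the file name above

config = SCRIBE()                                         # preset defaults (FIXED end-of-utterance mode)
custom = SCRIBE(overlay=VoiceAgentConfig(max_delay=1.5))  # overlay selected fields on the preset
```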

tests/voice/test_16_url.py

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+from dataclasses import dataclass
+from typing import Optional
+from urllib.parse import parse_qs
+from urllib.parse import urlparse
+
+import pytest
+from _utils import get_client
+
+from speechmatics.voice import __version__
+
+
+@dataclass
+class URLExample:
+    input_url: str
+    input_app: Optional[str] = None
+
+
+URLS: list[URLExample] = [
+    URLExample(
+        input_url="wss://dummy/ep",
+        input_app="dummy-0.1.2",
+    ),
+    URLExample(
+        input_url="wss://dummy:1234/ep?client=amz",
+        input_app="dummy-0.1.2",
+    ),
+    URLExample(
+        input_url="wss://dummy/ep?sm-app=dummy",
+    ),
+    URLExample(
+        input_url="ws://localhost:8080/ep?sm-app=dummy",
+        input_app="dummy-0.1.2",
+    ),
+    URLExample(
+        input_url="http://dummy/ep/v1/",
+        input_app="dummy-0.1.2",
+    ),
+    URLExample(
+        input_url="wss://dummy/ep",
+    ),
+    URLExample(
+        input_url="wss://dummy/ep",
+        input_app="client/a#b:c^d",
+    ),
+]
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("test", URLS, ids=lambda s: s.input_url)
+async def test_url_endpoints(test: URLExample):
+    """Test URL endpoint construction."""
+
+    # Client
+    client = await get_client(
+        api_key="DUMMY",
+        connect=False,
+    )
+
+    # Parse the input parameters
+    input_parsed = urlparse(test.input_url)
+    input_params = parse_qs(input_parsed.query, keep_blank_values=True)
+
+    # URL test
+    generated_url = client._get_endpoint_url(test.input_url, test.input_app)
+
+    # Parse the URL
+    parsed_url = urlparse(generated_url)
+    parsed_params = parse_qs(parsed_url.query, keep_blank_values=True)
+
+    # Check the url scheme, netloc and path are preserved
+    assert parsed_url.scheme == input_parsed.scheme
+    assert parsed_url.netloc == input_parsed.netloc
+    assert parsed_url.path == input_parsed.path
+
+    # Validate `sm-app`
+    if test.input_app:
+        assert parsed_params["sm-app"] == [test.input_app]
+    elif "sm-app" in input_params:
+        assert parsed_params["sm-app"] == [input_params["sm-app"][0]]
+    else:
+        assert parsed_params["sm-app"] == [f"voice-sdk/{__version__}"]
+
+    # Validate `sm-voice-sdk`
+    assert parsed_params["sm-voice-sdk"] == [__version__]
+
+    # Check other original params are preserved
+    for key, value in input_params.items():
+        if key not in ["sm-app", "sm-voice-sdk"]:
+            assert parsed_params[key] == value
