- Custom Component replacement for Home Assistant
This custom component replaces the official Microsoft TTS integration for Home Assistant, which has not been updated or maintained for a long time and is now legacy.
This integration provides advanced SSML support for Azure TTS, including multi-voice scenarios and streaming-safe handling.
- Full <speak> documents are supported in raw mode (raw_ssml: true)
- Common SSML tags are handled robustly: <voice>, <break>, <p>, <s>, <phoneme>, <sub>
- Automatic light sanitization is applied for common malformed input cases
- If SSML is invalid, fallback logic prevents reading XML tags out loud
Is it "full SSML"?
No, it does not implement every SSML feature and edge case in the W3C spec.
It does cover the main Azure TTS usage patterns and the most common Home Assistant automation scenarios.
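The fallback behaviour described above (never reading XML tags out loud when SSML is invalid) can be illustrated with a minimal sketch. This is not the integration's actual code: the helper name `prepare_message` and the tag-stripping strategy are assumptions made for illustration, using only the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

# Naive tag matcher; good enough for a sketch, not a full XML tokenizer.
_TAG_RE = re.compile(r"<[^>]+>")


def prepare_message(message: str) -> tuple:
    """Return (text, is_ssml). Hypothetical helper, for illustration only."""
    stripped = message.strip()
    if not stripped.startswith("<speak"):
        # Plain text: nothing to validate.
        return stripped, False
    try:
        # Well-formed XML: pass the document through untouched.
        ET.fromstring(stripped)
        return stripped, True
    except ET.ParseError:
        # Malformed SSML: drop the tags so they are never spoken,
        # then collapse the leftover whitespace.
        plain = _TAG_RE.sub(" ", stripped)
        return " ".join(plain.split()), False
```

With this sketch, a message such as `<speak><voice name="x">Hello <break> world</voice></speak>` (unclosed `<break>`) is reduced to the plain text `Hello world` instead of being sent to the service as broken markup.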
If you already have the original Microsoft TTS integration configured via configuration.yaml, you must remove that configuration.
IMPORTANT: Before removing it, save:
- Your API key
- The server region (e.g., westeurope, eastus)
Remove lines similar to these from your configuration.yaml:
tts:
  - platform: microsoft
    api_key: YOUR_API_KEY
    region: YOUR_REGION
Click this badge to install Microsoft Text-to-Speech (TTS) via HACS
Manual
Copy the custom_components folder to your Home Assistant configuration directory (where the configuration.yaml file is located).
The final structure should be:
config/
├── custom_components/
│   └── microsoft/
│       ├── __init__.py
│       ├── config_flow.py
│       ├── const.py
│       ├── manifest.json
│       ├── tts.py
│       └── ssml_utils.py
└── configuration.yaml
Restart Home Assistant completely to load the new custom component.
After restarting Home Assistant, click this badge to configure Microsoft Text-to-Speech (TTS)
Manual
- Go to Settings → Devices & Services → Integrations
- Click the + Add Integration button
- Search for "Microsoft Text-to-Speech (TTS)"
- Follow the guided configuration process
- Enter the API key and server region that you saved previously
This integration now supports streaming text-to-speech for reduced latency in voice assistant pipelines. When used with LLM conversation agents:
- Sentence-by-sentence synthesis: Audio is generated and played as soon as each sentence is complete, rather than waiting for the entire response
- 50-70% latency reduction: Users hear the first sentence while the LLM is still generating subsequent text
- Multi-language support: Intelligent sentence detection for 140+ languages including:
- Latin scripts (English, Italian, Spanish, etc.)
- CJK languages (Chinese, Japanese, Korean)
- Arabic and Urdu
- Indic scripts (Hindi, Bengali, Marathi, etc.)
- Full SSML support: Maintains all voice customization options (voice, rate, pitch, volume, style, role) in streaming mode
- SSML sanitization: special characters in the input text are now fully handled
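To show why special characters need handling, here is a minimal sketch of wrapping plain text in an Azure-style SSML document. The function name `build_ssml` and its parameters are hypothetical and only cover voice, rate and pitch; the integration's own builder may work differently.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str, lang: str = "en-US",
               rate: Optional[str] = None, pitch: Optional[str] = None) -> str:
    """Wrap plain text in a <speak> document (illustrative sketch only)."""
    # Escape &, < and > so user text cannot break the XML document.
    body = escape(text)
    # Only emit <prosody> when at least one option is set.
    prosody = {k: v for k, v in (("rate", rate), ("pitch", pitch)) if v}
    if prosody:
        attrs = " ".join(f"{k}={quoteattr(v)}" for k, v in prosody.items())
        body = f"<prosody {attrs}>{body}</prosody>"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )
```

Without the `escape` call, a sentence like "Tom & Jerry" would produce invalid XML and the request would be rejected or fall back to plain text.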
The streaming implementation uses the async_stream_tts_audio method introduced in Home Assistant's TTS architecture:
- Text accumulation: Incoming text chunks from the LLM are accumulated until a sentence boundary is detected
- Sentence synthesis: Each complete sentence is synthesized independently using the Azure TTS REST API
- Audio streaming: Audio chunks are streamed to Home Assistant as they arrive from Azure
- Immediate playback: Home Assistant begins playback without waiting for the complete response
Note: Streaming requires Home Assistant 2024.2+.
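The text accumulation step can be sketched as an async generator that buffers LLM chunks and emits one sentence at a time. This is a simplified illustration, not the integration's implementation: the name `split_sentences` and the boundary pattern are assumptions, and the pattern covers only a handful of punctuation marks (Latin, CJK, Arabic/Urdu, Devanagari).

```python
import asyncio
import re
from typing import AsyncIterator

# Latin punctuation counts as a boundary only when followed by whitespace,
# so "3.14" is not split; CJK, Arabic/Urdu and Devanagari marks end a
# sentence immediately.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s)|[。！？؟۔।]+")


async def split_sentences(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield complete sentences as soon as they appear in the chunk stream."""
    buffer = ""
    async for chunk in chunks:
        buffer += chunk
        # A single chunk may complete more than one sentence.
        while (match := _SENTENCE_END.search(buffer)):
            sentence = buffer[:match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


async def _demo() -> None:
    async def fake_llm() -> AsyncIterator[str]:
        # Stand-in for text chunks arriving from a conversation agent.
        for piece in ["Hel", "lo wor", "ld. How a", "re you? Fi", "ne"]:
            yield piece

    async for sentence in split_sentences(fake_llm()):
        print(sentence)  # each sentence would be sent to Azure TTS here


if __name__ == "__main__":
    asyncio.run(_demo())
```

Each yielded sentence can be synthesized on its own, which is what lets playback start before the LLM has finished its response.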
- Home Assistant version 2024.2 or higher
- Azure Cognitive Services Speech API key
Developed by @pajeronda
Integration based on:
This project is released under the GPL-3.0 License. See LICENSE for details.
- API Usage: This integration requires an active Microsoft Azure account and a valid API key. Use of the Azure Cognitive Services API is subject to Microsoft's terms of service.
- Trademarks: Microsoft and related logos are registered trademarks of Microsoft Corp. This project is an unofficial integration developed by @pajeronda and is not affiliated with, sponsored by, or endorsed by Microsoft Corp.