Speech Releases
2.1 - May 21, 2026
[Feature] ISpeechToTextProvider.Error event: providers can surface non-fatal errors (e.g. transient network failures between chunked requests in continuous mode) without aborting the RecognizeAsync enumerator. CloudSpeechToText subscribes and forwards them to the service-level ISpeechToTextService.Error event automatically. The Azure, OpenAI, and ElevenLabs providers now use it instead of throwing out of the enumerator.
[BREAKING] [Chore] ISpeechToTextProvider now requires implementers to declare event EventHandler<SpeechRecognitionError>? Error. Existing custom providers must add the event declaration (one-shot providers can leave it unraised).
[Feature] ElevenLabs Scribe speech-to-text provider: AddElevenLabsSpeech() now registers both STT and TTS, or use the new AddElevenLabsSpeechToText() helper. The provider buffers captured PCM, wraps it in a WAV container, and posts a single request to /v1/speech-to-text, yielding one final SpeechRecognitionResult per session.
[Feature] ElevenLabsConfig.SpeechToTextModel property (default scribe_v1): configurable Scribe model id.
[BREAKING] [Chore] Renamed ElevenLabsConfig.ModelId → ElevenLabsConfig.TextToSpeechModel to disambiguate it from the new SpeechToTextModel property.
[Fix] KeywordHeard no longer re-fires for the same final transcription within a 3-second window. This eliminates duplicate keyword events caused by trailing-audio carry-over between recognition tasks (iOS SFSpeechRecognizer re-arm, Android SpeechRecognizer restart). Applied uniformly to Apple, Android, Browser, Windows, and CloudSpeechToText.
[Feature] ITextToSpeechService.AudioLevelChanged event and IsPlayerAnalysisSupported flag: normalized 0.0–1.0 RMS level for driving VU-meter UI during speech playback.
[Feature] IAudioPlayer.AudioLevelChanged event and IsPlayerAnalysisSupported flag: the same RMS signal, raised during generic audio playback (e.g. cloud TTS audio streams).
[Enhancement] [iOS] Apple native TTS now routes AVSpeechSynthesizer through AVAudioEngine + AVAudioPlayerNode with a player-node tap, enabling AudioLevelChanged for built-in iOS / macOS / Mac Catalyst voices. The engine is created lazily on first speak and kept warm across utterances.
[Enhancement] [Android] Android native TTS taps UtteranceProgressListener.OnAudioAvailable to compute RMS from PCM bytes without rerouting playback.
[Enhancement] [Android] AndroidAudioPlayer attaches Android.Media.Audiofx.Visualizer to the MediaPlayer audio session for cloud TTS / generic playback metering (no RECORD_AUDIO permission needed for per-session capture; MODIFY_AUDIO_SETTINGS recommended).
[Enhancement] [iOS] AppleAudioPlayer enables AVAudioPlayer.MeteringEnabled and polls AveragePower for VU metering during cloud / generic audio playback.
[Chore] CloudTextToSpeech forwards AudioLevelChanged and IsPlayerAnalysisSupported from the underlying IAudioPlayer, so Azure / OpenAI / ElevenLabs / custom providers get VU metering for free.
[Enhancement] [iOS] CarPlay compatible: the iOS audio session uses PlayAndRecord with AllowBluetooth, AllowBluetoothA2dp, and DefaultToSpeaker so audio automatically routes through the car’s microphone and speakers when CarPlay is active.
2.0 - May 13, 2026
[BREAKING] [Chore] ISpeechToTextService redesigned from an IAsyncEnumerable-based model to an event-based Start/Stop model. ContinuousRecognize() and ListenUntilSilence() are removed from the interface.
[Feature] Start(SpeechRecognitionOptions?) / Stop() methods: long-lived listening sessions with explicit lifecycle control. Start() throws if already listening; Stop() is a safe no-op.
[Feature] ResultReceived event: fires for every recognition result (partial and final) with the full SpeechRecognitionResult, including Text, IsFinal, and Confidence. Multiple subscribers are supported.
[Feature] KeywordHeard event: fires when a keyword from SpeechRecognitionOptions.Keywords is detected in a final result, using case-insensitive whole-word matching.
[Feature] Error event: fires on recognition errors with a SpeechRecognitionError containing Message and an optional Exception.
[Feature] SpeechRecognitionError record: new type for error reporting via the Error event.
[Feature] SpeechRecognitionOptions.Keywords property (string[]?): built-in keyword detection at the platform level. Keywords are matched with a compiled regex on final results.
[BREAKING] [Chore] ListenWithWakeWord() extension method removed; replaced by StatementAfterKeyword().
[BREAKING] [Chore] ListenForKeyword() extension method removed; replaced by WaitListenForKeywords() and ListenForKeywords().
[Feature] ListenUntilSilence() extension method: starts listening, waits for the first final result, then stops. Replaces the former interface method.
[Feature] StatementAfterKeyword(string[]) extension method: waits for a keyword to be heard, then returns the next final statement (replaces ListenWithWakeWord).
[Feature] WaitListenForKeywords(string[], TimeSpan?) extension method: returns the first keyword heard, with an optional timeout.
[Feature] ListenForKeywords(string[]) extension method: yields keywords continuously as IAsyncEnumerable<string>.
[Enhancement] All extension methods handle Start/Stop/event wiring automatically, so simple scenarios need no manual lifecycle management.
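The keyword entries above mention case-insensitive whole-word matching with a compiled regex on final results. The following is a minimal, self-contained sketch of that general technique, not the library's actual implementation: KeywordMatcher and FirstMatch are hypothetical names introduced here for illustration, and the sketch assumes at least one keyword made of word characters (a keyword ending in punctuation would not satisfy the \b anchor).

```csharp
#nullable enable
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the library: one way to do
// case-insensitive whole-word keyword matching with a compiled regex.
public sealed class KeywordMatcher
{
    private readonly Regex pattern;

    public KeywordMatcher(params string[] keywords)
    {
        // Escape each keyword so special characters are matched literally,
        // then wrap the alternation in \b anchors for whole-word matching.
        var alternation = string.Join("|", keywords.Select(Regex.Escape));
        pattern = new Regex(
            $@"\b(?:{alternation})\b",
            RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.CultureInvariant);
    }

    // Returns the first keyword found in a final transcript (in the
    // transcript's own casing), or null when no keyword is present.
    public string? FirstMatch(string finalText)
    {
        var match = pattern.Match(finalText);
        return match.Success ? match.Value : null;
    }
}

public static class Program
{
    public static void Main()
    {
        var matcher = new KeywordMatcher("stop", "next page");

        // Matches regardless of case: prints "STOP".
        Console.WriteLine(matcher.FirstMatch("Please STOP the music") ?? "(none)");

        // "stop" inside a longer word is not a whole-word match: prints "(none)".
        Console.WriteLine(matcher.FirstMatch("unstoppable") ?? "(none)");
    }
}
```

Building the pattern once and reusing it for every final result is what makes the Compiled option worthwhile; for a keyword list that changes per session, a non-compiled regex may be the cheaper choice.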
[Enhancement] Multiple classes can now subscribe to speech recognition events simultaneously, eliminating the single-consumer limitation of IAsyncEnumerable.
[Enhancement] Cloud provider (CloudSpeechToText) adapted to consume ISpeechToTextProvider.RecognizeAsync() internally on a background task and raise events. The ISpeechToTextProvider interface is unchanged.
1.2.1 - May 11, 2026
[Feature] OpenAI cloud provider: AddOpenAiSpeech() registers OpenAI STT (Whisper / GPT-4o Transcribe) and TTS (GPT-4o Mini TTS) with configurable model and voice selection.
[Feature] OpenAI TTS supports 10 built-in voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer.
[Feature] OpenAI STT/TTS follows the same cloud provider pattern as Azure and ElevenLabs: the platform IAudioSource and IAudioPlayer handle audio I/O.
[Feature] Microsoft.Extensions.AI adapter: AddShinySpeechClients() exposes any registered cloud provider as ISpeechToTextClient and ITextToSpeechClient from Microsoft.Extensions.AI.
[Feature] AddShinySpeechToTextClient() / AddShinyTextToSpeechClient() for registering the M.E.AI adapters individually.
[Feature] M.E.AI streaming support: GetStreamingTextAsync() emits SessionOpen, TextUpdating, TextUpdated, and SessionClose update kinds, mapped from SpeechRecognitionResult.IsFinal.
[Feature] M.E.AI TTS streaming: GetStreamingAudioAsync() emits SessionOpen, AudioUpdated, and SessionClose update kinds, with audio as DataContent.
[Feature] M.E.AI options mapping: SpeechToTextOptions.SpeechLanguage → CultureInfo; TextToSpeechOptions voice/speed/pitch/volume mapped to Shiny equivalents.
1.2 - May 6, 2026
[Feature] [WASM] Browser IAudioSource implementation: raw PCM microphone capture via the Web Audio API (getUserMedia + ScriptProcessorNode), downsampled to 16kHz 16-bit mono, matching the output format of Android, iOS, and Windows.
[Enhancement] [WASM] Cloud STT providers (Azure, custom) now work in the browser: IAudioSource provides the raw audio stream that CloudSpeechToText requires.
1.1.2 - May 6, 2026
[Enhancement] Cloud provider extensions (AddAzureSpeech, AddElevenLabsTextToSpeech, AddCloudSpeechToText, AddCloudTextToSpeech) now automatically register IAudioSource and IAudioPlayer, so manual AddAudioSource() / AddAudioPlayer() calls are no longer required.
1.1.1 - May 5, 2026
[Feature] ISpeechToTextService.IsListening property: indicates whether speech recognition is currently active, analogous to ITextToSpeechService.IsSpeaking.
1.1 - May 4, 2026
[Feature] ListenWithWakeWord() extension method: “Hey Siri” style wake word activation that continuously listens for a wake phrase, then captures the spoken command after it until silence.
[Feature] ListenForKeyword() extension method: listens continuously until one of the specified keywords is detected (case-insensitive, whole-word matching), then returns the matched keyword.
[Feature] Wake word supports pause-then-speak: if the user says the wake phrase and pauses before speaking, the method waits for the next utterance as the command.
[Feature] Both methods are extension methods on ISpeechToTextService that compose over ContinuousRecognize, so no platform-specific code changes are required.
[Feature] Sample apps updated with Wake Word and Keyword listening modes (MAUI + Blazor).
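The pause-then-speak behavior described in the 1.1 entries can be pictured as a small state machine over final transcripts. The sketch below is an illustration under stated assumptions rather than the library's code: WakeWordSketch and ExtractCommand are hypothetical names, the input is a plain sequence of already-final utterances, and the wake phrase is found by simple case-insensitive substring search instead of whole-word matching.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class WakeWordSketch
{
    // Hypothetical helper: scans final transcripts in order and returns the
    // command spoken after the wake phrase, or null if none arrives.
    public static string? ExtractCommand(IEnumerable<string> finalUtterances, string wakePhrase)
    {
        var armed = false; // true once the wake phrase was heard with nothing after it

        foreach (var utterance in finalUtterances)
        {
            if (armed)
            {
                // Pause-then-speak: the next non-empty utterance is the command.
                var pending = utterance.Trim();
                if (pending.Length > 0) return pending;
                continue;
            }

            var index = utterance.IndexOf(wakePhrase, StringComparison.OrdinalIgnoreCase);
            if (index < 0) continue; // wake phrase not heard yet

            // Text after the wake phrase in the same utterance is the command.
            var rest = utterance[(index + wakePhrase.Length)..].Trim(' ', ',', '.', '!', '?');
            if (rest.Length > 0) return rest;

            // Wake phrase followed by a pause: wait for the next utterance.
            armed = true;
        }

        return null;
    }

    public static void Main()
    {
        // Command in the same utterance: prints "turn on the lights".
        Console.WriteLine(ExtractCommand(
            new[] { "hey computer, turn on the lights" }, "hey computer"));

        // Wake phrase, pause, then command: prints "play music".
        Console.WriteLine(ExtractCommand(
            new[] { "some chatter", "Hey computer", "play music" }, "hey computer"));
    }
}
```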
1.0 - May 2, 2026
[Feature] ISpeechToTextService interface: platform-native speech recognition with permission management, continuous streaming, and listen-until-silence modes.
[Feature] ITextToSpeechService interface: platform-native text-to-speech with voice selection, speech rate, pitch, and volume control.
[Feature] IAudioSource interface: raw PCM audio capture from the device microphone (16kHz, 16-bit, mono).
[Feature] IAudioPlayer interface: MP3 audio stream playback with play/stop control.
[Feature] SpeechRecognitionOptions: configurable culture, silence timeout, and on-device preference for STT.
[Feature] TextToSpeechOptions: configurable culture, voice, speech rate, pitch, and volume for TTS.
[Feature] ContinuousRecognize(): streaming recognition results via IAsyncEnumerable<SpeechRecognitionResult> with partial and final results.
[Feature] ListenUntilSilence(): simple dictation mode that returns the final transcription after silence is detected.
[Feature] GetVoicesAsync(): enumerate available TTS voices with optional culture filtering.
[Feature] AddSpeechServices(): single extension method to register all core services (STT, TTS, AudioSource, AudioPlayer).
[Feature] [Android] Android STT implementation using SpeechRecognizer with streaming partial results.
[Feature] [Android] Android TTS implementation using Android.Speech.Tts.TextToSpeech.
[Feature] [Android] Android audio capture via AudioRecord with 16kHz PCM streaming.
[Feature] [Android] Android audio playback via MediaPlayer.
[Feature] [iOS] iOS STT implementation using SFSpeechRecognizer with SFSpeechAudioBufferRecognitionRequest.
[Feature] [iOS] iOS TTS implementation using AVSpeechSynthesizer.
[Feature] [iOS] iOS audio capture via AVAudioEngine with PCM tap.
[Feature] [iOS] iOS audio playback via AVAudioPlayer.
[Feature] Cloud provider abstraction: ISpeechToTextProvider and ITextToSpeechProvider interfaces for pluggable cloud backends.
[Feature] CloudSpeechToText and CloudTextToSpeech: bridge classes that combine platform audio with cloud provider APIs.
[Feature] AddCloudSpeechToText<T>() and AddCloudTextToSpeech<T>(): generic DI registration for custom cloud providers.
[Feature] Azure AI Speech provider: AddAzureSpeech() registers Azure STT and/or TTS with subscription key and region.
[Feature] Azure TTS with SSML prosody control: speech rate, pitch, and volume mapped to SSML elements.
[Feature] ElevenLabs TTS provider: AddElevenLabsTextToSpeech() registers ElevenLabs cloud TTS with configurable voice and model.
[Feature] PipeStream utility: thread-safe producer-consumer stream using System.IO.Pipelines for bridging audio capture with cloud providers.
[Feature] [WASM] Browser/WebAssembly support: STT and TTS via the Web Speech API, auto-detected at runtime via OperatingSystem.IsBrowser().
[Feature] [WASM] Browser STT implementation using the SpeechRecognition API with streaming partial and final results.
[Feature] [WASM] Browser TTS implementation using the SpeechSynthesis API with voice selection, rate, pitch, and volume control.
[Feature] [WASM] Browser audio playback via the HTML5 Audio element with base64 data URL conversion.
[Feature] Blazor WebAssembly sample app demonstrating STT, TTS, and voice listing.
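The Azure TTS entry in the 1.0 notes says speech rate, pitch, and volume are mapped to SSML prosody. As a rough illustration of what such a mapping can look like, the sketch below builds an SSML document with a prosody element using System.Xml.Linq. Everything specific here is an assumption: SsmlSketch, Build, and ToRelativePercent are hypothetical names, rate and pitch are treated as multipliers where 1.0 means unchanged, volume is treated as 0.0 to 1.0, and the voice name is only an example. The library's actual value ranges and output may differ.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class SsmlSketch
{
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // Assumption: rate and pitch are multipliers (1.0 = unchanged) and
    // volume is 0.0-1.0. These ranges are illustrative, not the library's.
    public static string Build(string text, string voice, string culture,
        double rate = 1.0, double pitch = 1.0, double volume = 1.0)
    {
        var volumePercent = (Math.Clamp(volume, 0.0, 1.0) * 100.0)
            .ToString("0", CultureInfo.InvariantCulture);

        var prosody = new XElement(Ssml + "prosody",
            new XAttribute("rate", ToRelativePercent(rate)),
            new XAttribute("pitch", ToRelativePercent(pitch)),
            new XAttribute("volume", volumePercent),
            text); // XElement escapes &, <, and > in the text content

        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", culture),
            new XElement(Ssml + "voice", new XAttribute("name", voice), prosody));

        return speak.ToString(SaveOptions.DisableFormatting);
    }

    // Converts a multiplier such as 1.2 into a signed relative value such as "+20%".
    private static string ToRelativePercent(double multiplier)
    {
        var percent = (multiplier - 1.0) * 100.0;
        return percent.ToString("+0.##;-0.##;+0", CultureInfo.InvariantCulture) + "%";
    }

    public static void Main()
    {
        // Example voice name only; substitute any voice available to your account.
        Console.WriteLine(Build("Hello & welcome", "en-US-JennyNeural", "en-US",
            rate: 1.2, pitch: 0.9, volume: 0.8));
    }
}
```

Building the document with XElement rather than string concatenation keeps user-supplied text safely escaped, which matters when the spoken text can contain characters such as & or <.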