Speech Releases

Feature
ISpeechToTextProvider.Error event — providers can surface non-fatal errors (e.g. transient network failures between chunked requests in continuous mode) without aborting the RecognizeAsync enumerator. CloudSpeechToText subscribes and forwards to the service-level ISpeechToTextService.Error event automatically. Azure / OpenAI / ElevenLabs providers updated to use it instead of throwing out of the enumerator
BREAKING Chore
ISpeechToTextProvider now requires implementers to declare event EventHandler<SpeechRecognitionError>? Error; existing custom providers must add the declaration, and one-shot providers may simply never raise it
Feature
ElevenLabs Scribe speech-to-text provider — AddElevenLabsSpeech() now registers both STT and TTS, or use the new AddElevenLabsSpeechToText() helper. Buffers captured PCM, wraps in a WAV container, and posts a single request to /v1/speech-to-text; yields one final SpeechRecognitionResult per session
Feature
ElevenLabsConfig.SpeechToTextModel property (default scribe_v1) — configurable Scribe model id
BREAKING Chore
Renamed ElevenLabsConfig.ModelId to ElevenLabsConfig.TextToSpeechModel to disambiguate it from the new SpeechToTextModel property
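The Scribe entry above says captured PCM is buffered and wrapped in a WAV container before a single upload. A minimal sketch of that wrapping step follows. It assumes 16 kHz, 16-bit, mono PCM (the IAudioSource format listed elsewhere in these notes); the WavWriter class and its Wrap method are hypothetical names for illustration, not part of the library.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not a library type): prepends a canonical
// 44-byte RIFF/WAVE header to raw little-endian PCM.
public static class WavWriter
{
    public static byte[] Wrap(byte[] pcm, int sampleRate = 16000,
                              short channels = 1, short bitsPerSample = 16)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;

        using var ms = new MemoryStream(44 + pcm.Length);
        using (var w = new BinaryWriter(ms, Encoding.ASCII, leaveOpen: true))
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + pcm.Length);          // RIFF chunk size = total size - 8
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                       // fmt chunk size for plain PCM
            w.Write((short)1);                 // audio format 1 = PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(pcm.Length);               // data chunk size in bytes
            w.Write(pcm);
        }
        return ms.ToArray();
    }
}
```

BinaryWriter always writes little-endian integers, which is what the WAV format expects, so no byte swapping is needed.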
Fix
KeywordHeard no longer re-fires for the same final transcription within a 3-second window — eliminates duplicate keyword events caused by trailing-audio carry-over between recognition tasks (iOS SFSpeechRecognizer re-arm, Android SpeechRecognizer restart). Applied uniformly to Apple, Android, Browser, Windows, and CloudSpeechToText
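The suppression rule in the fix above can be sketched roughly as follows. This is an illustration only: KeywordDebouncer is a hypothetical class, and the case-insensitive comparison of transcriptions is an assumption, since the notes do not say how "the same final transcription" is compared.

```csharp
using System;

// Hypothetical sketch: suppresses a repeat of the same final
// transcription when it arrives inside the debounce window.
public sealed class KeywordDebouncer
{
    private readonly TimeSpan window;
    private string? lastText;
    private DateTimeOffset lastFired;

    public KeywordDebouncer(TimeSpan window) => this.window = window;

    // Returns true when a keyword event should fire for this final text.
    public bool ShouldFire(string finalText, DateTimeOffset now)
    {
        bool isRepeat = lastText is not null
            // Assumption: transcriptions are compared ignoring case.
            && string.Equals(lastText, finalText, StringComparison.OrdinalIgnoreCase)
            && now - lastFired < window;
        if (isRepeat) return false;

        lastText = finalText;
        lastFired = now;
        return true;
    }
}
```

With a 3-second window, the same text arriving one second after the first event is dropped, while the same text arriving four seconds later fires again. Passing the timestamp in rather than reading the clock keeps the rule easy to unit test.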
Feature
ITextToSpeechService.AudioLevelChanged event and IsPlayerAnalysisSupported flag — normalized 0.0–1.0 RMS level for driving VU-meter UI during speech playback
Feature
IAudioPlayer.AudioLevelChanged event and IsPlayerAnalysisSupported flag — same RMS signal raised during generic audio playback (e.g. cloud TTS audio streams)
Enhancement iOS
Apple native TTS now routes AVSpeechSynthesizer through AVAudioEngine + AVAudioPlayerNode with a player-node tap, enabling AudioLevelChanged for built-in iOS / macOS / Mac Catalyst voices. Engine is created lazily on first speak and kept warm across utterances
Enhancement Android
Android native TTS taps UtteranceProgressListener.OnAudioAvailable to compute RMS from PCM bytes without rerouting playback
Enhancement Android
AndroidAudioPlayer attaches Android.Media.Audiofx.Visualizer to the MediaPlayer audio session for cloud TTS / generic playback metering (no RECORD_AUDIO permission needed for per-session capture; MODIFY_AUDIO_SETTINGS recommended)
Enhancement iOS
AppleAudioPlayer enables AVAudioPlayer.MeteringEnabled and polls AveragePower for VU metering during cloud / generic audio playback
Chore
CloudTextToSpeech forwards AudioLevelChanged and IsPlayerAnalysisSupported from the underlying IAudioPlayer — Azure / OpenAI / ElevenLabs / custom providers get VU metering for free
Enhancement iOS
CarPlay compatible — iOS audio session uses PlayAndRecord with AllowBluetooth, AllowBluetoothA2dp, and DefaultToSpeaker so audio automatically routes through the car’s microphone and speakers when CarPlay is active
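The metering entries above all describe a normalized 0.0 to 1.0 RMS level. As a rough illustration of how such a value can be derived from a PCM buffer, here is a minimal sketch assuming 16-bit little-endian mono samples. AudioLevel and Rms are hypothetical names; the platform implementations (player-node tap, Visualizer, AveragePower polling) obtain their data differently and may scale it differently.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper (not a library type).
public static class AudioLevel
{
    // Root-mean-square of 16-bit little-endian PCM, normalized to 0.0..1.0.
    public static double Rms(ReadOnlySpan<byte> pcm16)
    {
        int sampleCount = pcm16.Length / 2;
        if (sampleCount == 0) return 0.0;

        double sumSquares = 0;
        for (int i = 0; i < sampleCount; i++)
        {
            short sample = BinaryPrimitives.ReadInt16LittleEndian(pcm16.Slice(2 * i, 2));
            double normalized = sample / 32768.0;   // map to -1.0..1.0
            sumSquares += normalized * normalized;
        }
        return Math.Min(1.0, Math.Sqrt(sumSquares / sampleCount));
    }
}
```

A VU-meter UI would typically smooth this value over a few buffers before binding it to a control, since raw per-buffer RMS can flicker.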
BREAKING Chore
ISpeechToTextService redesigned from IAsyncEnumerable-based to event-based Start/Stop model — ContinuousRecognize() and ListenUntilSilence() removed from the interface
Feature
Start(SpeechRecognitionOptions?) / Stop() methods — long-lived listening sessions with explicit lifecycle control; Start() throws if already listening, Stop() is a safe no-op
Feature
ResultReceived event — fires for every recognition result (partial and final) with full SpeechRecognitionResult including Text, IsFinal, and Confidence; multiple subscribers supported
Feature
KeywordHeard event — fires when a keyword from SpeechRecognitionOptions.Keywords is detected in a final result using case-insensitive whole-word matching
Feature
Error event — fires on recognition errors with SpeechRecognitionError containing Message and optional Exception
Feature
SpeechRecognitionError record — new type for error reporting via the Error event
Feature
SpeechRecognitionOptions.Keywords property (string[]?) — built-in keyword detection at the platform level; keywords are matched with compiled regex on final results
BREAKING Chore
ListenWithWakeWord() extension method removed — replaced by StatementAfterKeyword()
BREAKING Chore
ListenForKeyword() extension method removed — replaced by WaitListenForKeywords() and ListenForKeywords()
Feature
ListenUntilSilence() extension method — starts listening, waits for first final result, then stops; replaces the former interface method
Feature
StatementAfterKeyword(string[]) extension method — waits for a keyword to be heard, then returns the next final statement (replaces ListenWithWakeWord)
Feature
WaitListenForKeywords(string[], TimeSpan?) extension method — returns the first keyword heard with optional timeout
Feature
ListenForKeywords(string[]) extension method: yields keywords continuously as IAsyncEnumerable<string>
Enhancement
All extension methods handle Start/Stop/event wiring automatically — no manual lifecycle management needed for simple scenarios
Enhancement
Multiple classes can now subscribe to speech recognition events simultaneously — eliminates the single-consumer limitation of IAsyncEnumerable
Enhancement
Cloud provider (CloudSpeechToText) adapted to consume ISpeechToTextProvider.RecognizeAsync() internally on a background task and raise events — ISpeechToTextProvider interface unchanged
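The keyword entries above describe case-insensitive, whole-word matching with compiled regex on final results. A minimal sketch of that style of matching follows. KeywordMatcher is a hypothetical class written for illustration, and the use of \b word boundaries is an assumption about how "whole-word" is defined.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch (not a library type): one compiled regex per keyword.
public sealed class KeywordMatcher
{
    private readonly List<(string Keyword, Regex Pattern)> patterns = new();

    public KeywordMatcher(params string[] keywords)
    {
        foreach (var keyword in keywords)
        {
            // Escape the keyword so punctuation is matched literally,
            // then require a word boundary on both sides (assumption).
            var pattern = new Regex(
                $@"\b{Regex.Escape(keyword)}\b",
                RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
            patterns.Add((keyword, pattern));
        }
    }

    // Returns the first configured keyword found in the final text, or null.
    public string? Match(string finalText)
    {
        foreach (var (keyword, pattern) in patterns)
        {
            if (pattern.IsMatch(finalText)) return keyword;
        }
        return null;
    }
}
```

Under this sketch, "stop" matches "Please STOP now" but not "unstoppable". Keywords that begin or end with non-word characters would need a different boundary rule than \b.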
Feature
OpenAI cloud provider — AddOpenAiSpeech() registers OpenAI STT (Whisper / GPT-4o Transcribe) and TTS (GPT-4o Mini TTS) with configurable model and voice selection
Feature
OpenAI TTS supports 10 built-in voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer
Feature
OpenAI STT/TTS follows the same cloud provider pattern as Azure and ElevenLabs — platform IAudioSource and IAudioPlayer handle audio I/O
Feature
Microsoft.Extensions.AI adapter — AddShinySpeechClients() exposes any registered cloud provider as ISpeechToTextClient and ITextToSpeechClient from Microsoft.Extensions.AI
Feature
AddShinySpeechToTextClient() / AddShinyTextToSpeechClient() for registering M.E.AI adapters individually
Feature
M.E.AI streaming support — GetStreamingTextAsync() emits SessionOpen, TextUpdating, TextUpdated, SessionClose update kinds mapped from SpeechRecognitionResult.IsFinal
Feature
M.E.AI TTS streaming — GetStreamingAudioAsync() emits SessionOpen, AudioUpdated, SessionClose update kinds with audio as DataContent
Feature
M.E.AI options mapping: SpeechToTextOptions.SpeechLanguage maps to CultureInfo, and TextToSpeechOptions voice/speed/pitch/volume map to the Shiny equivalents
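The streaming entries above say update kinds are mapped from SpeechRecognitionResult.IsFinal. One plausible reading is sketched below with locally declared stand-in types. The UpdateKind enum, the two record structs, and the rule that partial results become TextUpdating while final results become TextUpdated are all assumptions inferred from the kind names, not the adapter's actual code. The real adapter streams asynchronously; this sketch is synchronous to stay short.

```csharp
using System.Collections.Generic;

// Stand-in types for illustration only; the real M.E.AI and Shiny types differ.
public enum UpdateKind { SessionOpen, TextUpdating, TextUpdated, SessionClose }
public readonly record struct RecognitionResult(string Text, bool IsFinal);
public readonly record struct Update(UpdateKind Kind, string? Text);

public static class UpdateMapper
{
    // Wraps a sequence of recognition results in open/close markers and
    // picks the update kind from IsFinal (assumed mapping).
    public static IEnumerable<Update> Map(IEnumerable<RecognitionResult> results)
    {
        yield return new Update(UpdateKind.SessionOpen, null);
        foreach (var result in results)
        {
            var kind = result.IsFinal ? UpdateKind.TextUpdated : UpdateKind.TextUpdating;
            yield return new Update(kind, result.Text);
        }
        yield return new Update(UpdateKind.SessionClose, null);
    }
}
```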
Feature WASM
Browser IAudioSource implementation — raw PCM microphone capture via the Web Audio API (getUserMedia + ScriptProcessorNode), downsampled to 16kHz 16-bit mono, matching the output format of Android, iOS, and Windows
Enhancement WASM
Cloud STT providers (Azure, custom) now work in the browser — IAudioSource provides the raw audio stream that CloudSpeechToText requires
Enhancement
Cloud provider extensions (AddAzureSpeech, AddElevenLabsTextToSpeech, AddCloudSpeechToText, AddCloudTextToSpeech) now automatically register IAudioSource and IAudioPlayer — manual AddAudioSource() / AddAudioPlayer() calls are no longer required
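The browser IAudioSource entry above mentions downsampling microphone audio to 16 kHz 16-bit mono. The actual conversion presumably runs in browser JavaScript; the C# sketch below only illustrates the idea, assuming float samples in the -1.0 to 1.0 range as the Web Audio API delivers them. Resampler and ToPcm16 are hypothetical names, and linear interpolation is used here for brevity without the low-pass filter a production resampler would normally apply.

```csharp
using System;

// Hypothetical sketch (not a library type).
public static class Resampler
{
    // Converts mono float samples at sourceRate to 16-bit PCM at targetRate
    // using linear interpolation between neighbouring input samples.
    public static short[] ToPcm16(float[] input, int sourceRate, int targetRate = 16000)
    {
        if (input.Length == 0) return Array.Empty<short>();

        int outLength = (int)((long)input.Length * targetRate / sourceRate);
        var output = new short[outLength];
        double step = (double)sourceRate / targetRate;

        for (int i = 0; i < outLength; i++)
        {
            double position = i * step;
            int i0 = (int)position;
            int i1 = Math.Min(i0 + 1, input.Length - 1);
            double fraction = position - i0;

            double sample = input[i0] * (1 - fraction) + input[i1] * fraction;
            sample = Math.Clamp(sample, -1.0, 1.0);            // guard against overshoot
            output[i] = (short)Math.Round(sample * short.MaxValue);
        }
        return output;
    }
}
```

For example, one second of 48 kHz input (48,000 samples) yields 16,000 output samples.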
Feature
ISpeechToTextService.IsListening property — indicates whether speech recognition is currently active, analogous to ITextToSpeechService.IsSpeaking
Feature
ListenWithWakeWord() extension method — “Hey Siri” style wake word activation that continuously listens for a wake phrase, then captures the spoken command after it until silence
Feature
ListenForKeyword() extension method — listens continuously until one of the specified keywords is detected (case-insensitive, whole-word matching), returns the matched keyword
Feature
Wake word supports pause-then-speak — if the user says the wake phrase and pauses before speaking, the method waits for the next utterance as the command
Feature
Both methods are extension methods on ISpeechToTextService composing over ContinuousRecognize — no platform-specific code changes required
Feature
Sample apps updated with Wake Word and Keyword listening modes (MAUI + Blazor)
Feature
ISpeechToTextService interface — platform-native speech recognition with permission management, continuous streaming, and listen-until-silence modes
Feature
ITextToSpeechService interface — platform-native text-to-speech with voice selection, speech rate, pitch, and volume control
Feature
IAudioSource interface — raw PCM audio capture from the device microphone (16kHz, 16-bit, mono)
Feature
IAudioPlayer interface — MP3 audio stream playback with play/stop control
Feature
SpeechRecognitionOptions — configurable culture, silence timeout, and on-device preference for STT
Feature
TextToSpeechOptions — configurable culture, voice, speech rate, pitch, and volume for TTS
Feature
ContinuousRecognize() — streaming recognition results via IAsyncEnumerable<SpeechRecognitionResult> with partial and final results
Feature
ListenUntilSilence() — simple dictation mode that returns the final transcription after silence is detected
Feature
GetVoicesAsync() — enumerate available TTS voices with optional culture filtering
Feature
AddSpeechServices() — single extension method to register all core services (STT, TTS, AudioSource, AudioPlayer)
Feature Android
Android STT implementation using SpeechRecognizer with streaming partial results
Feature Android
Android TTS implementation using Android.Speech.Tts.TextToSpeech
Feature Android
Android audio capture via AudioRecord with 16kHz PCM streaming
Feature Android
Android audio playback via MediaPlayer
Feature iOS
iOS STT implementation using SFSpeechRecognizer with SFSpeechAudioBufferRecognitionRequest
Feature iOS
iOS TTS implementation using AVSpeechSynthesizer
Feature iOS
iOS audio capture via AVAudioEngine with PCM tap
Feature iOS
iOS audio playback via AVAudioPlayer
Feature
Cloud provider abstraction — ISpeechToTextProvider and ITextToSpeechProvider interfaces for pluggable cloud backends
Feature
CloudSpeechToText and CloudTextToSpeech — bridge classes that combine platform audio with cloud provider APIs
Feature
AddCloudSpeechToText<T>() and AddCloudTextToSpeech<T>() — generic DI registration for custom cloud providers
Feature
Azure AI Speech provider — AddAzureSpeech() registers Azure STT and/or TTS with subscription key and region
Feature
Azure TTS with SSML prosody control — speech rate, pitch, and volume mapped to SSML elements
Feature
ElevenLabs TTS provider — AddElevenLabsTextToSpeech() registers ElevenLabs cloud TTS with configurable voice and model
Feature
PipeStream utility — thread-safe producer-consumer stream using System.IO.Pipelines for bridging audio capture with cloud providers
Feature WASM
Browser/WebAssembly support — STT and TTS via Web Speech API, auto-detected at runtime via OperatingSystem.IsBrowser()
Feature WASM
Browser STT implementation using SpeechRecognition API with streaming partial and final results
Feature WASM
Browser TTS implementation using SpeechSynthesis API with voice selection, rate, pitch, and volume control
Feature WASM
Browser audio playback via HTML5 Audio element with base64 data URL conversion
Feature
Blazor WebAssembly sample app demonstrating STT, TTS, and voice listing