Speech Releases

v3

3.0 - TBD

Feature

Microphone VU metering: the level signal now covers the input side, matching what playback already had. New IAudioSource.InputLevelChanged emits a normalized 0.0–1.0 mic level while capturing on every platform (Apple taps the capture node, Android meters the AudioRecord read loop, Windows meters each AudioGraph quantum, the browser meters the worklet’s PCM), and new ISpeechToTextService.InputLevelChanged + IsInputAnalysisSupported expose it while listening. Cloud STT (Azure / OpenAI / ElevenLabs / Microsoft.Extensions.AI / custom) forwards the capture source’s level, so a “listening” meter works on every platform when a cloud provider is used. For native recognition, Apple meters the recognizer’s own input tap and Android forwards SpeechRecognizer.OnRmsChanged, while the Windows and Browser native recognizers own the mic and report IsInputAnalysisSupported = false. Capture-side events are throttled to ~20/sec and peak-held in between so a bound bar stays smooth
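The throttling and peak-hold behaviour mentioned in this entry can be pictured with a small sketch. It is not the library's implementation: the PeakHoldThrottle class, its Push method, and the 50 ms interval (derived from "~20/sec") are hypothetical names and values chosen for illustration.

```csharp
using System;

// Hypothetical helper, not part of Shiny.Audio: limits level updates to a
// fixed interval and reports the highest level seen since the last emission.
public sealed class PeakHoldThrottle
{
    readonly TimeSpan interval;
    DateTimeOffset? lastEmit;   // null until the first emission
    double peak;

    public PeakHoldThrottle(TimeSpan interval) => this.interval = interval;

    // Returns the held peak when the interval has elapsed; otherwise null.
    public double? Push(double level, DateTimeOffset now)
    {
        this.peak = Math.Max(this.peak, Math.Clamp(level, 0.0, 1.0));

        if (this.lastEmit is { } last && now - last < this.interval)
            return null;        // still inside the window: keep holding the peak

        var result = this.peak;
        this.peak = 0;
        this.lastEmit = now;
        return result;
    }
}
```

With a 50 ms interval, a loud transient that arrives between two emissions is not lost: it is carried forward and reported at the next emission, which is what keeps a bound meter from flickering.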

Feature

AudioLevel is now public (Shiny.Audio) — FromRms, FromPcm16, and FromSamples expose the dBFS mapping (-50 dB noise floor) that every meter in the library runs through, so PCM you consume yourself meters on the same scale as the events
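As a rough illustration of what a dBFS mapping with a -50 dB floor looks like, here is a minimal sketch. The entry does not give the exact formula, so the linear mapping of [-50 dB, 0 dB] onto [0.0, 1.0] is an assumption, and LevelSketch, MapRms, and MapPcm16 are hypothetical names rather than the AudioLevel API.

```csharp
using System;

// Illustrative only: mirrors the idea of a dBFS mapping with a -50 dB floor.
// This is not the AudioLevel source code.
public static class LevelSketch
{
    const double FloorDb = -50.0;   // noise floor named in the release note

    // rms is a linear amplitude where 1.0 is full scale.
    public static double MapRms(double rms)
    {
        if (rms <= 0) return 0;
        var db = 20.0 * Math.Log10(rms);                        // amplitude -> dBFS
        return Math.Clamp((db - FloorDb) / -FloorDb, 0.0, 1.0); // assumed linear scale
    }

    // Computes RMS over 16-bit little-endian mono PCM, then maps it.
    public static double MapPcm16(ReadOnlySpan<byte> pcm)
    {
        var samples = pcm.Length / 2;
        if (samples == 0) return 0;

        double sumSquares = 0;
        for (var i = 0; i < samples; i++)
        {
            short s = (short)(pcm[2 * i] | (pcm[2 * i + 1] << 8));
            var v = s / 32768.0;
            sumSquares += v * v;
        }
        return MapRms(Math.Sqrt(sumSquares / samples));
    }
}
```

Under these assumptions a full-scale signal maps to 1.0, a signal at -20 dBFS maps to about 0.6, and anything at or below -50 dBFS maps to 0.0.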

Feature

Volume control on IAudioPlayer — new Volume (device media volume, 0.0–1.0), a VolumeChanged event, and an IsVolumeControlSupported guard. Reading works on every platform; setting is platform-limited. Android reads/writes the system STREAM_MUSIC level (AudioManager) and observes changes via a settings ContentObserver. Windows reads/writes the default render endpoint’s master volume through WASAPI IAudioEndpointVolume, with an IAudioEndpointVolumeCallback for changes. macOS reads/writes the default output device’s virtual main volume via the CoreAudio HAL (with a property listener) — settable when the current device supports it. iOS / Mac Catalyst read AVAudioSession.OutputVolume and observe it via KVO, but the setter throws NotSupportedException (Apple exposes no supported API to set the system volume — use the hardware buttons or an MPVolumeView). Browser maps Volume to the app’s own HTMLAudioElement volume (the OS volume is sandboxed), persisted across plays. VolumeChanged fires for hardware buttons, the OS volume UI, or a successful set; marshal it to the UI thread

Feature

Live microphone monitor: new IAudioMonitor (in Shiny.Audio) routes the mic straight to the current output in near-real-time (a PA / “talk over a Bluetooth speaker” scenario). It provides Start/Stop, adjustable Gain, an InputLevelChanged VU signal, per-session AudioMonitorOptions (voice processing + preferred input/output device), and SetInputDevice/SetOutputDevice. iOS/Mac Catalyst route input → main mixer → output through AVAudioEngine and can reach a Bluetooth A2DP speaker (phone mic + BT output): the session omits DefaultToSpeaker and uses Default mode so output follows a Bluetooth/wired route, auto-preferring an external output and rebuilding the engine on route changes. Android bridges AudioRecord → AudioTrack. The audio session is snapshotted and restored on Stop. Trade-off (iOS): enabling AudioProcessingOptions (echo cancellation) engages the voice-processing unit, which forces Bluetooth onto the low-quality HFP profile, so output to a Bluetooth speaker (A2DP) falls back to the phone; leave processing off to reach a BT speaker. AirPlay (HomePod/Apple TV) is not supported for a live mic, since iOS only permits AirPlay for playback, not while recording

Feature

Audio device enumeration & selection: new IAudioDevices (in Shiny.Audio) lists input/output routes (GetInputs/GetOutputs), reports the active CurrentInput/CurrentOutput, and raises Changed when routes come and go. Each AudioDevice has a normalized Type (BuiltInMic, BluetoothA2dp, WiredHeadphones, BuiltInSpeaker, …). The implementation reads AVAudioSession route/AvailableInputs + RouteChangeNotification on Apple and AudioManager.GetDevices + AudioDeviceCallback on Android. Selection is applied via IAudioMonitor.SetInputDevice/SetOutputDevice: Android enumerates and selects both input and output fully; iOS can select the input but treats output as observe-only (no app-level output enumeration/selection, because AirPlay/Bluetooth output is owned by the system route picker). Use CurrentInput/CurrentOutput as display properties on every platform

Fix iOS

AppleAudioPlayer no longer forces DefaultToSpeaker, which pinned playback to the built-in speaker. Playback now follows the current output route (headphones, Bluetooth, or an AirPlay device), so a recorded clip can play through a Bluetooth speaker

Feature

New IAudio facade — one injectable that exposes Player / Source / Monitor / Devices, so the whole audio surface is discoverable from a single dependency. The focused interfaces remain independently injectable; lifetimes are preserved (IAudioSource stays a fresh transient per access). Registered by AddAudioServices(), which now also wires AddAudioMonitor() and AddAudioDevices()

Fix iOS

Recorded/played-back audio was quiet after a capture session on iOS — AppleAudioSource left the shared AVAudioSession in the record-oriented PlayAndRecord + VoiceChat profile (attenuated, earpiece-routed), which subsequent IAudioPlayer playback inherited. Capture now snapshots the session category/options/mode on start and restores them on stop, so later playback returns to full-volume Playback routing

BREAKING Enhancement

Native speech & audio now build on Shiny.Core instead of hand-rolled platform plumbing. Android runtime permission requests (RECORD_AUDIO) and current-activity tracking are delegated to Shiny.Core’s AndroidPlatform, replacing the library’s internal ActivityProvider + PermissionRequestFragment. Apps must now reference Shiny.Hosting.Maui and call .UseShiny() on the MauiAppBuilder so AndroidPlatform is registered and receives permission callbacks — without it, RequestAccess() throws TimeoutException on Android. AccessState now comes from Shiny.Core and lives in the Shiny namespace (parent of Shiny.Audio/Shiny.Speech, so most code resolves it with no change). The duplicated Android permission-check code that previously lived in both AndroidAudioSource and the Android SpeechToTextImpl is gone

BREAKING Enhancement

The browser JS interop module now ships inside the Shiny.Audio package as a static web asset at _content/Shiny.Audio/shiny-audio.js (renamed from shiny-speech.js), loaded on demand via JSHost.ImportAsync. Blazor WebAssembly apps no longer need to copy the file into wwwroot or add a <script src="shiny-speech.js"> tag — just reference the NuGet package. Delete any existing wwwroot/shiny-speech.js left over from a previous version. Shiny.Audio is now built with the Razor SDK to produce the static web asset

Fix Browser

Browser raw audio capture callbacks (BrowserAudioSource.OnAudioData / OnCaptureError) were dispatched to the wrong assembly after audio was extracted into Shiny.Audio, breaking IAudioSource capture in the browser. The interop module now resolves exports from the correct Shiny.Audio and Shiny.Speech assemblies

Feature

Microphone voice processing — new AudioProcessingOptions (in Shiny.Audio) requests platform echo cancellation, noise suppression, and automatic gain control on a capture session. Pass it to IAudioSource.StartCaptureAsync(processing, ct) or set SpeechRecognitionOptions.AudioProcessing (honored by the cloud STT providers). Echo cancellation subtracts the device’s own speaker/TTS output from the mic so it isn’t re-captured during barge-in. Maps to the Voice-Processing I/O unit on Apple (bundled AEC+NS+AGC), AcousticEchoCanceler/NoiseSuppressor/AutomaticGainControl (+ VoiceCommunication source) on Android, the Communications capture category on Windows, and getUserMedia constraints (WebRTC AEC3) in the browser. Effects are best-effort/device-dependent; native on-device recognizers manage their own mic and are unaffected

Feature

Cloud provider credentials can now be changed at runtime. The provider config objects (AzureSpeechConfig, ElevenLabsConfig, OpenAiSpeechConfig, TypecastConfig) are mutable singletons — set a new ApiKey/SubscriptionKey (or region/model/voice) on the instance you registered (or resolve it from DI) and the provider uses it on its next call, with no re-registration. Configuration APIs (AddAzureSpeech, AddElevenLabsSpeech, AddOpenAiSpeech, AddTypecastSpeech) are unchanged. Client-caching providers rebuild their SDK/HTTP client on key change via the new RefreshableClient<T> helper in Shiny.Speech.Cloud

Feature

New Shiny.Speech.Typecast provider — cloud text-to-speech via the official typecast-csharp SDK. Register with AddTypecastSpeech(apiKey) (or a TypecastConfig for model / default voice / language / emotion / audio format). TTS-only; pair with Azure/ElevenLabs/OpenAI or native STT for recognition

BREAKING Enhancement

Audio capture and playback extracted into a new standalone Shiny.Audio package. IAudioSource, IAudioPlayer, and PipeStream moved from the Shiny.Speech namespace to the new Shiny.Audio namespace — add using Shiny.Audio; where you consume them. Shiny.Speech references Shiny.Audio automatically, so AddSpeechServices() still registers everything; no package reference changes are needed for existing speech apps

Feature

Shiny.Audio is usable on its own for recording/playback without the speech stack — register via the new AddAudioServices() (or AddAudioSource() / AddAudioPlayer()) extension methods

Feature

IAudioPlayer.PlayAsync(string source) — play audio from a remote http/https URL or a local file path. You pass a plain URL/path and each platform resolves the source natively (no platform-specific file URI required); remote sources stream progressively on Android, Windows, and Browser, and are buffered on Apple

Chore

All Shiny.Speech / Shiny.Audio / Shiny.AiConversation packages now share a single version defined by the repo-root version.json

2.1 - May 21, 2026

Feature

ISpeechToTextProvider.Error event — providers can surface non-fatal errors (e.g. transient network failures between chunked requests in continuous mode) without aborting the RecognizeAsync enumerator. CloudSpeechToText subscribes and forwards to the service-level ISpeechToTextService.Error event automatically. Azure / OpenAI / ElevenLabs providers updated to use it instead of throwing out of the enumerator

BREAKING Chore

ISpeechToTextProvider now requires implementers to declare event EventHandler<SpeechRecognitionError>? Error. Existing custom providers must add the event declaration (one-shot providers can leave it unraised)

Feature

ElevenLabs Scribe speech-to-text provider — AddElevenLabsSpeech() now registers both STT and TTS, or use the new AddElevenLabsSpeechToText() helper. Buffers captured PCM, wraps in a WAV container, and posts a single request to /v1/speech-to-text; yields one final SpeechRecognitionResult per session
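Wrapping buffered PCM in a WAV container amounts to prepending a 44-byte RIFF header. The sketch below shows one way to do that for the 16 kHz, 16-bit, mono format that IAudioSource produces. WavSketch.Wrap is a hypothetical helper and not the provider's actual code, which may build the header differently.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: prepends a canonical 44-byte PCM WAV header.
public static class WavSketch
{
    public static byte[] Wrap(byte[] pcm, int sampleRate = 16000, short channels = 1, short bitsPerSample = 16)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8);   // bytes per sample frame
        int byteRate = sampleRate * blockAlign;

        using var ms = new MemoryStream(44 + pcm.Length);
        using var w = new BinaryWriter(ms, Encoding.ASCII);         // BinaryWriter writes little-endian

        w.Write(Encoding.ASCII.GetBytes("RIFF"));
        w.Write(36 + pcm.Length);        // size of everything after this field
        w.Write(Encoding.ASCII.GetBytes("WAVE"));
        w.Write(Encoding.ASCII.GetBytes("fmt "));
        w.Write(16);                     // fmt chunk size for PCM
        w.Write((short)1);               // audio format 1 = linear PCM
        w.Write(channels);
        w.Write(sampleRate);
        w.Write(byteRate);
        w.Write(blockAlign);
        w.Write(bitsPerSample);
        w.Write(Encoding.ASCII.GetBytes("data"));
        w.Write(pcm.Length);             // data chunk size
        w.Write(pcm);
        w.Flush();
        return ms.ToArray();
    }
}
```

Because the header carries the total data length, the whole recording has to be buffered before the request is sent, which is consistent with the single final result per session described in this entry.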

Feature

ElevenLabsConfig.SpeechToTextModel property (default scribe_v1) — configurable Scribe model id

BREAKING Chore

Renamed ElevenLabsConfig.ModelId → ElevenLabsConfig.TextToSpeechModel to disambiguate from the new SpeechToTextModel property

Fix

KeywordHeard no longer re-fires for the same final transcription within a 3-second window — eliminates duplicate keyword events caused by trailing-audio carry-over between recognition tasks (iOS SFSpeechRecognizer re-arm, Android SpeechRecognizer restart). Applied uniformly to Apple, Android, Browser, Windows, and CloudSpeechToText
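A duplicate guard of this kind can be sketched in a few lines. The TranscriptDeduper class below is hypothetical; in particular, the case-insensitive comparison and the choice not to extend the window when a duplicate is suppressed are assumptions, not documented library behaviour.

```csharp
using System;

// Hypothetical guard: suppresses a repeat of the same final transcription
// when it arrives within the window of the previously accepted one.
public sealed class TranscriptDeduper
{
    readonly TimeSpan window;
    string? lastText;
    DateTimeOffset lastSeen;

    public TranscriptDeduper(TimeSpan window) => this.window = window;

    // True when keyword events should be raised for this final result.
    public bool ShouldRaise(string finalText, DateTimeOffset now)
    {
        var duplicate =
            this.lastText is not null &&
            string.Equals(this.lastText, finalText, StringComparison.OrdinalIgnoreCase) &&
            now - this.lastSeen < this.window;

        if (duplicate)
            return false;

        this.lastText = finalText;
        this.lastSeen = now;
        return true;
    }
}
```

Constructed with TimeSpan.FromSeconds(3), the guard lets the first "hey computer" through, drops an identical transcription one second later, and accepts it again once three seconds have passed.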

Feature

ITextToSpeechService.AudioLevelChanged event and IsPlayerAnalysisSupported flag — normalized 0.0–1.0 RMS level for driving VU-meter UI during speech playback

Feature

IAudioPlayer.AudioLevelChanged event and IsPlayerAnalysisSupported flag — same RMS signal raised during generic audio playback (e.g. cloud TTS audio streams)

Enhancement iOS

Apple native TTS now routes AVSpeechSynthesizer through AVAudioEngine + AVAudioPlayerNode with a player-node tap, enabling AudioLevelChanged for built-in iOS / macOS / Mac Catalyst voices. Engine is created lazily on first speak and kept warm across utterances

Enhancement Android

Android native TTS taps UtteranceProgressListener.OnAudioAvailable to compute RMS from PCM bytes without rerouting playback

Enhancement Android

AndroidAudioPlayer attaches Android.Media.Audiofx.Visualizer to the MediaPlayer audio session for cloud TTS / generic playback metering (no RECORD_AUDIO permission needed for per-session capture; MODIFY_AUDIO_SETTINGS recommended)

Enhancement iOS

AppleAudioPlayer enables AVAudioPlayer.MeteringEnabled and polls AveragePower for VU metering during cloud / generic audio playback

Chore

CloudTextToSpeech forwards AudioLevelChanged and IsPlayerAnalysisSupported from the underlying IAudioPlayer — Azure / OpenAI / ElevenLabs / custom providers get VU metering for free

Enhancement iOS

CarPlay compatible — iOS audio session uses PlayAndRecord with AllowBluetooth, AllowBluetoothA2dp, and DefaultToSpeaker so audio automatically routes through the car’s microphone and speakers when CarPlay is active

2.0 - May 13, 2026

BREAKING Chore

ISpeechToTextService redesigned from IAsyncEnumerable-based to event-based Start/Stop model — ContinuousRecognize() and ListenUntilSilence() removed from the interface

Feature

Start(SpeechRecognitionOptions?) / Stop() methods — long-lived listening sessions with explicit lifecycle control; Start() throws if already listening, Stop() is a safe no-op

Feature

ResultReceived event — fires for every recognition result (partial and final) with full SpeechRecognitionResult including Text, IsFinal, and Confidence; multiple subscribers supported

Feature

KeywordHeard event — fires when a keyword from SpeechRecognitionOptions.Keywords is detected in a final result using case-insensitive whole-word matching

Feature

Error event — fires on recognition errors with SpeechRecognitionError containing Message and optional Exception

Feature

SpeechRecognitionError record — new type for error reporting via the Error event

Feature

SpeechRecognitionOptions.Keywords property (string[]?) — built-in keyword detection at the platform level; keywords are matched with compiled regex on final results
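Case-insensitive whole-word matching with compiled regular expressions could look roughly like the following. KeywordMatcherSketch is a hypothetical class, and the \b word-boundary pattern is an assumption about how the matching is done; only the System.Text.RegularExpressions calls are real APIs.

```csharp
using System.Text.RegularExpressions;

// Hypothetical matcher: one compiled, case-insensitive, whole-word regex per keyword.
public sealed class KeywordMatcherSketch
{
    readonly string[] keywords;
    readonly Regex[] patterns;

    public KeywordMatcherSketch(params string[] keywords)
    {
        this.keywords = keywords;
        this.patterns = new Regex[keywords.Length];
        for (var i = 0; i < keywords.Length; i++)
        {
            // Regex.Escape keeps punctuation in a keyword from being read as regex syntax.
            this.patterns[i] = new Regex(
                $@"\b{Regex.Escape(keywords[i])}\b",
                RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.CultureInvariant);
        }
    }

    // Returns the first configured keyword found in the text, or null.
    public string? Match(string finalText)
    {
        for (var i = 0; i < this.patterns.Length; i++)
            if (this.patterns[i].IsMatch(finalText))
                return this.keywords[i];
        return null;
    }
}
```

The word boundaries are what separate "stop" in "Please STOP now" from the same letters inside "unstoppable"; building the patterns once up front avoids recompiling them for every final result.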

BREAKING Chore

ListenWithWakeWord() extension method removed — replaced by StatementAfterKeyword()

BREAKING Chore

ListenForKeyword() extension method removed — replaced by WaitListenForKeywords() and ListenForKeywords()

Feature

ListenUntilSilence() extension method — starts listening, waits for first final result, then stops; replaces the former interface method

Feature

StatementAfterKeyword(string[]) extension method — waits for a keyword to be heard, then returns the next final statement (replaces ListenWithWakeWord)

Feature

WaitListenForKeywords(string[], TimeSpan?) extension method — returns the first keyword heard with optional timeout

Feature

ListenForKeywords(string[]) extension method — yields keywords continuously as IAsyncEnumerable<string>

Enhancement

All extension methods handle Start/Stop/event wiring automatically — no manual lifecycle management needed for simple scenarios

Enhancement

Multiple classes can now subscribe to speech recognition events simultaneously — eliminates the single-consumer limitation of IAsyncEnumerable

Enhancement

Cloud provider (CloudSpeechToText) adapted to consume ISpeechToTextProvider.RecognizeAsync() internally on a background task and raise events — ISpeechToTextProvider interface unchanged

v1

1.2.1 - May 11, 2026

Feature

OpenAI cloud provider — AddOpenAiSpeech() registers OpenAI STT (Whisper / GPT-4o Transcribe) and TTS (GPT-4o Mini TTS) with configurable model and voice selection

Feature

OpenAI TTS supports 10 built-in voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer

Feature

OpenAI STT/TTS follows the same cloud provider pattern as Azure and ElevenLabs — platform IAudioSource and IAudioPlayer handle audio I/O

Feature

Microsoft.Extensions.AI adapter — AddShinySpeechClients() exposes any registered cloud provider as ISpeechToTextClient and ITextToSpeechClient from Microsoft.Extensions.AI

Feature

AddShinySpeechToTextClient() / AddShinyTextToSpeechClient() for registering M.E.AI adapters individually

Feature

M.E.AI streaming support — GetStreamingTextAsync() emits SessionOpen, TextUpdating, TextUpdated, SessionClose update kinds mapped from SpeechRecognitionResult.IsFinal

Feature

M.E.AI TTS streaming — GetStreamingAudioAsync() emits SessionOpen, AudioUpdated, SessionClose update kinds with audio as DataContent

Feature

M.E.AI options mapping — SpeechToTextOptions.SpeechLanguage → CultureInfo, TextToSpeechOptions voice/speed/pitch/volume mapped to Shiny equivalents

1.2 - May 6, 2026

Feature WASM

Browser IAudioSource implementation — raw PCM microphone capture via the Web Audio API (getUserMedia + ScriptProcessorNode), downsampled to 16kHz 16-bit mono, matching the output format of Android, iOS, and Windows

Enhancement WASM

Cloud STT providers (Azure, custom) now work in the browser — IAudioSource provides the raw audio stream that CloudSpeechToText requires

1.1.2 - May 6, 2026

Enhancement

Cloud provider extensions (AddAzureSpeech, AddElevenLabsTextToSpeech, AddCloudSpeechToText, AddCloudTextToSpeech) now automatically register IAudioSource and IAudioPlayer — manual AddAudioSource() / AddAudioPlayer() calls are no longer required

1.1.1 - May 5, 2026

Feature

ISpeechToTextService.IsListening property — indicates whether speech recognition is currently active, analogous to ITextToSpeechService.IsSpeaking

1.1 - May 4, 2026

Feature

ListenWithWakeWord() extension method — “Hey Siri” style wake word activation that continuously listens for a wake phrase, then captures the spoken command after it until silence

Feature

ListenForKeyword() extension method — listens continuously until one of the specified keywords is detected (case-insensitive, whole-word matching), returns the matched keyword

Feature

Wake word supports pause-then-speak — if the user says the wake phrase and pauses before speaking, the method waits for the next utterance as the command

Feature

Both methods are extension methods on ISpeechToTextService composing over ContinuousRecognize — no platform-specific code changes required

Feature

Sample apps updated with Wake Word and Keyword listening modes (MAUI + Blazor)

1.0 - May 2, 2026

Feature

ISpeechToTextService interface — platform-native speech recognition with permission management, continuous streaming, and listen-until-silence modes

Feature

ITextToSpeechService interface — platform-native text-to-speech with voice selection, speech rate, pitch, and volume control

Feature

IAudioSource interface — raw PCM audio capture from the device microphone (16kHz, 16-bit, mono)

Feature

IAudioPlayer interface — MP3 audio stream playback with play/stop control

Feature

SpeechRecognitionOptions — configurable culture, silence timeout, and on-device preference for STT

Feature

TextToSpeechOptions — configurable culture, voice, speech rate, pitch, and volume for TTS

Feature

ContinuousRecognize() — streaming recognition results via IAsyncEnumerable<SpeechRecognitionResult> with partial and final results

Feature

ListenUntilSilence() — simple dictation mode that returns the final transcription after silence is detected

Feature

GetVoicesAsync() — enumerate available TTS voices with optional culture filtering

Feature

AddSpeechServices() — single extension method to register all core services (STT, TTS, AudioSource, AudioPlayer)

Feature Android

Android STT implementation using SpeechRecognizer with streaming partial results

Feature Android

Android TTS implementation using Android.Speech.Tts.TextToSpeech

Feature Android

Android audio capture via AudioRecord with 16kHz PCM streaming

Feature Android

Android audio playback via MediaPlayer

Feature iOS

iOS STT implementation using SFSpeechRecognizer with SFSpeechAudioBufferRecognitionRequest

Feature iOS

iOS TTS implementation using AVSpeechSynthesizer

Feature iOS

iOS audio capture via AVAudioEngine with PCM tap

Feature iOS

iOS audio playback via AVAudioPlayer

Feature

Cloud provider abstraction — ISpeechToTextProvider and ITextToSpeechProvider interfaces for pluggable cloud backends

Feature

CloudSpeechToText and CloudTextToSpeech — bridge classes that combine platform audio with cloud provider APIs

Feature

AddCloudSpeechToText<T>() and AddCloudTextToSpeech<T>() — generic DI registration for custom cloud providers

Feature

Azure AI Speech provider — AddAzureSpeech() registers Azure STT and/or TTS with subscription key and region

Feature

Azure TTS with SSML prosody control — speech rate, pitch, and volume mapped to SSML elements

Feature

ElevenLabs TTS provider — AddElevenLabsTextToSpeech() registers ElevenLabs cloud TTS with configurable voice and model

Feature

PipeStream utility — thread-safe producer-consumer stream using System.IO.Pipelines for bridging audio capture with cloud providers

Feature WASM

Browser/WebAssembly support — STT and TTS via Web Speech API, auto-detected at runtime via OperatingSystem.IsBrowser()

Feature WASM

Browser STT implementation using SpeechRecognition API with streaming partial and final results

Feature WASM

Browser TTS implementation using SpeechSynthesis API with voice selection, rate, pitch, and volume control

Feature WASM

Browser audio playback via HTML5 Audio element with base64 data URL conversion

Feature

Blazor WebAssembly sample app demonstrating STT, TTS, and voice listing