Getting Started
Shiny.Speech provides a unified API for speech-to-text, text-to-speech, audio capture, and audio playback across Android, iOS, Windows, and Browser (Blazor WebAssembly), with pluggable cloud providers for Azure AI Speech, OpenAI, and ElevenLabs.
Features
- Event-based speech-to-text with a Start/Stop lifecycle; multiple subscribers are supported
- Built-in keyword detection via `SpeechRecognitionOptions.Keywords` and the `KeywordHeard` event
- Platform-native text-to-speech with voice selection, rate, pitch, and volume control
- Raw audio capture from the device microphone (16 kHz, 16-bit, mono PCM)
- Audio playback for MP3 streams
- Pluggable cloud provider architecture: swap between on-device and cloud STT/TTS
- Azure AI Speech integration (STT + TTS) with SSML prosody control
- OpenAI integration (STT + TTS) powered by Whisper and GPT-4o models
- ElevenLabs integration: Scribe speech-to-text and multilingual text-to-speech
- Microsoft.Extensions.AI adapter: expose providers as `ISpeechToTextClient` / `ITextToSpeechClient`
- State tracking: `IsListening` (STT), `IsSpeaking` (TTS), `IsPlaying` (audio)
- VU meter signal: the `AudioLevelChanged` event on `ITextToSpeechService` and `IAudioPlayer` emits a normalized 0.0–1.0 RMS level during playback; `IsPlayerAnalysisSupported` reports availability per platform
- Permission management via `AccessState` and `RequestAccess()`
- Convenience extension methods: `ListenUntilSilence`, `StatementAfterKeyword`, `WaitListenForKeywords`, `ListenForKeywords`
- CarPlay compatible: iOS audio routes through the car’s mic/speakers automatically when CarPlay is active
Packages
| Package | Purpose |
|---|---|
| `Shiny.Speech` | Core library: platform-native STT, TTS, audio capture, and playback |
| `Shiny.Speech.Cloud` | Cloud provider abstractions (included transitively by Azure/ElevenLabs) |
| `Shiny.Speech.Azure` | Azure AI Speech provider (STT + TTS) |
| `Shiny.Speech.OpenAI` | OpenAI provider (STT + TTS): Whisper, GPT-4o Transcribe, GPT-4o Mini TTS |
| `Shiny.Speech.ElevenLabs` | ElevenLabs provider: Scribe STT + TTS |
| `Shiny.Speech.MicrosoftAI` | Microsoft.Extensions.AI adapter: `ISpeechToTextClient` / `ITextToSpeechClient` |
Platform Notes
Platform Permissions
Android (add to `AndroidManifest.xml`):

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

`MODIFY_AUDIO_SETTINGS` is required for the TTS audio-level Visualizer and for the native STT beep suppression.

iOS (add to `Info.plist`):

    <key>NSSpeechRecognitionUsageDescription</key>
    <string>This app uses speech recognition</string>
    <key>NSMicrophoneUsageDescription</key>
    <string>This app uses the microphone for speech recognition</string>

Windows (add the Microphone capability to your `Package.appxmanifest`):

    <Capabilities>
        <DeviceCapability Name="microphone" />
    </Capabilities>

Browser (Blazor WebAssembly): no manifest changes are needed. The browser prompts for microphone access automatically. Include the JS interop module in `index.html`:

    <script src="shiny-speech.js"></script>

`IAudioSource` captures raw PCM audio in the browser using the Web Audio API (`getUserMedia` + `ScriptProcessorNode`), downsampled to 16 kHz 16-bit mono.
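
To make "downsampled to 16 kHz 16-bit mono" concrete, here is a minimal sketch of one way such a conversion could work. It is not the library's implementation: `DownsampleToPcm16` is a hypothetical helper, and window averaging is an assumed technique chosen for brevity (a production resampler would normally use a proper low-pass filter).

```csharp
using System;

// Hypothetical helper (not part of Shiny.Speech): converts mono float samples
// in [-1, 1] at sourceRate to 16-bit PCM at targetRate by averaging each
// window of source samples that maps onto one output sample.
static short[] DownsampleToPcm16(float[] input, int sourceRate, int targetRate = 16000)
{
    if (targetRate > sourceRate)
        throw new ArgumentException("This sketch only downsamples.");

    double ratio = (double)sourceRate / targetRate;
    int outputLength = (int)(input.Length / ratio);
    var output = new short[outputLength];

    for (int i = 0; i < outputLength; i++)
    {
        int start = (int)(i * ratio);
        int end = Math.Min((int)((i + 1) * ratio), input.Length);

        // Average the window; a crude low-pass that reduces aliasing.
        double sum = 0;
        for (int j = start; j < end; j++)
            sum += input[j];
        double sample = end > start ? sum / (end - start) : 0;

        // Clamp, then scale to the signed 16-bit range.
        sample = Math.Clamp(sample, -1.0, 1.0);
        output[i] = (short)Math.Round(sample * short.MaxValue);
    }
    return output;
}

// 48 kHz -> 16 kHz: every 3 input samples collapse into 1 output sample.
var pcm = DownsampleToPcm16(new float[] { 1f, 1f, 1f, -1f, -1f, -1f }, 48000);
Console.WriteLine(string.Join(", ", pcm)); // 32767, -32767
```

The resulting stream costs 16,000 samples × 2 bytes = 32,000 bytes per second, which is why this format is a common input for speech recognizers.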
Quick Example

    using System.Globalization;

    public class MyViewModel
    {
        readonly ISpeechToTextService _stt;
        readonly ITextToSpeechService _tts;

        public MyViewModel(ISpeechToTextService stt, ITextToSpeechService tts)
        {
            _stt = stt;
            _tts = tts;
        }

        async Task ListenAndRespond(CancellationToken ct)
        {
            // 1. Request permission
            var access = await _stt.RequestAccess();
            if (access != AccessState.Available)
                return;

            // 2. Listen until the user stops speaking
            var text = await _stt.ListenUntilSilence(
                new SpeechRecognitionOptions
                {
                    Culture = CultureInfo.GetCultureInfo("en-US"),
                    SilenceTimeout = TimeSpan.FromSeconds(3)
                },
                ct
            );

            if (text != null)
            {
                // 3. Speak the result back
                await _tts.SpeakAsync($"You said: {text}");
            }
        }

        async Task EventBasedListening()
        {
            // Subscribe to events; multiple classes can subscribe simultaneously
            _stt.ResultReceived += (s, result) =>
                Console.WriteLine($"[{(result.IsFinal ? "FINAL" : "partial")}] {result.Text}");

            _stt.KeywordHeard += (s, keyword) =>
                Console.WriteLine($"Keyword: {keyword}");

            // Start listening with keyword detection
            await _stt.Start(new SpeechRecognitionOptions
            {
                Keywords = ["Yes", "No", "Maybe"]
            });

            // Later: stop
            await _stt.Stop();
        }

        async Task WakeWordExample(CancellationToken ct)
        {
            // "Hey Computer, do something" → returns "do something"
            var command = await _stt.StatementAfterKeyword(["Hey Computer"], cancellationToken: ct);

            if (command != null)
                await _tts.SpeakAsync($"You asked: {command}");
        }

        async Task KeywordExample(CancellationToken ct)
        {
            // Wait for a keyword (with optional timeout)
            await _tts.SpeakAsync("Do you agree? Say yes, no, or maybe.");
            var answer = await _stt.WaitListenForKeywords(
                ["Yes", "No", "Maybe"],
                timeout: TimeSpan.FromSeconds(30),
                cancellationToken: ct
            );
        }
    }

VU Meter (Audio Level)
Both `ITextToSpeechService` and `IAudioPlayer` expose an `AudioLevelChanged` event that fires periodically while audio is playing, with a single normalized RMS level in the 0.0–1.0 range. Use it to drive a VU bar, waveform pulse, or speaking indicator. Check `IsPlayerAnalysisSupported` before binding UI; it is false on platforms where metering isn’t available (e.g. Windows native TTS, Browser).

    public partial class TtsViewModel : ObservableObject
    {
        readonly ITextToSpeechService tts;

        [ObservableProperty]
        double audioLevel; // 0.0 .. 1.0

        public bool IsVuSupported => tts.IsPlayerAnalysisSupported;

        public TtsViewModel(ITextToSpeechService tts)
        {
            this.tts = tts;

            // Marshal level updates to the UI thread before touching bound state
            tts.AudioLevelChanged += (_, level) =>
                MainThread.BeginInvokeOnMainThread(() => AudioLevel = level);
        }
    }

    <ProgressBar Progress="{Binding AudioLevel}" IsVisible="{Binding IsVuSupported}" />

| Surface | iOS / macOS | Android | Windows | Browser |
|---|---|---|---|---|
| Native TTS (`ITextToSpeechService`) | ✅ (AVAudioEngine + player-node tap) | ✅ (`OnAudioAvailable` RMS) | ❌ | ❌ |
| Cloud TTS (`CloudTextToSpeech`) | ✅ (forwarded from `IAudioPlayer`) | ✅ (forwarded from `IAudioPlayer`) | ❌ | ❌ |
| Generic playback (`IAudioPlayer`) | ✅ (`AVAudioPlayer.MeteringEnabled`) | ✅ (Visualizer on session) | ❌ | ❌ |

On iOS / macOS the native TTS path routes `AVSpeechSynthesizer` through `AVAudioEngine` + `AVAudioPlayerNode` so that a tap can compute RMS. The engine is created lazily and kept warm across utterances: the first utterance incurs roughly 50–150 ms of extra latency, and subsequent calls are indistinguishable from the legacy direct path.
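
For readers unfamiliar with the term, the following minimal sketch shows how a normalized RMS level in the 0.0–1.0 range could be derived from a buffer of 16-bit PCM samples. It is an illustration only, not the library's metering code: `NormalizedRms` is a hypothetical helper, and the 160-sample buffer (10 ms at 16 kHz) is an assumed size.

```csharp
using System;

// Hypothetical helper (not part of Shiny.Speech): root-mean-square level of
// 16-bit PCM samples, scaled so that a full-scale signal approaches 1.0.
static double NormalizedRms(ReadOnlySpan<short> samples)
{
    if (samples.IsEmpty)
        return 0.0;

    double sumOfSquares = 0;
    foreach (short s in samples)
    {
        double normalized = s / 32768.0; // map to [-1.0, 1.0)
        sumOfSquares += normalized * normalized;
    }
    return Math.Min(1.0, Math.Sqrt(sumOfSquares / samples.Length));
}

short[] silence = new short[160];          // all zeros
short[] halfScale = new short[160];
Array.Fill(halfScale, (short)16384);       // constant signal at half of full scale

double quiet = NormalizedRms(silence);     // quiet == 0.0
double level = NormalizedRms(halfScale);   // level == 0.5
Console.WriteLine($"{quiet} {level}");
```

A value like `level` is what a handler for `AudioLevelChanged` would bind to a progress bar; raw RMS tends to look jumpy, so UI code often smooths it before display.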
Samples
AI Coding Assistant
Using the `claude` CLI:

Step 1: Add the marketplace:

    claude plugin marketplace add shinyorg/skills

Step 2: Install plugins:

    claude plugin install shiny-client@shiny
    claude plugin install shiny-maui@shiny
    claude plugin install controls@shiny
    claude plugin install shiny-mediator@shiny
    claude plugin install shiny-data@shiny
    claude plugin install shiny-aspire@shiny
    claude plugin install shiny-extensions@shiny

Using the `copilot` CLI:

Step 1: Add the marketplace:

    copilot plugin marketplace add https://github.com/shinyorg/skills

Step 2: Install plugins:

    copilot plugin install shiny-client@shiny
    copilot plugin install shiny-maui@shiny
    copilot plugin install controls@shiny
    copilot plugin install shiny-mediator@shiny
    copilot plugin install shiny-data@shiny
    copilot plugin install shiny-aspire@shiny
    copilot plugin install shiny-extensions@shiny