
Getting Started



Frameworks: .NET, .NET MAUI

Operating Systems: Android, iOS, Windows, Web

Shiny.Speech provides a unified API for speech-to-text, text-to-speech, audio capture, and audio playback across Android, iOS, Windows, and Browser (Blazor WebAssembly), with pluggable cloud providers for Azure AI Speech, OpenAI, and ElevenLabs.

Features

Event-based speech-to-text with Start/Stop lifecycle — multiple subscribers supported
Built-in keyword detection via SpeechRecognitionOptions.Keywords and the KeywordHeard event
Platform-native text-to-speech with voice selection, rate, pitch, and volume control
Raw audio capture from the device microphone (16kHz, 16-bit, mono PCM)
Audio playback for MP3 streams
Pluggable cloud provider architecture — swap between on-device and cloud STT/TTS
Azure AI Speech integration (STT + TTS) with SSML prosody control
OpenAI integration (STT + TTS) powered by Whisper and GPT-4o models
ElevenLabs integration — Scribe speech-to-text and multilingual text-to-speech
Microsoft.Extensions.AI adapter — expose providers as ISpeechToTextClient / ITextToSpeechClient
State tracking — IsListening (STT), IsSpeaking (TTS), IsPlaying (audio)
VU meter signal — AudioLevelChanged event on ITextToSpeechService and IAudioPlayer emits a normalized 0.0–1.0 RMS level during playback; IsPlayerAnalysisSupported reports availability per platform
Permission management via AccessState and RequestAccess()
Convenience extension methods: ListenUntilSilence, StatementAfterKeyword, WaitListenForKeywords, ListenForKeywords
CarPlay compatible — iOS audio routes through the car’s mic/speakers automatically when CarPlay is active
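The raw capture format in the feature list (16 kHz, 16-bit, mono PCM) fixes the byte rate at 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes per second, which is handy when sizing buffers or timing captured audio. The plain C# sketch below does that arithmetic; it uses no Shiny types, and the helper names are hypothetical.

```csharp
using System;

// Capture format from the feature list: 16 kHz, 16-bit, mono PCM.
const int SampleRate = 16_000; // samples per second
const int BytesPerSample = 2;  // 16-bit samples
const int Channels = 1;        // mono

// Hypothetical helper: bytes of PCM produced per second of audio.
int BytesPerSecond() => SampleRate * BytesPerSample * Channels;

// Hypothetical helper: how much audio a captured buffer of byteCount bytes holds.
TimeSpan DurationOf(long byteCount) =>
    TimeSpan.FromSeconds((double)byteCount / BytesPerSecond());

// Hypothetical helper: buffer size for a duration, rounded down to whole samples.
int BufferSizeFor(TimeSpan duration)
{
    var frames = (int)(duration.TotalSeconds * SampleRate);
    return frames * BytesPerSample * Channels;
}

Console.WriteLine(BytesPerSecond());                              // 32000
Console.WriteLine(DurationOf(48_000).TotalMilliseconds);          // 1500
Console.WriteLine(BufferSizeFor(TimeSpan.FromMilliseconds(100))); // 3200
```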

Packages

Package	Purpose
`Shiny.Speech`	Core library — platform-native STT, TTS, audio capture, and playback
`Shiny.Speech.Cloud`	Cloud provider abstractions (included transitively by Azure/ElevenLabs)
`Shiny.Speech.Azure`	Azure AI Speech provider (STT + TTS)
`Shiny.Speech.OpenAI`	OpenAI provider (STT + TTS) — Whisper, GPT-4o Transcribe, GPT-4o Mini TTS
`Shiny.Speech.ElevenLabs`	ElevenLabs provider — Scribe STT + TTS
`Shiny.Speech.MicrosoftAI`	Microsoft.Extensions.AI adapter — `ISpeechToTextClient` / `ITextToSpeechClient`

Setup

Shiny.Speech

Platform Notes

Android’s native speech recognition engine works in short recording segments — it automatically stops listening after a period of silence and must restart for the next segment. This creates brief pauses during continuous listening that are noticeable compared to iOS, which streams continuously.

If smooth, uninterrupted recognition matters for your app, use the Azure AI Speech provider (`Shiny.Speech.Azure`), which provides continuous streaming recognition on all platforms and eliminates the start/stop behavior:

builder.Services.AddAzureSpeech("your-subscription-key", "eastus");

Platform Permissions

Android — Add to AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

MODIFY_AUDIO_SETTINGS is required for the TTS audio-level Visualizer and for the native STT beep suppression.

iOS — Add to Info.plist:

<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition</string>
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone for speech recognition</string>

Windows — Add the Microphone capability to your Package.appxmanifest:

<Capabilities>
    <DeviceCapability Name="microphone" />
</Capabilities>

Browser (Blazor WebAssembly) — No manifest changes needed. The browser prompts for microphone access automatically. Include the JS interop module in index.html:

<script src="shiny-speech.js"></script>

In the browser, IAudioSource captures microphone input with getUserMedia, processes it through a Web Audio ScriptProcessorNode, and downsamples it to 16 kHz, 16-bit mono PCM.
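Browsers typically run the AudioContext at 44.1 or 48 kHz and hand script processors 32-bit float samples in [-1, 1], so reaching 16 kHz 16-bit mono means resampling and quantizing. The C# sketch below shows one simple approach (average each output sample's source window, then scale to Int16 and write little-endian bytes). It is an illustration under those assumptions, not the library's JavaScript, and `DownsampleToPcm16` is a hypothetical name.

```csharp
using System;

// Hypothetical helper (illustration, not Shiny's JS): converts float samples in
// [-1, 1] captured at sourceRate into 16-bit little-endian mono PCM at targetRate,
// averaging each output sample's window of source samples.
byte[] DownsampleToPcm16(float[] source, int sourceRate, int targetRate = 16_000)
{
    if (targetRate > sourceRate)
        throw new ArgumentException("Only downsampling is supported.", nameof(targetRate));

    double ratio = (double)sourceRate / targetRate; // 48000 / 16000 = 3, 44100 / 16000 ≈ 2.76
    int outputLength = (int)(source.Length / ratio);
    var output = new byte[outputLength * 2];         // 2 bytes per 16-bit sample

    for (int i = 0; i < outputLength; i++)
    {
        // Source window [start, end) that maps onto output sample i.
        int start = (int)(i * ratio);
        int end = Math.Min((int)((i + 1) * ratio), source.Length);

        double sum = 0;
        for (int j = start; j < end; j++)
            sum += source[j];
        double average = end > start ? sum / (end - start) : 0.0;

        // Clamp to [-1, 1], scale to the Int16 range, and write the low byte first.
        short sample = (short)Math.Round(Math.Clamp(average, -1.0, 1.0) * short.MaxValue);
        output[2 * i] = (byte)(sample & 0xFF);
        output[2 * i + 1] = (byte)((sample >> 8) & 0xFF);
    }
    return output;
}

// 10 ms of a constant half-scale signal at 48 kHz (480 samples).
var samples48k = new float[480];
Array.Fill(samples48k, 0.5f);

var converted = DownsampleToPcm16(samples48k, 48_000);
Console.WriteLine(converted.Length);                            // 320 (160 samples = 10 ms at 16 kHz)
Console.WriteLine((short)(converted[0] | (converted[1] << 8))); // 16384
```

Averaging acts as a crude low-pass filter; a production resampler would use a proper anti-aliasing filter, but the rate conversion and Int16 quantization steps are the same.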

Quick Example

using System.Globalization;

public class MyViewModel
{
    readonly ISpeechToTextService _stt;
    readonly ITextToSpeechService _tts;

    public MyViewModel(ISpeechToTextService stt, ITextToSpeechService tts)
    {
        _stt = stt;
        _tts = tts;
    }

    async Task ListenAndRespond(CancellationToken ct)
    {
        // 1. Request permission
        var access = await _stt.RequestAccess();
        if (access != AccessState.Available)
            return;

        // 2. Listen until the user stops speaking
        var text = await _stt.ListenUntilSilence(
            new SpeechRecognitionOptions
            {
                Culture = CultureInfo.GetCultureInfo("en-US"),
                SilenceTimeout = TimeSpan.FromSeconds(3)
            },
            ct
        );

        if (text != null)
        {
            // 3. Speak the result back
            await _tts.SpeakAsync($"You said: {text}");
        }
    }

    async Task EventBasedListening()
    {
        // Subscribe to events — multiple classes can subscribe simultaneously
        _stt.ResultReceived += (s, result) =>
            Console.WriteLine($"[{(result.IsFinal ? "FINAL" : "partial")}] {result.Text}");

        _stt.KeywordHeard += (s, keyword) =>
            Console.WriteLine($"Keyword: {keyword}");

        // Start listening with keyword detection
        await _stt.Start(new SpeechRecognitionOptions
        {
            Keywords = ["Yes", "No", "Maybe"]
        });

        // Later (e.g. from a Stop button handler): stop listening
        await _stt.Stop();
    }

    async Task WakeWordExample(CancellationToken ct)
    {
        // "Hey Computer, do something" → returns "do something"
        var command = await _stt.StatementAfterKeyword(["Hey Computer"], cancellationToken: ct);

        if (command != null)
            await _tts.SpeakAsync($"You asked: {command}");
    }

    async Task KeywordExample(CancellationToken ct)
    {
        // Wait for a keyword (with optional timeout)
        await _tts.SpeakAsync("Do you agree? Say yes, no, or maybe.");
        var answer = await _stt.WaitListenForKeywords(
            ["Yes", "No", "Maybe"],
            timeout: TimeSpan.FromSeconds(30),
            cancellationToken: ct
        );

        Console.WriteLine($"Answer: {answer}");
    }
}
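The wake-word example depends on StatementAfterKeyword separating the keyword from the rest of the phrase. Its matching rules aren't spelled out on this page, so the sketch below is a hypothetical, self-contained illustration of one reasonable approach: a case-insensitive, whole-word search for each keyword, returning whatever follows it with leading punctuation trimmed. `TextAfterKeyword` is not a Shiny API.

```csharp
using System;
using System.Text.RegularExpressions;
#nullable enable

// Hypothetical helper (not Shiny's implementation): returns the statement spoken
// after the first keyword found as a whole word, or null if no keyword was heard
// or nothing followed it.
string? TextAfterKeyword(string transcript, string[] keywords)
{
    foreach (var keyword in keywords)
    {
        // \b anchors keep "No" from matching inside "Nothing"; matching ignores case.
        var match = Regex.Match(
            transcript,
            $@"\b{Regex.Escape(keyword)}\b",
            RegexOptions.IgnoreCase);

        if (!match.Success)
            continue;

        // Drop the keyword plus separators the recognizer may insert (", ", ". ", ...).
        var rest = transcript
            .Substring(match.Index + match.Length)
            .TrimStart(' ', ',', '.', '!', '?', ':', ';');

        return rest.Length > 0 ? rest : null;
    }
    return null;
}

Console.WriteLine(TextAfterKeyword("Hey Computer, do something", new[] { "Hey Computer" })); // do something
Console.WriteLine(TextAfterKeyword("hey computer", new[] { "Hey Computer" }) ?? "(nothing after keyword)"); // (nothing after keyword)
```

The whole-word check matters for short keywords like those in KeywordExample: without it, "No" would also fire on "Nothing" or "Nobody".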

VU Meter (Audio Level)

Both ITextToSpeechService and IAudioPlayer expose an AudioLevelChanged event that fires periodically during playback with a single normalized RMS level in the 0.0–1.0 range. Use it to drive a VU bar, waveform pulse, or speaking indicator. Check IsPlayerAnalysisSupported before binding UI: it is false wherever metering isn't available, which currently means Windows and Browser on every surface in the table below.

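using CommunityToolkit.Mvvm.ComponentModel;
using Microsoft.Maui.ApplicationModel;

public partial class TtsViewModel : ObservableObject
{
    readonly ITextToSpeechService tts;

    [ObservableProperty] double audioLevel; // 0.0 .. 1.0
    public bool IsVuSupported => tts.IsPlayerAnalysisSupported;

    public TtsViewModel(ITextToSpeechService tts)
    {
        this.tts = tts;
        // The event may fire off the UI thread, so marshal before updating the bound property
        tts.AudioLevelChanged += (_, level) =>
            MainThread.BeginInvokeOnMainThread(() => AudioLevel = level);
    }
}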

<ProgressBar Progress="{Binding AudioLevel}"
             IsVisible="{Binding IsVuSupported}" />

Surface	iOS / macOS	Android	Windows	Browser
Native TTS (`ITextToSpeechService`)	✅ — `AVAudioEngine` + player-node tap	✅ — `OnAudioAvailable` RMS	❌	❌
Cloud TTS (`CloudTextToSpeech`)	✅ — forwarded from `IAudioPlayer`	✅ — forwarded from `IAudioPlayer`	❌	❌
Generic playback (`IAudioPlayer`)	✅ — `AVAudioPlayer.MeteringEnabled`	✅ — `Visualizer` on session	❌	❌

On iOS / macOS, the native TTS path routes AVSpeechSynthesizer output through AVAudioEngine and an AVAudioPlayerNode so that a tap on the player node can compute RMS. The engine is created lazily and kept warm across utterances: the first utterance adds roughly 50–150 ms of latency, and subsequent calls are indistinguishable from the legacy direct path.
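The RMS level described above is the square root of the mean of the squared samples, which for float samples in [-1, 1] already lands in 0.0–1.0. The sketch below computes it for a single buffer, as a tap callback might. How Shiny actually reads, windows, or smooths the buffers isn't documented here, so `ComputeRmsLevel` and `ComputeRmsLevelPcm16` are hypothetical helpers.

```csharp
using System;

// Hypothetical helper: RMS of float samples in [-1, 1], clamped to 0.0–1.0.
double ComputeRmsLevel(ReadOnlySpan<float> samples)
{
    if (samples.IsEmpty)
        return 0.0;

    double sumOfSquares = 0;
    foreach (var s in samples)
        sumOfSquares += (double)s * s;

    // Clamp guards against out-of-range float samples pushing the level above 1.0.
    return Math.Min(1.0, Math.Sqrt(sumOfSquares / samples.Length));
}

// Hypothetical helper: the same measure for 16-bit PCM, normalized by full scale (32768).
double ComputeRmsLevelPcm16(ReadOnlySpan<short> samples)
{
    if (samples.IsEmpty)
        return 0.0;

    double sumOfSquares = 0;
    foreach (var s in samples)
    {
        double normalized = s / 32768.0;
        sumOfSquares += normalized * normalized;
    }
    return Math.Sqrt(sumOfSquares / samples.Length); // always <= 1.0 for Int16 input
}

Console.WriteLine(ComputeRmsLevel(new float[] { 0.5f, -0.5f, 0.5f, -0.5f })); // 0.5
Console.WriteLine(ComputeRmsLevelPcm16(new short[] { 0, 0, 0, 0 }));          // 0
```

A full-scale sine wave measures about 0.707 (1/√2) on this scale, so real speech usually sits well below 1.0.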

Samples

AI Coding Assistant

Claude Code

Step 1 — Add the marketplace:

claude plugin marketplace add shinyorg/skills

Step 2 — Install plugins:

claude plugin install shiny-client@shiny

BLE, GPS, Jobs, Notifications, Push, HTTP Transfers, OBD, Music, Health, DataSync (iOS, Android, Windows, macOS, Linux, Web)

claude plugin install shiny-maui@shiny

Shell, Contact Store

claude plugin install controls@shiny

TableView, BottomSheet, PillView, ImageViewer, Scheduler, Markdown, Mermaid Diagrams — MAUI and Blazor

claude plugin install shiny-mediator@shiny

Mediator/CQRS with middleware and source generators

claude plugin install shiny-data@shiny

DocumentDB and Spatial data libraries

claude plugin install shiny-aspire@shiny

Orleans and Gluetun Aspire integrations

claude plugin install shiny-extensions@shiny

DI, Stores, Reflector, Localization, Hosting modules

GitHub Copilot CLI

Step 1 — Add the marketplace:

copilot plugin marketplace add https://github.com/shinyorg/skills

Step 2 — Install plugins:

copilot plugin install shiny-client@shiny

BLE, GPS, Jobs, Notifications, Push, HTTP Transfers, OBD, Music, Health, DataSync (iOS, Android, Windows, macOS, Linux, Web)

copilot plugin install shiny-maui@shiny

Shell, Contact Store

copilot plugin install controls@shiny