Custom Provider

Shiny Speech uses a pluggable cloud provider architecture. You can add your own STT and/or TTS providers by implementing the ISpeechToTextProvider and ITextToSpeechProvider interfaces from Shiny.Speech.Cloud.

Cloud providers replace the platform-native ISpeechToTextService / ITextToSpeechService registrations. The required platform-native audio services (IAudioSource and IAudioPlayer) are automatically registered by the cloud extension methods.

Implement ISpeechToTextProvider to receive raw PCM audio (16 kHz, 16-bit, mono) and yield recognition results:

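using System.Runtime.CompilerServices; // [EnumeratorCancellation]
using Shiny.Speech;
using Shiny.Speech.Cloud;

public class MyCloudSttProvider : ISpeechToTextProvider
{
    public event EventHandler<SpeechRecognitionError>? Error;

    public async IAsyncEnumerable<SpeechRecognitionResult> RecognizeAsync(
        Stream audioStream,
        SpeechRecognitionOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // A continuous provider would repeat the steps below for each audio chunk.
        SpeechRecognitionResult? result = null;
        try
        {
            // Read PCM audio from audioStream
            // Send to your cloud API (await your HTTP or WebSocket calls here)
            result = new SpeechRecognitionResult(
                "recognized text",
                IsFinal: true,
                Confidence: 0.95f
            );
        }
        catch (HttpRequestException ex)
        {
            // Non-fatal: raise Error and keep the session running.
            // Letting the exception escape would terminate the IAsyncEnumerable and end the session.
            Error?.Invoke(this, new SpeechRecognitionError(ex.Message, ex));
        }

        // Yield results as they arrive. C# does not allow yield return inside
        // a try block that has a catch clause, so the yield happens out here.
        if (result is { } recognized)
            yield return recognized;
    }
}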
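
The audio format fixes the chunk arithmetic: 16,000 samples per second * 2 bytes per sample * 1 channel = 32,000 bytes per second, so a 100 ms chunk is 3,200 bytes. As a minimal sketch, a provider that talks to a chunked cloud API could slice the incoming stream into fixed-duration chunks before sending them. PcmChunker, BytesFor, and ReadChunksAsync are hypothetical helper names for illustration and are not part of Shiny.Speech.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Shiny.Speech.
public static class PcmChunker
{
    public const int SampleRate = 16_000;  // samples per second
    public const int BytesPerSample = 2;   // 16-bit
    public const int Channels = 1;         // mono

    // Number of PCM bytes covering the given duration, aligned to whole samples.
    public static int BytesFor(TimeSpan duration)
    {
        var samples = (int)(SampleRate * duration.Ticks / TimeSpan.TicksPerSecond);
        return samples * BytesPerSample * Channels;
    }

    // Reads the stream to its end and yields fixed-duration chunks.
    // The final chunk may be shorter than the others.
    public static async IAsyncEnumerable<byte[]> ReadChunksAsync(
        Stream audioStream,
        TimeSpan chunkDuration,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var buffer = new byte[BytesFor(chunkDuration)];
        var filled = 0;

        while (true)
        {
            // A single read may return fewer bytes than requested, so accumulate.
            var read = await audioStream.ReadAsync(buffer.AsMemory(filled), cancellationToken);
            if (read == 0)
                break; // end of stream (or a zero-length buffer)

            filled += read;
            if (filled == buffer.Length)
            {
                // Copy so the caller never sees the buffer being overwritten.
                yield return buffer.AsMemory(0, filled).ToArray();
                filled = 0;
            }
        }

        if (filled > 0)
            yield return buffer.AsMemory(0, filled).ToArray();
    }
}
```

Inside RecognizeAsync, each chunk could then be sent in its own try/catch, so that a failed request raises Error for that chunk while the loop moves on to the next one.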

The Error event lets continuous providers surface transient failures (e.g. a network blip between chunked requests) without tearing down the RecognizeAsync enumerator. CloudSpeechToText subscribes to it internally and forwards each error to the service-level ISpeechToTextService.Error event, so app code only needs to wire one handler.
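
The forwarding described above is ordinary event re-raising. The sketch below illustrates the pattern with stand-in types only: DemoError, DemoProvider, and DemoBridge are hypothetical names, and this is not Shiny's actual CloudSpeechToText implementation.

```csharp
using System;

// Stand-in types for illustration only; not the Shiny.Speech definitions.
public record DemoError(string Message, Exception? Exception = null);

// Plays the role of a cloud provider that reports non-fatal errors.
public class DemoProvider
{
    public event EventHandler<DemoError>? Error;

    // Simulates a transient failure between chunked requests.
    public void SimulateNetworkBlip()
        => Error?.Invoke(this, new DemoError("network blip"));
}

// Plays the role of the bridge that exposes the service-level event.
public class DemoBridge
{
    public event EventHandler<DemoError>? Error;

    public DemoBridge(DemoProvider provider)
    {
        // Re-raise every provider error on the bridge's own event,
        // so subscribers only ever deal with the bridge.
        provider.Error += (_, e) => Error?.Invoke(this, e);
    }
}
```

With this shape, a single handler attached to the bridge's Error event receives anything the provider raises (for example after calling SimulateNetworkBlip), which mirrors wiring one handler to ISpeechToTextService.Error in app code.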

Register with DI:

builder.Services.AddCloudSpeechToText<MyCloudSttProvider>();
// IAudioSource is automatically registered

Implement ITextToSpeechProvider to synthesize text into an audio stream:

using System.Globalization; // CultureInfo
using Shiny.Speech;
using Shiny.Speech.Cloud;

public class MyCloudTtsProvider : ITextToSpeechProvider
{
    public async Task<IReadOnlyList<VoiceInfo>> GetVoicesAsync(
        CultureInfo? culture = null,
        CancellationToken cancellationToken = default)
    {
        // Return available voices from your cloud provider
        return new[]
        {
            new VoiceInfo("voice-1", "Default Voice", CultureInfo.GetCultureInfo("en-US"))
        };
    }

    public async Task<Stream> SynthesizeAsync(
        string text,
        TextToSpeechOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Send text to your cloud API
        // Placeholder: replace the empty stream with the audio your API returns (MP3 format)
        Stream audioStream = new MemoryStream();
        return audioStream;
    }
}

Register with DI:

builder.Services.AddCloudTextToSpeech<MyCloudTtsProvider>();
// IAudioPlayer is automatically registered

The two provider interfaces are defined as follows:

public interface ISpeechToTextProvider
{
    IAsyncEnumerable<SpeechRecognitionResult> RecognizeAsync(
        Stream audioStream,
        SpeechRecognitionOptions? options = null,
        CancellationToken cancellationToken = default
    );

    // Raised for non-fatal errors during continuous recognition.
    // CloudSpeechToText forwards this to ISpeechToTextService.Error.
    event EventHandler<SpeechRecognitionError>? Error;
}

public interface ITextToSpeechProvider
{
    Task<IReadOnlyList<VoiceInfo>> GetVoicesAsync(
        CultureInfo? culture = null,
        CancellationToken cancellationToken = default
    );

    Task<Stream> SynthesizeAsync(
        string text,
        TextToSpeechOptions? options = null,
        CancellationToken cancellationToken = default
    );
}

When you register a cloud provider, Shiny Speech creates a bridge class (CloudSpeechToText or CloudTextToSpeech) that:

  1. Captures audio from IAudioSource (for STT) or receives synthesized audio (for TTS)
  2. Delegates recognition/synthesis to your provider implementation
  3. Exposes the standard ISpeechToTextService / ITextToSpeechService interface

This means your app code doesn’t change — you can swap between platform-native, Azure, ElevenLabs, or your own provider by changing only the DI registration.