OpenAI


Frameworks: .NET, .NET MAUI

Operating Systems: Android, iOS, Windows

Overview

OpenAI offers cloud speech-to-text (Whisper / GPT-4o Transcribe) and text-to-speech models. This provider replaces the platform-native ISpeechToTextService and ITextToSpeechService registrations with OpenAI cloud implementations. Audio capture and playback still run on the device through the platform-native IAudioSource and IAudioPlayer.

Setup

Register the OpenAI speech services in MauiProgram.cs:

// In MauiProgram.cs
builder.Services.AddOpenAiSpeech("your-api-key");
// IAudioSource and IAudioPlayer are automatically registered for platform audio I/O

Or with a config object and selective services:

builder.Services.AddOpenAiSpeech(
    new OpenAiSpeechConfig
    {
        ApiKey = "your-api-key",
        SpeechToTextModel = "gpt-4o-transcribe",
        TextToSpeechModel = "gpt-4o-mini-tts",
        DefaultVoice = "alloy"
    },
    speechToText: true,
    textToSpeech: true
);
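
The snippets above pass the API key as a string literal. As a minimal sketch of keeping the key out of source code, the helper below reads it from an environment variable and fails fast when it is missing. `ApiKeyLoader` and the variable name `OPENAI_API_KEY` are hypothetical names chosen for illustration and are not part of this library. On Android and iOS, environment variables are usually not set, so treat this as a desktop or development-time option.

```csharp
using System;

// Hypothetical helper (not part of the library): resolves the API key at startup.
public static class ApiKeyLoader
{
    // Returns the trimmed key, or throws if the candidate is null, empty, or whitespace.
    public static string Require(string? candidate, string sourceName)
    {
        if (string.IsNullOrWhiteSpace(candidate))
            throw new InvalidOperationException($"No OpenAI API key found in {sourceName}.");

        return candidate.Trim();
    }

    // Reads the key from an environment variable; the default name is an assumption.
    public static string FromEnvironment(string variableName = "OPENAI_API_KEY") =>
        Require(
            Environment.GetEnvironmentVariable(variableName),
            $"environment variable '{variableName}'");
}
```

The result can then be passed to the registration call shown earlier, for example `builder.Services.AddOpenAiSpeech(ApiKeyLoader.FromEnvironment());`.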

Configuration

public record OpenAiSpeechConfig
{
    public required string ApiKey { get; init; }
    public string SpeechToTextModel { get; init; } = "gpt-4o-transcribe";
    public string TextToSpeechModel { get; init; } = "gpt-4o-mini-tts";
    public string DefaultVoice { get; init; } = "alloy";
}

Property	Description	Default
`ApiKey`	Your OpenAI API key	(required)
`SpeechToTextModel`	The model to use for transcription	`gpt-4o-transcribe`
`TextToSpeechModel`	The model to use for speech synthesis	`gpt-4o-mini-tts`
`DefaultVoice`	Voice ID to use when none specified in `TextToSpeechOptions`	`alloy`
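
Because `OpenAiSpeechConfig` is a record with init-only properties, a variant of an existing configuration can be derived with a C# `with` expression instead of repeating every property. The sketch below redeclares the record from this section only so that it compiles on its own; in an app you would use the library's type. `ConfigExamples` and `WithVoice` are hypothetical names.

```csharp
using System;

// Redeclared from the Configuration section so the sketch is self-contained.
public record OpenAiSpeechConfig
{
    public required string ApiKey { get; init; }
    public string SpeechToTextModel { get; init; } = "gpt-4o-transcribe";
    public string TextToSpeechModel { get; init; } = "gpt-4o-mini-tts";
    public string DefaultVoice { get; init; } = "alloy";
}

// Hypothetical helper for illustration.
public static class ConfigExamples
{
    // Copies every property from baseConfig and overrides only DefaultVoice.
    public static OpenAiSpeechConfig WithVoice(OpenAiSpeechConfig baseConfig, string voice) =>
        baseConfig with { DefaultVoice = voice };
}
```

The original instance is left unchanged, which makes it safe to keep one base configuration and derive per-environment variants from it.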

Available Voices

OpenAI provides the following built-in voices:

Voice	Description
`alloy`	Neutral, balanced
`ash`	Warm, conversational
`ballad`	Soft, gentle
`coral`	Clear, friendly
`echo`	Smooth, resonant
`fable`	Expressive, animated
`onyx`	Deep, authoritative
`nova`	Bright, energetic
`sage`	Calm, measured
`shimmer`	Light, upbeat
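
If the voice ID comes from user settings or remote configuration, it can help to check it against the table above before assigning it to `DefaultVoice`. The following is a minimal sketch; `OpenAiVoices` is a hypothetical helper, and its list is copied from this page, so it may lag behind the voices OpenAI actually offers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the library).
public static class OpenAiVoices
{
    // Voice IDs copied from the table above; compared case-insensitively.
    private static readonly HashSet<string> Known = new(StringComparer.OrdinalIgnoreCase)
    {
        "alloy", "ash", "ballad", "coral", "echo",
        "fable", "onyx", "nova", "sage", "shimmer"
    };

    // True when voiceId matches one of the listed voices, ignoring case and padding.
    public static bool IsKnown(string? voiceId) =>
        !string.IsNullOrWhiteSpace(voiceId) && Known.Contains(voiceId.Trim());

    // Returns the normalized lowercase ID, or the fallback when the ID is not listed.
    public static string OrDefault(string? voiceId, string fallback = "alloy") =>
        IsKnown(voiceId) ? voiceId!.Trim().ToLowerInvariant() : fallback;
}
```

For example, `OpenAiVoices.OrDefault(savedVoice)` could supply the `DefaultVoice` value when building the config object.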

Usage

Once registered, inject ISpeechToTextService and ITextToSpeechService and use them exactly as you would the platform-native implementations. The API is identical:

public class MyViewModel(ISpeechToTextService stt, ITextToSpeechService tts)
{
    async Task ListenAndRespond(CancellationToken ct)
    {
        var text = await stt.ListenUntilSilence(cancellationToken: ct);

        if (text != null)
            await tts.SpeakAsync($"You said: {text}");
    }

    async Task SpeakWithVoice()
    {
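        var voices = await tts.GetVoicesAsync();
        // Match case-insensitively: the voice table lists lowercase IDs such as "nova".
        var voice = voices.FirstOrDefault(v =>
            string.Equals(v.Name, "nova", StringComparison.OrdinalIgnoreCase));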

        if (voice != null)
        {
            await tts.SpeakAsync("Hello from OpenAI!", new TextToSpeechOptions
            {
                Voice = voice,
                SpeechRate = 1.2f
            });
        }
    }
}

STT-Only or TTS-Only

You can register OpenAI for just one service:

// OpenAI STT only (use platform-native TTS)
builder.Services.AddTextToSpeech();   // Platform-native TTS
builder.Services.AddOpenAiSpeech("key", speechToText: true, textToSpeech: false);

// OpenAI TTS only (use platform-native STT)
builder.Services.AddSpeechToText();   // Platform-native STT
builder.Services.AddOpenAiSpeech("key", speechToText: false, textToSpeech: true);