Google: Gemini 3.1 Flash TTS Preview

google/gemini-3.1-flash-tts-preview
8K Context Window
4K Max Output
Normal

Gemini 3.1 Flash TTS Preview is a text-to-speech model from Google and a substantial generational step up from Gemini 2.5 Flash TTS. It takes text input and produces audio output in 70+ languages, nearly 3× the language coverage of its predecessor. The headline addition is a system of 200+ inline audio tags (e.g. `[whispers]`, `[laughs]`, `[excited]`) that let developers steer delivery, emotion, and pacing mid-sentence. Alongside the tags, Google AI Studio offers a "director's chair" workflow for defining per-character Audio Profiles and scene-level context. The model supports up to two speakers, each with its own voice and style configuration, outputs PCM audio at 24 kHz / 16-bit mono, and automatically watermarks all output with SynthID. The context window is 32k tokens.
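Because the output is described as raw PCM at 24 kHz / 16-bit mono, most audio players will need it wrapped in a container first. The following is a minimal sketch of one way to do that with Python's standard `wave` module. The helper names `pcm_to_wav_bytes` and `pcm_duration_seconds` are illustrative, and the code assumes the samples are little-endian signed 16-bit, which this page does not state.

```python
import io
import wave

# Audio format taken from the model description above.
SAMPLE_RATE_HZ = 24_000   # 24 kHz
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples
CHANNELS = 1              # mono


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV container (hypothetical helper).

    Assumes little-endian signed 16-bit samples, the layout WAV expects.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes) -> float:
    """Return the playback length of a raw PCM buffer in seconds."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)
```

At this format, one second of speech occupies 48,000 bytes of PCM, so `pcm_duration_seconds` can also serve as a quick sanity check on a downloaded buffer.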

Capabilities

Audio Generation, Speech Recognition

Technical Specs

Input Modality
Text
Output Modality
Audio
Arch

Pricing

Pay per use, no monthly fees
Billing Type    Unit Price
Text Input      $1.0000 / M tokens
Text Output     $20.0000 / M tokens
Reasoning       $1.0000 / M tokens
Audio Output    < $0.001 / minute
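As a rough illustration of how the per-token prices above combine, here is a small cost estimator. It is a sketch only: the function name `estimate_cost_usd` and its parameters are hypothetical, it covers just the three token-based rows, and it leaves out the audio-output charge because that row gives only an upper bound.

```python
from decimal import Decimal

# USD per one million tokens, copied from the pricing table above.
PRICE_PER_MILLION = {
    "text_input": Decimal("1.0000"),
    "text_output": Decimal("20.0000"),
    "reasoning": Decimal("1.0000"),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(text_input: int = 0, text_output: int = 0,
                      reasoning: int = 0) -> Decimal:
    """Estimate the token-based cost of one request (hypothetical helper)."""
    usage = {
        "text_input": text_input,
        "text_output": text_output,
        "reasoning": reasoning,
    }
    total = Decimal(0)
    for kind, tokens in usage.items():
        total += PRICE_PER_MILLION[kind] * tokens / TOKENS_PER_MILLION
    return total


# 500 input tokens and 2,000 output tokens:
# 500 * $1 / 1M + 2000 * $20 / 1M = $0.0005 + $0.04 = $0.0405
print(estimate_cost_usd(text_input=500, text_output=2000))
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift.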

Quick Start

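# Requires the `openai` Python package (pip install openai).
from openai import OpenAI

# Point the OpenAI-compatible client at the UnionToken endpoint.
client = OpenAI(
    base_url="https://api.uniontoken.ai/v1",
    api_key="YOUR_UNIONTOKEN_API_KEY",
)

# Send the text to be spoken as a single user message.
response = client.chat.completions.create(
    model="google/gemini-3.1-flash-tts-preview",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

# Print the message content returned by the API.
print(response.choices[0].message.content)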
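The inline audio tags mentioned in the model description are written directly into the prompt text. The sketch below shows one way to assemble a tagged prompt in the same `messages` shape that the Quick Start uses. The helpers `tagged_line` and `build_messages` are hypothetical, and the code assumes the endpoint passes bracketed tags through to the model unchanged, which this page does not confirm.

```python
def tagged_line(text: str, *tags: str) -> str:
    """Prefix a line of text with inline audio tags such as [whispers]."""
    prefix = " ".join(f"[{tag}]" for tag in tags)
    return f"{prefix} {text}".strip()


def build_messages(lines: list[str]) -> list[dict[str, str]]:
    """Join tagged lines into one user message (hypothetical helper)."""
    return [{"role": "user", "content": " ".join(lines)}]


# Tags used here are the examples given in the model description.
messages = build_messages([
    tagged_line("I have a secret to tell you.", "whispers"),
    tagged_line("We shipped it today!", "excited"),
])
print(messages[0]["content"])
```

The resulting `messages` list can be passed as the `messages` argument of `client.chat.completions.create` in place of the plain "Hello!" prompt.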
