Google: Gemini 3.1 Flash TTS Preview
Gemini 3.1 Flash TTS Preview is a text-to-speech model from Google, and a substantial generational step up from Gemini 2.5 Flash TTS. It takes text input and produces audio output across 70+ languages — nearly 3× the language coverage of its predecessor. The headline addition is a system of 200+ inline audio tags (e.g. `[whispers]`, `[laughs]`, `[excited]`) that let developers steer delivery, emotion, and pacing mid-sentence, alongside a "director's chair" workflow in Google AI Studio for defining per-character Audio Profiles and scene-level context. It supports up to two speakers with independent voice and style configuration per speaker, outputs PCM audio at 24 kHz / 16-bit mono, and automatically watermarks all output with SynthID. Context window is 32k tokens.
Capabilities
Technical Specs
Pricing
Pay per use, no monthly fees| Billing Type | Unit | Price |
|---|---|---|
| Text Input | — | $1.0000/M tokens |
| Text Output | — | $20.0000/M tokens |
| Reasoning | — | $1.0000/M tokens |
| Audio Output | — | < $0.001/分钟 |
Quick Start
from openai import OpenAI
client = OpenAI(
base_url="https://api.uniontoken.ai/v1",
api_key="YOUR_UNIONTOKEN_API_KEY",
)
response = client.chat.completions.create(
model="google/gemini-3.1-flash-tts-preview",
messages=[
{"role": "user", "content": "Hello!"}
],
)
print(response.choices[0].message.content)FAQ
Related Models
View All → →Ready to get started?
Get 1M free tokens on registration, no monthly fees or minimum spend
Register Now →