Google: Gemini Embedding 2 Preview

google/gemini-embedding-2-preview

8K Context Window

4K Max Output

Normal

Gemini Embedding 2 Preview is Google's first multimodal embedding model. We currently support mapping text and images into a unified vector space for semantic search and retrieval-augmented generation (RAG). It accepts up to 8,192 input tokens and produces output vectors of 128 to 3,072 dimensions (recommended: 768, 1,536, or 3,072). The model is designed for cross-modal similarity: you can embed a text query and retrieve the most relevant images, or vice versa, which makes it well suited to multimodal search, recommendation, and document understanding pipelines.
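Cross-modal retrieval comes down to comparing vectors in the shared space. The sketch below ranks image vectors against a text query vector by cosine similarity. The 3-dimensional vectors and file names are made-up stand-ins for real embeddings, and cosine similarity is a common choice assumed here, not one this page specifies.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors (hypothetical): real embeddings would have 128 to 3,072 dimensions.
query = [1.0, 0.0, 0.0]           # stands in for an embedded text query
images = {
    "cat.jpg": [0.9, 0.1, 0.0],   # stands in for embedded images
    "car.jpg": [0.0, 1.0, 0.0],
    "dog.jpg": [0.6, 0.0, 0.8],
}

ranking = rank_by_similarity(query, images)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a real pipeline the candidate vectors would typically be computed once and stored in a vector index, so only the query needs embedding at search time.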

Capabilities

Text Generation

Image Generation

Technical Specs

Input Modality

Text

Output Modality

Text

Arch

—

Pricing

Pay per use, no monthly fees

Billing Type	Unit	Price
Text Input	—	$0.2000/M tokens
Image Input	—	< $0.001/image
Video Input	—	$12.0000/M tokens
Audio Input	—	$6.5000/minute
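As a rough guide to budgeting, the text rate above can be turned into a per-corpus estimate. The sketch below is simple arithmetic on the listed Text Input price; the corpus size and average document length are illustrative assumptions, and actual token counts depend on the tokenizer.

```python
# USD per million tokens, taken from the Text Input row of the pricing table.
TEXT_PRICE_PER_MILLION_TOKENS = 0.20

def estimate_text_cost(num_documents, avg_tokens_per_document,
                       price_per_million=TEXT_PRICE_PER_MILLION_TOKENS):
    """Estimate the cost in USD of embedding a text corpus once."""
    total_tokens = num_documents * avg_tokens_per_document
    return total_tokens / 1_000_000 * price_per_million

# Hypothetical corpus: 100,000 documents averaging 500 tokens each (50M tokens).
cost = estimate_text_cost(100_000, 500)
print(f"${cost:.2f}")  # prints $10.00
```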

Quick Start

from openai import OpenAI

client = OpenAI(
    base_url="https://api.uniontoken.ai/v1",
    api_key="YOUR_UNIONTOKEN_API_KEY",
)

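# An embedding model returns vectors, not chat messages, so call the
# embeddings endpoint (assuming the gateway exposes the OpenAI-compatible
# /v1/embeddings route).
response = client.embeddings.create(
    model="google/gemini-embedding-2-preview",
    input="Hello!",
)

# Each item in response.data holds one embedding vector (a list of floats).
vector = response.data[0].embedding
print(len(vector))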
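The description mentions output dimensions from 128 to 3,072. The sketch below builds an OpenAI-style embeddings request body with a `dimensions` field and validates it against that range before anything is sent. Whether this gateway honors `dimensions` is an assumption this page does not confirm, and `build_embedding_request` is a hypothetical helper, not part of any SDK.

```python
import json

MODEL_ID = "google/gemini-embedding-2-preview"
MIN_DIMENSIONS = 128    # lower bound stated in the model description
MAX_DIMENSIONS = 3072   # upper bound stated in the model description

def build_embedding_request(texts, dimensions=768):
    """Build an OpenAI-style embeddings payload (hypothetical helper)."""
    if not texts:
        raise ValueError("at least one input text is required")
    if not MIN_DIMENSIONS <= dimensions <= MAX_DIMENSIONS:
        raise ValueError(
            f"dimensions must be between {MIN_DIMENSIONS} and {MAX_DIMENSIONS}"
        )
    return {
        "model": MODEL_ID,
        "input": list(texts),
        # Assumption: the gateway forwards this field to the model.
        "dimensions": dimensions,
    }

payload = build_embedding_request(
    ["red running shoes", "blue denim jacket"],
    dimensions=768,
)
print(json.dumps(payload, indent=2))
```

Smaller vectors reduce storage and speed up similarity search, usually at some cost in retrieval quality, so it is worth measuring on your own data before settling on a size.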