Meta: Llama 4 Scout

meta-llama/llama-4-scout

328K Context Window

16K Max Output
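These two limits interact. The prompt and the completion share the context window, and the completion alone is also capped by the max-output limit. The sketch below computes the largest safe `max_tokens` for a given prompt length. It assumes that "328K" and "16K" mean 328,000 and 16,000 tokens and that the window counts prompt plus completion, which is usual for OpenAI-compatible chat APIs. The provider's exact accounting may differ. `max_completion_tokens` is a hypothetical helper name.

```python
# Hypothetical helper: how many completion tokens still fit for a given prompt.
# Assumption: "328K" and "16K" are read as 328,000 and 16,000 tokens; the
# provider's exact limits may differ slightly (e.g. 16,384).
CONTEXT_WINDOW = 328_000  # prompt + completion tokens (assumed)
MAX_OUTPUT = 16_000       # cap on completion tokens alone (assumed)


def max_completion_tokens(prompt_tokens: int,
                          context_window: int = CONTEXT_WINDOW,
                          max_output: int = MAX_OUTPUT) -> int:
    """Return the largest max_tokens value that still fits, or 0 if the
    prompt alone already fills the context window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    return max(0, min(remaining, max_output))


print(max_completion_tokens(100_000))  # 16000: the output cap is the binding limit
print(max_completion_tokens(320_000))  # 8000: the context window is the binding limit
print(max_completion_tokens(330_000))  # 0: the prompt already exceeds the window
```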

Supported Parameters: max_tokens, temperature, top_p, stop, frequency_penalty, presence_penalty, repetition_penalty, top_k, seed, min_p, response_format
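All of these parameters can be sent through the OpenAI Python SDK. However, that SDK accepts only the standard OpenAI names as keyword arguments. Provider-specific names such as `top_k`, `min_p`, and `repetition_penalty` must go through its `extra_body` option, because passing them directly raises a `TypeError`. The sketch below builds a request dictionary that splits the parameters accordingly and rejects any name the page does not list. `build_chat_request` is a hypothetical helper. It assumes the endpoint reads the extra fields from the JSON request body.

```python
# Sampling parameters this page lists as supported.
SUPPORTED_PARAMS = frozenset({
    "max_tokens", "temperature", "top_p", "stop", "frequency_penalty",
    "presence_penalty", "repetition_penalty", "top_k", "seed", "min_p",
    "response_format",
})
# Subset the official OpenAI Python SDK accepts as named keyword arguments;
# everything else is forwarded in `extra_body` (assumed to be honored).
SDK_NATIVE_PARAMS = frozenset({
    "max_tokens", "temperature", "top_p", "stop", "frequency_penalty",
    "presence_penalty", "seed", "response_format",
})


def build_chat_request(model: str, messages: list, **params) -> dict:
    """Assemble kwargs for client.chat.completions.create(), rejecting
    parameters this model page does not list."""
    unsupported = sorted(set(params) - SUPPORTED_PARAMS)
    if unsupported:
        raise ValueError(f"unsupported parameter(s): {', '.join(unsupported)}")
    request = {"model": model, "messages": messages}
    extra = {}
    for name, value in params.items():
        if name in SDK_NATIVE_PARAMS:
            request[name] = value  # standard OpenAI argument
        else:
            extra[name] = value    # provider-specific, sent in the body
    if extra:
        request["extra_body"] = extra
    return request


request = build_chat_request(
    "meta-llama/llama-4-scout",
    [{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}],
    temperature=0.7,
    top_k=40,
    seed=1234,
)
print(sorted(request))  # ['extra_body', 'messages', 'model', 'seed', 'temperature']
```

The result can be passed as `client.chat.completions.create(**request)` with the client from the Quick Start.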


Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta that activates 17 billion of its 109 billion total parameters for each token. It accepts native multimodal input (text and image) and produces multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout has 16 experts and a native context length of up to 10 million tokens (this endpoint serves a 328K-token window), and it was trained on a corpus of roughly 40 trillion tokens. Built for efficiency and for local or commercial deployment, it uses early fusion to integrate text and image inputs in a single model backbone. It is instruction-tuned for multilingual chat, captioning, and image-understanding tasks. Released under the Llama 4 Community License, it has a knowledge cutoff of August 2024 and was launched publicly on April 5, 2025.
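Because Scout accepts images natively, a vision request is an ordinary chat call in which the user message mixes text and image parts. The sketch below assumes that this endpoint follows the OpenAI content-part format (`{"type": "image_url", ...}`), which the page does not state explicitly. The helper names and the example URL are hypothetical.

```python
import base64


def image_message(prompt: str, image_url: str) -> dict:
    """Build one user turn pairing a text prompt with an image, in the
    OpenAI-style list-of-content-parts format (assumed to be accepted)."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def data_url_from_bytes(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL for inline upload."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder URL; for a local file you could instead use
# data_url_from_bytes(pathlib.Path("chart.png").read_bytes()).
messages = [image_message("What trend does this chart show?",
                          "https://example.com/chart.png")]
print(messages[0]["content"][1]["type"])  # image_url
```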

Capabilities

Vision, Text Generation, Code Generation, Analysis & Reasoning

Technical Specs

Input Modality

Text, Image

Output Modality

Text

Architecture

Mixture-of-experts (MoE)

Default Temperature

0.7

Default Top_P

—

Pricing

Pay per use, no monthly fees

Billing Type	Unit	Price
Text Input	1M tokens	$0.08
Text Output	1M tokens	$0.30
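At these rates, cost scales linearly with token counts: (input tokens × $0.08 + output tokens × $0.30) / 1,000,000. The sketch below uses `Decimal` so the sum is not affected by floating-point rounding. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Per-million-token prices from the pricing table (USD).
INPUT_PRICE_PER_M = Decimal("0.08")
OUTPUT_PRICE_PER_M = Decimal("0.30")


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the pay-per-use cost in USD for the given token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / Decimal(1_000_000)


print(estimate_cost(1_000_000, 200_000))  # 0.14
```

With the OpenAI SDK, `response.usage.prompt_tokens` and `response.usage.completion_tokens` supply the token counts. This assumes the provider bills on the same numbers it reports.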

Quick Start

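```python
from openai import OpenAI

# UnionToken exposes an OpenAI-compatible endpoint, so the official
# OpenAI Python SDK works once base_url points at it.
client = OpenAI(
    base_url="https://api.uniontoken.ai/v1",
    api_key="YOUR_UNIONTOKEN_API_KEY",  # replace with your own key
)

# Send a single-turn chat request to Llama 4 Scout.
response = client.chat.completions.create(
    model="meta-llama/llama-4-scout",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

# The generated reply is in the first choice.
print(response.choices[0].message.content)
```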