AI Models/bytedance/ByteDance: UI-TARS 7B
bytedanceChat

ByteDance: UI-TARS 7B

bytedance/ui-tars-1.5-7b
128KContext Window
2KMax Output
Supported Protocols:max_tokenstemperaturetop_pfrequency_penaltypresence_penaltyrepetition_penaltyseedstoptop_klogit_bias
Normal

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

Capabilities

👁 VisionText GenerationCode GenerationAnalysis & ReasoningReasoning

Technical Specs

Input Modality
Image、Text
Output Modality
Text
Arch
Default Temperature
0.7
Default Top_P
1

Pricing

Pay per use, no monthly fees
Billing TypeUnitPrice
Text Input$0.1000/M tokens
Text Output$0.2000/M tokens
Cache Read$0.1000/M tokens

Quick Start

from openai import OpenAI

client = OpenAI(
    base_url="https://api.uniontoken.ai/v1",
    api_key="YOUR_UNIONTOKEN_API_KEY",
)

response = client.chat.completions.create(
    model="bytedance/ui-tars-1.5-7b",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)

FAQ

Ready to get started?

Get 1M free tokens on registration, no monthly fees or minimum spend

Register Now →