Seed1.5-VL

Overview

Seed1.5-VL is ByteDance Seed's standalone vision-language foundation model, described in its own technical report as a model "designed to advance general-purpose multimodal understanding and reasoning." It pairs a 532M-parameter vision encoder with a Mixture-of-Experts language model of 20B active parameters — a deliberately compact design that nonetheless reaches state-of-the-art performance on 38 out of 60 public benchmarks.

It is also, importantly, the only dedicated vision-language SKU in ByteDance Seed's model directory.

Status as of 2026-07-08: retired from Volcano Engine. The model ID doubao-1-5-thinking-vision-pro-250428 reached End of Service on 2026-03-31 at 14:00 (UTC+8) and is no longer callable. Endpoints still pointing at it are automatically switched to doubao-seed-2-0-lite-260215. The technical report and cookbook remain valuable as reference material; the served model does not exist any more. See Pricing & Access below.

A note on "Seed 2.0 Vision"

This page previously described a model called Doubao Seed 2.0 Vision. That model does not exist and never did.

ByteDance's Seed 2.0 launch post enumerates the family exhaustively: three general-purpose agent models — Pro, Lite, and Mini — "along with a dedicated Code model." No Vision variant. The Seed 2.1 release that followed on June 23, 2026 shipped Seed 2.1 and Seed 2.1 Pro, again with no Vision variant. No doubao-seed-2.0-vision model ID exists on Volcano Engine or anywhere else.

The reason is straightforward: vision is native to every Seed 2.x model. Seed 2.0 and Seed 2.1 take image and video input directly, and Seed 2.0 Lite was upgraded at the end of April 2026 to add audio, becoming the Seed series' first omni-modal understanding model. ByteDance had no reason to ship a separate vision SKU, and did not.

What ByteDance does list as a standalone vision-language model is Seed1.5-VL, and that is what this page now documents.

Capabilities

From the technical report and the accompanying cookbook, Seed1.5-VL demonstrates strength in:

Complex visual reasoning — including visual puzzles such as Rebus, which require inferring meaning rather than reading content off the image.
OCR — text recognition and extraction from images.
Diagram understanding — parsing charts, schematics, and structured graphics.
Visual grounding — locating the specific region of an image that a phrase refers to.
3D spatial understanding — reasoning about spatial relationships within a scene.
Video comprehension — understanding temporal content, not just individual frames.
GUI control — operating graphical interfaces as an agent, which the report singles out as an area of particular strength.
Gameplay — agentic play, cited alongside GUI control as an agent-centric capability.

The technical report's own framing of the agent work is direct: "in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7."

Technical Specifications

Vision encoder: 532M parameters
Language model: Mixture-of-Experts with 20B active parameters. Total parameter count not published.
Modalities: image, video, and text input; text output
Volcano Engine model ID: doubao-1-5-thinking-vision-pro-250428 — retired. End of Service 2026-03-31 14:00 (UTC+8). No longer callable.
Technical report: arXiv:2505.07062, published May 2025
Weights: closed. The companion GitHub repository is a cookbook under Apache-2.0 and does not host weights.
Context window: Not published by ByteDance.
Knowledge cutoff: Not published by ByteDance.
Pricing: Not published by ByteDance in English.

The model ID's 250428 suffix indicates a snapshot dated 2025-04-28; the technical report followed in May 2025.

ByteDance describes the architecture as "relatively compact" and treats that compactness as the interesting result: the report is, by its own description, "a comprehensive review of our experiences in building Seed1.5-VL across model design, data construction, and training at various stages." It is written as a methods paper as much as a model card.

Use Cases

GUI agents: driving desktop and web interfaces from screenshots — the capability the report benchmarks against OpenAI CUA and Claude 3.7.
Document intelligence: OCR plus diagram understanding on scanned documents, forms, and reports.
Visual grounding for downstream tools: producing region references that a robotics or automation stack can act on.
Video understanding: summarising, indexing, or answering questions about temporal content.
Chart and schematic analysis: extracting quantitative meaning from data visualisations and technical drawings.
Multimodal reasoning research: the compact architecture and the detailed training write-up make it a reference point for teams building their own VLMs.
Visual puzzle and reasoning evaluation: Rebus-style tasks where the answer is not present in the image as text.

Performance / Benchmarks

State of the record: ByteDance reports leadership claims and an aggregate count. It does not publish per-benchmark scores in its summary material — those live in the technical report itself.

38 out of 60 public benchmarks — state-of-the-art performance, per the technical report abstract. This is the model's headline claim.
GUI control and gameplay — "outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7."
Public VLM benchmark coverage — the report states the model "delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites."

We do not list individual benchmark scores here, because ByteDance's public-facing material does not carry them and we will not invent them. Readers who need the per-task numbers should read the technical report directly, where the full evaluation tables are presented.

Note that the OpenAI CUA and Claude 3.7 comparison dates from May 2025. Both comparison points have been superseded several times over; the claim describes the state of the field at publication, not today.

Limitations

It is no longer served. The Volcano Engine endpoint was taken out of service on 2026-03-31. This is the limitation that supersedes every other one below: you cannot call this model.
It is an older model. Published in May 2025, Seed1.5-VL predates the entire Seed 2.x generation. For general multimodal work, Seed 2.1 is the current flagship and handles vision natively.
No published per-benchmark scores in summary material. The "38 of 60" figure is an aggregate; the individual results require reading the paper.
Stale comparison baselines. The headline agentic comparison is against OpenAI CUA and Claude 3.7, both long since superseded.
No open weights. The Apache-2.0 GitHub repository is a cookbook, not a weight release. Self-hosting is not possible.
No published context window or pricing. ByteDance never published either, and the per-model Volcano Engine documentation page that once carried the endpoint limits now redirects to the general model list.
Volcano Engine dependency. While it was served, access required a Volcano Engine API key, oriented toward the Chinese market. That dependency is now moot: the endpoint is gone and the weights were never released, so there is no way to run the model at all.
Compact by design. A 20B active-parameter MoE trades peak capability for efficiency. On the hardest multimodal reasoning, larger frontier models will generally lead.
Content policy constraints. As a Chinese-market model, outputs are shaped by content rules that differ from those governing US-hosted models.

Pricing & Access

The Volcano Engine endpoint for Seed1.5-VL has been retired. The model ID doubao-1-5-thinking-vision-pro-250428 is no longer callable, and it no longer appears anywhere in Volcano Engine's 模型列表 ("model list").

The retirement, as ByteDance documents it

Volcano Engine publishes a 模型下线公告 ("model offline announcement"). Seed1.5-VL appears in 第六批模型下线说明 ("batch six offline notice"), whose schedule reads:

Stage	Time (UTC+8)
启动&通知 — start & notification	2025-12-19 10:00:00
EOM 模型停止新购 — end of marketing, no new provisioning	2025-12-26 10:00:00
EOS 模型服务下线 — end of service, model taken offline	2026-03-31 14:00

The batch-six table lists the model with its replacement:

原文 (original): "doubao-1-5-thinking-vision-pro-250428 | doubao-seed-2-0-lite-260215 | doubao-seed-2-0-lite-260215"

Translation: model ID | recommended migration target | system replacement if not migrated by the deadline — both columns naming doubao-seed-2-0-lite-260215.

The announcement defines the EOS milestone explicitly:

原文: "EOS：End of Service & Support，模型停止服务，模型服务下线。"

Translation: "EOS: End of Service & Support — the model stops serving; the model service is taken offline."

And describes what happens to endpoints left pointing at a retired model:

原文: "模型正式下线并残留接入点自动替换为新模型。"

Translation: "The model is formally taken offline, and residual endpoints are automatically replaced with the new model."

ByteDance frames this as routine rather than exceptional:

原文: "一般情况下，每个版本的模型生命周期为3~6个月。"

Translation: "In general, the lifecycle of each model version is 3 to 6 months."

Seed1.5-VL was retired alongside its whole generation — doubao-1-5-vision-pro-250328, doubao-1-5-vision-lite-250315, doubao-1-5-thinking-pro-250415, doubao-1-5-thinking-pro-m-250428, and doubao-1-5-ui-tars-250428 are in the same batch. No doubao-1-5-thinking-* model survives in the current list.

Vision models that do appear in the current model list

From the 视觉理解能力 ("vision understanding") section of the model list, retrieved 2026-07-08:

推荐模型 ("recommended models")

doubao-seed-2-1-pro-260628
doubao-seed-2-1-turbo-260628
doubao-seed-evolving (marked 快速迭代模型, "rapidly iterating model")

往期模型 ("previous models") — still listed, still callable

doubao-seed-2-0-pro-260215
doubao-seed-2-0-lite-260428, doubao-seed-2-0-lite-260215
doubao-seed-2-0-mini-260428, doubao-seed-2-0-mini-260215
doubao-seed-2-0-code-preview-260215
doubao-seed-character-260628

Seven further entries carry the badge 即将下线 ("to be taken offline soon"), including the two older dedicated vision SKUs doubao-seed-1-6-vision-250815 and doubao-1-5-vision-pro-32k-250115. Both are named in 第九批 ("batch nine") with an EOS of 2026-09-21 14:00 (UTC+8) and a migration target of doubao-seed-2-0-lite-260428.

Notably, doubao-seed-1-6-vision-250815 is now the only model listed under GUI 任务处理能力 ("GUI task handling") — and it, too, is scheduled for retirement.

What remains reachable

Cookbook repository — ByteDance-Seed/Seed1.5-VL on GitHub, Apache-2.0, code samples and usage guidance, not weights. Its README still tells readers the model is deployed on Volcano Engine and invites them to "try it now"; that instruction is stale. The repository has had no push since June 2025.
Hugging Face Space — ByteDance-Seed/Seed1.5-VL still exists, but the Hugging Face API reported it in a BUILD_ERROR state when checked on 2026-07-08. Treat it as unavailable rather than as a working demo.

No English-language rate card was ever published for this model, and none exists now that it is retired. For rates on current models, consult the 模型价格 ("model pricing") page.

Ecosystem & Tools

Volcano Engine (Ark) — the API platform that hosted Seed1.5-VL until its retirement on 2026-03-31, and that hosts every current Seed model.
Seed1.5-VL cookbook — official code samples and usage guide, Apache-2.0. Written against the now-retired endpoint.
Seed 2.1 (Doubao) — the current flagship, with vision built in. The migration path in practice.
doubao-seed-2-0-lite-260215 — ByteDance's designated replacement for Seed1.5-VL, per the batch-six offline notice.
Doubao Seed 2.0 Code — the dedicated coding model, which also carries multimodal perception.
Seedream 5.0 — ByteDance's image generation line, as distinct from image understanding.

Community & Resources

Seed1.5-VL Technical Report — ByteDance Seed's publication page
arXiv:2505.07062 — the full technical report, with complete evaluation tables
ByteDance-Seed/Seed1.5-VL — official cookbook repository
ByteDance Seed Models directory — canonical list of every Seed model, confirming Seed1.5-VL as the only standalone VL SKU
Seed 2.0 Official Launch — the post enumerating Pro / Lite / Mini / Code, with no Vision variant
模型下线公告 — Volcano Engine's model offline announcements; Seed1.5-VL is in batch six, EOS 2026-03-31
模型列表 — Volcano Engine's current model list, which no longer contains doubao-1-5-thinking-vision-pro-250428
Volcano Engine Doubao product page — API access and pricing

Overview

A note on "Seed 2.0 Vision"

Capabilities

Technical Specifications

Use Cases

Performance / Benchmarks

Limitations

Pricing & Access

The retirement, as ByteDance documents it

Vision models that do appear in the current model list

What remains reachable

Ecosystem & Tools

Community & Resources

Frequently Asked Questions

Was there ever a "Doubao Seed 2.0 Vision" model?

So where does vision live in the Seed family?

What is Seed1.5-VL?

How large is Seed1.5-VL?

How does Seed1.5-VL perform on benchmarks?

What is Seed1.5-VL good at?

Is the model ID `doubao-1-5-thinking-vision-pro-250428` still callable?

How do I access Seed1.5-VL?

Are the Seed1.5-VL weights open?

When was Seed1.5-VL released, and when was it retired?

Should I use Seed1.5-VL or Seed 2.1 for vision work?

Related Models

Seed 2.1 (Doubao)

Doubao Seed 2.0 Code

Seedream 5.0

Gemini 3.5

GPT-5.6

Qwen3.7-Max

Explore More Models

Seed1.5-VL

Overview

A note on "Seed 2.0 Vision"

Capabilities

Technical Specifications

Use Cases

Performance / Benchmarks

Limitations

Pricing & Access

The retirement, as ByteDance documents it

Vision models that do appear in the current model list

What remains reachable

Ecosystem & Tools

Community & Resources

Frequently Asked Questions

Was there ever a "Doubao Seed 2.0 Vision" model?

So where does vision live in the Seed family?

What is Seed1.5-VL?

How large is Seed1.5-VL?

How does Seed1.5-VL perform on benchmarks?

What is Seed1.5-VL good at?

Is the model ID doubao-1-5-thinking-vision-pro-250428 still callable?

How do I access Seed1.5-VL?

Are the Seed1.5-VL weights open?

When was Seed1.5-VL released, and when was it retired?

Should I use Seed1.5-VL or Seed 2.1 for vision work?

Related Models

Seed 2.1 (Doubao)

Doubao Seed 2.0 Code

Seedream 5.0

Gemini 3.5

GPT-5.6

Qwen3.7-Max

Explore More Models

Is the model ID `doubao-1-5-thinking-vision-pro-250428` still callable?