Wan2.1 (Alibaba)

Tool

Alibaba's open-source video foundation model with state-of-the-art text-to-video and image-to-video capabilities, widely used locally via ComfyUI and available via Alibaba Cloud enterprise APIs.

Tags: Alibaba Cloud, Wan, Video Generation, Open Source, Chinese AI, Text-to-Video, Creative AI
Developer
Alibaba Research (Tongyi)
Type
Open-Source Model & API
Pricing
Free (Open Weights) / Paid (Cloud API)

Wan2.1 (Alibaba)

Wan2.1 is Alibaba's open-source video generation model family, and a significant one. Released publicly with weights in early 2025, Wan2.1 benchmarks competitively with the best proprietary video models on multiple quality metrics, while remaining free to run locally for anyone with sufficient GPU hardware.

Overview

Developed by Alibaba's Tongyi research team, Wan2.1 was released as a fully open-source model under the Apache 2.0 license — meaning anyone can download the weights, run them locally, fine-tune them, and use them commercially. This positioned Wan2.1 as the "DeepSeek moment for video" — a Chinese open-source model that matched or exceeded proprietary Western alternatives.

As of April 2026, Wan2.1 remains one of the most capable open-source video models available. The 14B-parameter text-to-video (T2V) and image-to-video (I2V) variants are the community's reference points for what open-source video generation can achieve. It runs locally via ComfyUI on high-VRAM workstations, and is available via Alibaba Cloud's Bailian platform as a managed API.

Key Features

  • Open-Source Apache 2.0: Full model weights released publicly — free for commercial and research use with no licensing fees.
  • Text-to-Video (14B): Generates 480p/720p video from text prompts, with strong understanding of complex prompt details and physical dynamics.
  • Image-to-Video (14B): Animates still images with physics-aware motion and strong adherence to the source image composition.
  • Multi-Resolution Support: Generates video at various resolutions and aspect ratios (9:16, 16:9, 1:1).
  • ComfyUI Integration: Full workflow support in ComfyUI, enabling complex multi-step video generation pipelines.
  • Competitive Motion Quality: In academic benchmarks, Wan2.1 scores comparably to commercial models on motion smoothness, physical plausibility, and scene consistency.
  • Model Variants: 1.3B (lightweight, runs on 8GB VRAM) and 14B (full quality, requires 24-80GB VRAM depending on quantization).

Model Variants

| Model           | Parameters | Min VRAM                    | Quality | Best For                           |
|-----------------|------------|-----------------------------|---------|------------------------------------|
| Wan2.1-T2V-1.3B | 1.3B       | 8GB                         | Good    | Quick generation, low-end hardware |
| Wan2.1-T2V-14B  | 14B        | 40GB (bf16) / 24GB (8-bit)  | High    | Quality generation                 |
| Wan2.1-I2V-14B  | 14B        | 40GB (bf16) / 24GB (8-bit)  | High    | Image animation                    |
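The VRAM figures above follow roughly from parameter count times bytes per weight: bf16 stores 2 bytes per parameter, 8-bit quantization stores 1. A back-of-envelope sketch (weights only; the gap up to the listed minimums covers activations and buffers, which vary by workflow):

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Raw weight storage in GB (using 1 GB = 10^9 bytes for a rough estimate)."""
    return params_billions * bytes_per_param

# 14B in bf16 (2 bytes/param): ~28 GB of weights alone, hence ~40 GB cards.
print(weight_memory_gb(14, 2))   # 28
# 14B in 8-bit (1 byte/param): ~14 GB of weights, fitting a 24 GB card
# once runtime overhead is added.
print(weight_memory_gb(14, 1))   # 14
# 1.3B in bf16: ~2.6 GB, comfortably inside 8 GB.
print(weight_memory_gb(1.3, 2))  # 2.6
```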

Use Cases

Open-Source Video Pipeline Development

  • Build custom video generation workflows without API costs or vendor lock-in.
  • Fine-tune on domain-specific data (product videos, architectural renderings, specific styles).

Research & Experimentation

  • Academic research on video generation and motion modeling with full model access.
  • Testing and benchmarking against other open models.

Enterprise with Data Sovereignty

  • Run video generation on-premise for cases where content cannot be sent to cloud APIs.
  • Integrate into proprietary creative pipelines without per-video API fees.

Cost-Efficient Scale Generation

  • After initial hardware investment, generate unlimited video at zero marginal cost.
  • Batch processing of large video generation jobs overnight on GPU clusters.
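The "zero marginal cost" claim can be made concrete with a simple break-even calculation against per-clip API pricing. Every number below (GPU price, API fee, power draw, clip time, electricity rate) is a placeholder assumption, not a quoted price:

```python
def breakeven_clips(hardware_cost: float, api_fee_per_clip: float,
                    power_kw: float, minutes_per_clip: float,
                    electricity_per_kwh: float) -> float:
    """Clips after which local generation is cheaper than per-clip API pricing."""
    energy_cost_per_clip = power_kw * (minutes_per_clip / 60) * electricity_per_kwh
    marginal_saving = api_fee_per_clip - energy_cost_per_clip
    return hardware_cost / marginal_saving

# Hypothetical inputs: $1,800 GPU, $0.50/clip API fee,
# 0.45 kW draw, 10 min/clip, $0.15/kWh.
print(round(breakeven_clips(1800, 0.50, 0.45, 10, 0.15)))  # → 3683
```

With these assumed numbers the GPU pays for itself after a few thousand clips, which is why the economics favor local generation mainly at batch scale.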

Getting Started

Option A: Via Alibaba Cloud Bailian (Easiest — API)

  1. Sign up for Alibaba Cloud.
  2. Access the Bailian model gallery.
  3. Enable the Wan2.1 endpoint and obtain an API key.
  4. Use the REST API for generation.
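A request to the managed API would be shaped roughly as below. The endpoint path, model id, and payload field names here are illustrative assumptions based on the general structure of Bailian's asynchronous APIs; verify them against the current API reference before use:

```python
import json

def build_t2v_request(prompt: str, api_key: str) -> tuple[dict, dict]:
    """Build headers and JSON body for a text-to-video request.
    Field names and model id are placeholders; check the Bailian docs."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Video generation is typically async: submit a task, then poll for the result.
        "X-DashScope-Async": "enable",
    }
    body = {
        "model": "wan2.1-t2v-14b",           # placeholder model id
        "input": {"prompt": prompt},
        "parameters": {"size": "1280*720"},  # placeholder parameter name
    }
    return headers, body

headers, body = build_t2v_request("Tracking shot of a lighthouse at dusk", "sk-...")
print(json.dumps(body, indent=2))
```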

Option B: Run Locally via ComfyUI (Advanced)

Requirements: NVIDIA GPU with 24GB+ VRAM (RTX 3090/4090/A6000) or 40GB+ for bf16.

# Install ComfyUI:
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

# Download the ComfyUI-WanVideoWrapper custom node:
cd custom_nodes
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper

# Download Wan2.1 model weights (14B T2V, ~30GB):
# Visit: https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
# Download model shards to ComfyUI/models/wan/
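Instead of downloading shards by hand, the `huggingface_hub` package can fetch the whole repository; the target directory below mirrors the `ComfyUI/models/wan/` layout mentioned above (worth verifying against the wrapper node's README):

```python
def default_local_dir(comfy_root: str = "ComfyUI") -> str:
    """Assumed destination for Wan2.1 weights under a ComfyUI install."""
    return f"{comfy_root}/models/wan/Wan2.1-T2V-14B"

def fetch_wan_weights(local_dir: str) -> str:
    """Download all shards of the 14B T2V repo (~30 GB; resumes if interrupted)."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    return snapshot_download(repo_id="Wan-AI/Wan2.1-T2V-14B", local_dir=local_dir)

print(default_local_dir())
# To actually download: fetch_wan_weights(default_local_dir())
```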

Option C: ModelScope (Cloud Demo)

  1. Visit modelscope.cn/models/Wan-AI.
  2. Use the online inference demo (free, limited quota).
  3. No setup required.

Running via ComfyUI Workflow

  1. Launch ComfyUI: python main.py --lowvram (for 24GB cards).
  2. Load a Wan2.1 workflow from the ComfyUI Manager or community.
  3. Set your text prompt and click "Queue Prompt."
  4. Generation of a 5-second clip takes 5-20 minutes depending on your GPU.
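Step 3's "Queue Prompt" can also be driven programmatically: a running ComfyUI instance accepts workflows exported in API format ("Save (API Format)" in the UI) via a plain HTTP POST. A minimal standard-library sketch, assuming ComfyUI's default host and port:

```python
import json
import urllib.request

def build_payload(workflow: dict) -> bytes:
    """Wrap an API-format workflow graph in the JSON body ComfyUI expects."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_workflow(workflow: dict, host: str = "127.0.0.1", port: int = 8188) -> dict:
    """Submit a workflow to a running ComfyUI instance; returns the queued prompt id."""
    req = urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage: export your Wan2.1 workflow with "Save (API Format)", then:
# with open("wan_t2v_workflow.json") as f:
#     print(queue_workflow(json.load(f)))
```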

Prompt Tips for Wan2.1

  • Be cinematic: "Tracking shot of a woman walking through autumn forest, golden hour, depth of field."
  • Describe physics: "Waves crashing against rocks, slow motion, high contrast water spray."
  • Specify camera: "Static camera, wide angle, establishing shot."
  • Keep complexity moderate: Wan2.1 handles single-subject scenes more reliably than complex multi-person scenarios.
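These tips can be folded into a small helper that assembles prompts in a consistent subject / motion / camera / look order (the ordering is a convention for readability, not a Wan2.1 requirement):

```python
def build_prompt(subject: str, motion: str = "", camera: str = "", look: str = "") -> str:
    """Join non-empty prompt components with commas, subject first."""
    parts = [subject, motion, camera, look]
    return ", ".join(p.strip() for p in parts if p.strip())

prompt = build_prompt(
    subject="a woman walking through an autumn forest",
    motion="leaves drifting in the wind",
    camera="tracking shot, wide angle",
    look="golden hour, depth of field",
)
print(prompt)
# a woman walking through an autumn forest, leaves drifting in the wind, tracking shot, wide angle, golden hour, depth of field
```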

Pricing & Access

  • Model Weights: Free — download from Hugging Face or ModelScope (Apache 2.0).
  • Local Inference: Free — you pay only your hardware/electricity costs.
  • Alibaba Cloud Bailian (Managed API): Credit-based pricing (volume discounts for enterprise).
  • ModelScope Demo: Free with limited daily quota.

Limitations

  • Hardware Requirements: The 14B model requires significant GPU VRAM (24-80GB). Not suitable for consumer laptops.
  • Generation Speed: Local generation on a single RTX 4090 takes 5-20 minutes per 5-second clip — much slower than cloud APIs.
  • Setup Complexity: Running locally via ComfyUI requires technical knowledge and significant initial setup time.
  • Resolution Ceiling: Community-run workflows typically generate at 480p-720p — 4K generation requires specialized high-memory configurations.
