Wan 2.7 vs Wan 2.6: What Actually Changed

Wan 2.6 shipped on December 16, 2025, and raised the bar for AI video quality significantly: 1080P at 24fps, clips up to 15 seconds, native lip-sync, and a Reference-to-Video (R2V) mode that could recreate a character's appearance and voice from a reference clip. For three months it was arguably the best closed-source model in the Wan series.

Wan 2.7 arrived on April 3, 2026. This post breaks down exactly what is new, what improved, and whether the upgrade is worth your attention.


A Note on Open Source

Before comparing versions, context matters: Wan 2.1 and Wan 2.2 were the last models in the Wan series to release weights publicly (Apache 2.0 license, available on GitHub and Hugging Face). From Wan 2.5 onward, Alibaba shifted to a commercial API model — weights are no longer distributed. If you run ComfyUI locally or need to self-host, Wan 2.2 remains the most capable option you can actually download.

Wan 2.6 and 2.7 are only accessible through hosted services and APIs.


Feature-by-Feature Comparison

Feature | Wan 2.6 | Wan 2.7
Max resolution | 1080P | 1080P
Max duration | 15s | 15s
Frame rate | 24fps | 16fps default
First-frame I2V | ✅ | ✅
Last-frame control | ❌ | ✅
9-grid image input | ❌ | ✅
Natural language video editing | ❌ | ✅
Multi-shot narrative | ✅ | ✅ (improved)
Reference video input | ✅ (1–2 refs) | ✅ (up to 5 refs)
R2V (appearance + voice) | ✅ | ✅
Native audio-visual sync | ✅ | ✅ (improved)
Thinking Mode | ❌ | ✅
Aspect ratios | 16:9, 9:16, 1:1 | 16:9, 9:16, 1:1

What's Genuinely New in Wan 2.7

Last-Frame Control

In 2.6 you could anchor the first frame and let the model generate forward. In 2.7 you anchor both endpoints — the opening and closing frame — and the model fills in the motion between them. This is the single most impactful addition for anyone doing narrative or commercial work: you control where the shot starts and where it ends, not just where it starts.
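As a rough illustration, an endpoint-anchored request might look like the sketch below. The URL, parameter names, and response shape are placeholders invented for this example, not Wan's documented API; check your provider's reference for the real fields.

    import requests

    # Hypothetical request -- endpoint and field names are assumptions,
    # since Wan 2.7 is only reachable through hosted providers whose
    # schemas differ.
    resp = requests.post(
        "https://api.example.com/v1/video/generate",   # placeholder URL
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": "wan-2.7",
            "prompt": "slow dolly-in on a rain-soaked street at dusk",
            "first_frame_url": "https://example.com/shot_start.png",
            # New in 2.7: also pin the closing frame; the model
            # generates the motion between the two endpoints.
            "last_frame_url": "https://example.com/shot_end.png",
            "duration": 10,            # seconds, capped at 15
            "aspect_ratio": "16:9",
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json())  # most hosts return a job ID to poll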

9-Grid Image Input for I2V

Instead of a single reference image, Wan 2.7 accepts a 3×3 grid of nine images in a single I2V call. The model reads across all nine to infer the subject's appearance, the environment, and the intended composition. The practical benefit: dramatically less subject drift across longer or more complex shots compared to single-image I2V.
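If your provider expects the nine references as one composite image rather than nine separate uploads (an assumption; delivery formats vary by host), assembling the 3×3 grid is straightforward with Pillow:

    from PIL import Image

    def make_nine_grid(paths, cell=512):
        """Tile exactly nine reference images into a single 3x3 grid."""
        assert len(paths) == 9, "9-grid input needs exactly nine images"
        grid = Image.new("RGB", (cell * 3, cell * 3))
        for i, path in enumerate(paths):
            img = Image.open(path).convert("RGB").resize((cell, cell))
            row, col = divmod(i, 3)                    # row-major placement
            grid.paste(img, (col * cell, row * cell))
        return grid

    # e.g. nine stills of the same subject from different angles
    make_nine_grid([f"ref_{n}.png" for n in range(9)]).save("grid_3x3.png")

Resizing every cell to the same square keeps the layout uniform; if your references have mixed aspect ratios, pad them instead of resizing to avoid distorting the subject.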

Natural Language Video Editing

This workflow has no equivalent in 2.6. Pass an existing video alongside an instruction like "change the background to a rainy street" or "swap the jacket to dark red" and Wan 2.7 returns an edited version without regenerating from scratch. Iteration cycles that previously required a full new generation are now handled as lightweight edits.
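Because each edit takes a clip plus an instruction, iterations chain naturally. Another hypothetical sketch; the endpoint and field names, including the assumed "video_url" key in the response, are placeholders:

    import requests

    API = "https://api.example.com/v1/video/edit"      # placeholder URL
    HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

    video_url = "https://example.com/take_03.mp4"      # existing clip
    for instruction in [
        "change the background to a rainy street",
        "swap the jacket to dark red",
    ]:
        resp = requests.post(
            API,
            headers=HEADERS,
            json={"model": "wan-2.7", "video_url": video_url,
                  "instruction": instruction},
            timeout=60,
        )
        resp.raise_for_status()
        # Feed each edited clip back in as the source for the next edit
        # ("video_url" in the response is an assumed field name).
        video_url = resp.json()["video_url"]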

Up to 5 Simultaneous Video References

Wan 2.6 supported reference video input for R2V (one character, one voice). Wan 2.7 scales this to five reference videos at once, with the model reading character appearance, motion style, and environment context across all of them. Multi-character scenes and complex set recreations become significantly more controllable.

Thinking Mode

Wan 2.7 adds a chain-of-thought reasoning layer that plans composition, framing, and motion before the generation pass begins. The result is more intentional outputs on complex prompts — the model is less likely to misinterpret ambiguous instructions when it has explicitly "planned" the scene first.


What Improved (Not Just Added)

Character consistency was the most common complaint about Wan 2.5 and 2.6 — faces and clothing would drift noticeably mid-clip. Wan 2.7 addresses this directly: identity tracking across frames is more stable, clothing details hold, and fast movement no longer causes the artifacts that plagued earlier versions.

Audio-visual synchronization was already good in 2.6. In 2.7 it's tighter — background music, ambient sound, and character vocals are generated as part of the scene from the first frame rather than synchronized in post. The improvement is most noticeable in dialogue-heavy shots.


What Didn't Change

  • Maximum resolution stays at 1080P
  • Maximum duration stays at 15 seconds
  • Aspect ratio support is identical
  • Neither version releases model weights — both are API-only

If you were expecting a resolution or duration jump, this is not that release. The Wan 2.7 upgrade is about control, consistency, and editability, not raw output specs.

VidCella · Wan 2.7

Try Wan 2.7 without installing anything

First-frame, last-frame control · 9-grid I2V · No subscription


Should You Upgrade?

Use Wan 2.7 if you:

  • Need to define both the start and end frame of a shot
  • Work with multi-character scenes that require consistent appearance across clips
  • Iterate on existing videos rather than regenerating from scratch
  • Want more reliable results from complex or ambiguous prompts

Stick with Wan 2.6 if you:

  • Only need single-image I2V with a fixed starting frame
  • Have a pipeline already tuned around 2.6 and don't need the new control inputs
  • Are comparing API costs and the editing features aren't relevant to your use case

Use Wan 2.2 if you:

  • Need to run the model locally on your own hardware
  • Require open weights for fine-tuning or deployment in a private environment
  • Want the most capable model you can actually download

The Broader Trend

Each version since 2.2 has added control rather than raw power. Wan 2.5 introduced better audio. Wan 2.6 added reference-based character recreation. Wan 2.7 adds endpoint anchoring, multi-reference grids, and in-place editing. The direction is clear: Alibaba is building a video production toolkit, not just a video generator, though only for users willing to work through its hosted API.

For the open-source community, Wan 2.2 remains the ceiling. Whether that changes with future releases is an open question.


Wan 2.7 · All new features available

Generate Your First Wan 2.7 Video

Every feature covered in this comparison is live on VidCella — first-frame anchoring, last-frame control, 9-grid I2V, and more. No local GPU required.

Pay-as-you-go credits · No subscription required