What Makes Wan 2.7 Stand Out
Wan 2.7 introduces major new capabilities that redefine AI video generation. These are not incremental upgrades — they are new paradigms that give you unprecedented control over your creative output.
First & Last Frame Control
Natural Language Video Editing
Subject & Voice Reference
What is Wan 2.7?
Wan 2.7 is Alibaba's latest AI video generation model, released April 3, 2026. It builds on the Wan series with a focus on control, consistency, and editability — letting creators direct AI video generation with a level of precision that previous models could not achieve.
- Text-to-Video with 5 Aspect Ratios: Generate 2–15 second videos from text prompts in 16:9, 9:16, 1:1, 4:3, or 3:4. Describe your scene in detail — subject, action, camera movement, lighting, style — and Wan 2.7 produces cinematic results at 720P or 1080P resolution.
- Image-to-Video with First & Last Frame: Convert images to video with endpoint control. Upload a first frame, a last frame, or both, and the model generates the motion in between. The 9-grid input mode accepts a 3x3 grid of nine reference images for dramatically better subject consistency.
- Video Editing & Extension: Edit existing videos with natural language instructions — change backgrounds, swap outfits, alter lighting — without full regeneration. Extend videos beyond their original duration with prompt-guided continuation that maintains visual coherence.
- Audio Sync & Voice Reference: Native audio-visual synchronization generates music, ambient sound, and vocals as part of the scene from the first frame. Subject & voice reference lets you upload a character image and voice sample to produce talking videos with consistent identity and synchronized lip movements.
How to Use Wan 2.7 on VidCella
Create AI videos with Wan 2.7 in four steps. Whether you use text-to-video, image-to-video, or video editing mode, VidCella gives you full access to Wan 2.7's capabilities:
Wan 2.7 Features on VidCella
Explore the full range of Wan 2.7 capabilities available on VidCella — from text-to-video generation to advanced editing and extension workflows:
Text-to-Video
Generate 2–15 second videos from text prompts at 720P or 1080P. Five aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4) cover every use case from landscape cinema to vertical social content.
Image-to-Video
Animate still images with first-frame, last-frame, or dual-endpoint control. The 9-grid input mode accepts nine reference images for superior subject consistency across the generated video.
Video Editing
Edit existing videos with natural language instructions. Change backgrounds, swap clothing, alter lighting, or modify any visual element — without regenerating the entire clip from scratch.
Video Extend
Extend videos beyond their original duration. Provide a prompt to guide the continuation, maintaining visual and narrative coherence with the source material.
Audio Synchronization
Native audio-visual sync generates background music, ambient sound, and character vocals as part of the scene. Audio is produced alongside the video, not layered in post-processing.
Up to 1080P Resolution
Generate at 720P (30 credits/s) or 1080P (45 credits/s). Both resolutions support all workflows — text-to-video, image-to-video, video editing, and video extension.
Wan 2.7 Frequently Asked Questions
Everything you need to know about using Wan 2.7 on VidCella:
What is Wan 2.7?
Wan 2.7 is Alibaba's latest AI video generation model, released April 3, 2026. It supports text-to-video, image-to-video with first & last frame control, natural language video editing, video extension, and audio synchronization — all at up to 1080P resolution and 15 seconds duration.
What is first and last frame control?
First and last frame control lets you anchor both the opening and closing frames of your video. Upload a starting image, an ending image, or both, and Wan 2.7 generates the motion between them. This gives you precise control over narrative arcs, product reveals, transitions, and any shot where the start and end states matter.
What is subject & voice reference?
Subject & voice reference lets you upload a character image and a short voice audio clip together. Wan 2.7 generates a video where the character's appearance matches the reference image while lip movements and facial expressions synchronize to the provided voice — all in a single generation pass, without post-processing or separate dubbing.
How does natural language video editing work?
Upload an existing video and describe the changes you want in plain text — for example, "change the background to a sunset beach" or "swap the red shirt to blue." Wan 2.7 applies the edits to the video without regenerating it from scratch, preserving the original motion and composition while applying your changes.
What resolutions and durations does Wan 2.7 support?
Wan 2.7 generates video at 720P or 1080P resolution, with durations from 2 to 15 seconds. Five aspect ratios are available: 16:9 (landscape), 9:16 (portrait), 1:1 (square), 4:3, and 3:4. Video extension mode supports 5–15 second extensions.
How much does Wan 2.7 cost on VidCella?
Wan 2.7 costs 30 credits per second at 720P and 45 credits per second at 1080P. For example, a 5-second 720P video costs 150 credits, and a 5-second 1080P video costs 225 credits. No subscription required — pay only for what you generate.
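The pricing is linear in clip length, so budgeting is a one-line calculation. A minimal helper using the rates above (30 credits/s at 720P, 45 credits/s at 1080P):

```python
# Credit cost for a Wan 2.7 generation on VidCella.
RATES = {"720p": 30, "1080p": 45}  # credits per second

def wan_credits(seconds: int, resolution: str = "720p") -> int:
    """Return the credit cost for a clip of the given length."""
    if not 2 <= seconds <= 15:
        raise ValueError("Wan 2.7 clips run 2-15 seconds")
    return seconds * RATES[resolution.lower()]

print(wan_credits(5, "720p"))    # 150
print(wan_credits(5, "1080p"))   # 225
print(wan_credits(15, "1080p"))  # 675 (longest, highest-cost clip)
```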
What is 9-grid image input?
Instead of providing a single reference image for image-to-video, you can upload a 3x3 grid of nine images in a single call. Wan 2.7 reads across all nine to infer the subject's appearance, environment, and composition — dramatically reducing subject drift compared to single-image input.
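If you assemble the 3x3 grid yourself, the layout is simple row-major tiling. This sketch computes the paste offsets for each cell; the actual compositing library (e.g. Pillow) and the cell size are your choice — the model's exact expected grid dimensions are not specified here:

```python
# Row-major cell offsets for tiling nine reference images into a 3x3 grid.
def grid_positions(cell_w: int, cell_h: int, n: int = 9) -> list[tuple[int, int]]:
    """Top-left (x, y) pixel offset for each of n cells, 3 per row."""
    cols = 3
    return [((i % cols) * cell_w, (i // cols) * cell_h) for i in range(n)]

# Nine 512x512 references yield a 1536x1536 composite:
print(grid_positions(512, 512)[:4])
# [(0, 0), (512, 0), (1024, 0), (0, 512)]
```

Paste each reference at its offset (e.g. with Pillow's `Image.paste`) and upload the single composite as the 9-grid input.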
Is Wan 2.7 open source?
No. Wan 2.1 and Wan 2.2 were the last models in the Wan series with publicly released weights (Apache 2.0). From Wan 2.5 onward, Alibaba shifted to a commercial API model, so Wan 2.7 is accessible only through hosted platforms like VidCella. If you need to self-host, Wan 2.2 remains the most capable open-source option.
