Wan 2.7 First & Last Frame Control: Complete Guide

Most earlier AI video models gave you one anchor point: the first frame. You supplied a reference image and the model generated forward from it, but where the shot ended was entirely up to the model. For anything requiring a specific visual destination (a product reveal, a scene transition, a character arriving at a location), you were left regenerating until something close enough appeared.

Wan 2.7 changes this. You can now anchor both endpoints of a clip — the opening frame and the closing frame — and the model fills in the motion between them. This guide explains every mode, how to prompt for each, and where the technique is most useful.

The Three Modes

Wan 2.7 supports three distinct frame-anchoring approaches depending on how much control you need:

Mode	What you provide	What the model decides	Best for
First-frame only	Opening image	All motion + ending	Subject animation from a reference photo
Last-frame only	Closing image	All motion + opening	Reverse reveals, arrivals, build-ups to a final state
Dual-endpoint	Opening + closing images	Only the motion between them	Precise shot choreography, transitions, narrative arcs
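
The table reduces to a simple rule: the mode follows from which reference images you supply. Below is a minimal Python sketch of that rule. The FrameInputs class, its field names, and the anchoring_mode function are hypothetical names chosen for illustration; they are not part of any Wan 2.7 or VidCella API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameInputs:
    """Reference images for one clip. Field names are hypothetical."""
    first_frame: Optional[str] = None  # path to the opening image, if any
    last_frame: Optional[str] = None   # path to the closing image, if any


def anchoring_mode(inputs: FrameInputs) -> str:
    """Return the mode name from the table for the frames supplied."""
    if inputs.first_frame and inputs.last_frame:
        return "dual-endpoint"
    if inputs.first_frame:
        return "first-frame only"
    if inputs.last_frame:
        return "last-frame only"
    raise ValueError("Provide at least one reference frame.")


print(anchoring_mode(FrameInputs(last_frame="arrival.png")))
```

Deciding the mode up front is useful because it tells you how much the prompt has to carry: the fewer frames you anchor, the more of the shot the model decides on its own.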

How Each Mode Works

First-Frame Only

This is the original image-to-video (I2V) mode, now improved. You supply a single reference image and Wan 2.7 generates motion outward from it. The Thinking Mode reasoning layer, new in 2.7, makes the motion more intentional than in earlier versions: the model plans how to develop the scene rather than generating frame by frame without a target.

When to use it: Animating a still photograph, creating motion from a product render, or any case where the starting composition is fixed but the ending doesn't matter.

Last-Frame Only

The inverse: you supply the final image and the model reverse-engineers a plausible opening and motion path that arrives at it. This is genuinely new behavior — earlier Wan versions had no equivalent.

When to use it:

Reveal sequences (something obscured at the start, fully visible at the end)
A character arriving at a location shown in the final frame
Environmental build-ups (an empty street that fills with people by the last frame)
Any narrative that has a known destination but a flexible journey

Dual-Endpoint (First + Last)

You provide both frames and write a prompt describing the motion between them. The model's creative freedom collapses to a single constrained problem: get from A to B convincingly. This is where you gain the most precise control over a shot.

The constraint can also help quality: the model doesn't need to invent an ending, so it can concentrate on making the transition feel physically plausible and visually smooth.

When to use it: Commercial work, scripted narratives, product presentations, any shot where both the opening composition and the closing beat are predetermined.

Writing Prompts for Frame-Controlled Generation

The key shift when using endpoint anchoring: your prompt describes the motion, not the scene. The frames already define the scene. A prompt that re-describes what's visible in the reference images wastes word budget and can conflict with what the model sees.

What to include in the prompt

Direction of movement — camera push, pull, pan, tilt, orbit; subject walking left, rising, turning
Pacing — "slowly", "abruptly", "over several seconds", "in one continuous movement"
Physical events — "fog rolls in from the right", "the petals fall one by one", "the crowd parts"
Light or color shift — "golden hour transitions to blue dusk", "the lamp flickers on"
Any secondary motion — hair movement, fabric, water ripple, leaves falling

What to leave out

Physical descriptions of the subject (already in the frame)
Background description (already in the frame)
Style or aesthetic terms (use these only if they differ noticeably from what the reference images imply)

Prompt template for dual-endpoint mode

[Camera movement or subject movement]. [Pacing]. [Any physical event that bridges the two states].
[Optional: light shift]. [Optional: secondary motion detail].

Example (opening frame: empty café at dawn; closing frame: same café full of people, midday light):

Customers slowly fill the café as morning light brightens to noon. The camera holds steady on a wide shot. Chairs scrape, steam rises from cups, the hum of conversation builds. Warm tungsten interior gradually gives way to natural daylight through the windows.
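
If you generate many shots, the template can be filled programmatically. The following Python sketch joins the template slots into one prompt string. The function name build_motion_prompt and its parameter names are assumptions made for this example, not part of any Wan 2.7 tooling.

```python
from typing import Optional


def build_motion_prompt(
    movement: str,
    pacing: str,
    bridging_event: str,
    light_shift: Optional[str] = None,
    secondary_motion: Optional[str] = None,
) -> str:
    """Join the template slots into one prompt, skipping empty optional slots."""
    sentences = []
    for part in (movement, pacing, bridging_event, light_shift, secondary_motion):
        if not part or not part.strip():
            continue  # optional slot left empty
        text = part.strip()
        if text[-1] not in ".!?":
            text += "."  # make each slot a complete sentence
        sentences.append(text)
    return " ".join(sentences)


prompt = build_motion_prompt(
    movement="The camera holds steady on a wide shot",
    pacing="Customers slowly fill the café",
    bridging_event="Chairs scrape and steam rises from cups",
    light_shift="Warm tungsten light gradually gives way to daylight",
)
print(prompt)
```

Keeping each slot as a separate argument makes it harder to slip scene description into the prompt, since every slot asks for a movement, a pace, or an event.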

5 Practical Use Cases

1. Product Reveal

Setup: First frame — product in a closed box or obscured by shadow. Last frame — product fully lit on a clean surface.

Prompt:

The lid slowly lifts and soft studio light floods in, revealing the product beneath. Camera holds at eye level. Clean, deliberate pacing. The packaging settles as the product comes fully into view.

Why it works: The dual-endpoint constraint forces the model to stage the reveal rather than drift to an unrelated composition. You control what the viewer sees in both the obscured opening and the final resting frame.

2. Architectural Walk-In

Setup: First frame — exterior facade of a building. Last frame — interior of the same building, showing the space beyond the entrance.

Prompt:

A slow continuous push through the entrance, transitioning from the exterior facade into the interior space. Natural light from outside fades as interior ambient light takes over. Camera moves at walking pace, no cuts.

Why it works: Without last-frame anchoring this shot often fails: the model tends to interpret "walk inside" as camera movement around the exterior rather than through it. Locking the interior as the final frame pushes the model toward the correct spatial logic.

3. Day-to-Night Scene Transition

Setup: First frame — a city skyline in daylight. Last frame — same skyline at night with lights on.

Prompt:

The sky transitions from afternoon blue to deep navy as city lights switch on one by one. The sun dips below the horizon. Shadows lengthen. The transition takes the full duration of the clip, ending in full night.

Why it works: Time-lapse style transitions are notoriously hard to prompt reliably in text-to-video (T2V) mode. Giving the model a concrete night-sky endpoint as the last frame removes most of that ambiguity.

4. Character Arrival

Setup: First frame — an empty doorway or corridor. Last frame — a character standing fully in frame, facing forward.

Prompt:

A figure approaches from the far end of the corridor, growing larger as they walk toward camera. Their footsteps are the only sound. They slow to a stop in the final position. Steady camera, no movement.

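Why it works: The last frame defines the character's final pose and position precisely, while the approach is generated naturally. If the opening composition doesn't matter, "last-frame only" mode also suits this shot: drop the empty-corridor frame and let the model invent the opening.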

5. Logo or Title Reveal (Motion Graphics)

Setup: First frame — abstract shapes or a blurred/dark frame. Last frame — a clean logo or title lockup.

Prompt:

Elements assemble from the edges of the frame, converging toward center. Movement eases to a stop as the final composition locks into place. Smooth, intentional motion with a brief hold at the end.

Why it works: Motion graphics require a precise final frame. Without last-frame anchoring, the assembled elements rarely match your intended layout. Locking the final frame guarantees the correct end state while giving the model latitude on the assembly motion.

Common Mistakes

Describing the frames instead of the motion

The most frequent error: writing a prompt that re-describes what's already visible in the reference images. Example: if your first frame shows a red car parked on a street, "a red car parked on a rainy street" in the prompt adds nothing and may cause the model to over-interpret the color or weather conditions. Describe the movement instead: "The car slowly pulls away from the curb and accelerates down the street."

Choosing frames with incompatible spatial logic

First and last frames that imply physically impossible transitions (a subject teleporting, a camera angle that can't be achieved in one continuous movement) will produce unstable or warped results. Test with frames that share a believable spatial relationship.

Too short a prompt for a complex transition

Dual-endpoint mode needs enough prompt guidance to avoid the model taking the shortest (and often most boring) path between the two frames — typically a dissolve or a slow zoom. Give it explicit motion events to work with.

Using high-action reference frames for dual-endpoint

Frames that show a subject mid-motion (a jump at peak height, a wave at its crest) are difficult to use as anchors because the model can't always infer the correct direction of travel. Use frames that capture a clear, stable state at both endpoints, and describe the action in the prompt.
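
Two of the mistakes above, describing the frames instead of the motion and writing too short a prompt, can be roughly screened before you submit a generation. The Python sketch below is a heuristic only. The cue list is drawn from the movement and pacing terms in this guide, the 10-word minimum is an arbitrary assumption, and the function name lint_motion_prompt is hypothetical.

```python
import re

# Rough word stems for motion and pacing cues. Illustrative, not exhaustive,
# and prone to false positives (for example, "pan" also matches "panel").
MOTION_CUE = re.compile(
    r"\b(?:push|pull|pan|tilt|orbit|walk|ris|turn|slow|abrupt|accelerat|fall|roll)\w*",
    re.IGNORECASE,
)


def lint_motion_prompt(prompt: str, min_words: int = 10) -> list:
    """Return a list of warnings; an empty list means no issue was detected."""
    warnings = []
    words = re.findall(r"[A-Za-z']+", prompt)
    if len(words) < min_words:
        warnings.append("Prompt may be too short to guide a complex transition.")
    if not MOTION_CUE.search(prompt):
        warnings.append("No motion or pacing cue found; the prompt may only describe the scene.")
    return warnings


for text in (
    "a red car parked on a rainy street",
    "The car slowly pulls away from the curb and accelerates down the street.",
):
    print(text, "->", lint_motion_prompt(text))
```

A check like this cannot judge whether the motion makes sense for your frames. It only flags prompts that are likely to leave the model choosing a dissolve or slow zoom by default.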

First & Last Frame · Live on VidCella

Control Every Shot, Start to Finish

First-frame anchoring, last-frame control, and dual-endpoint generation are all available on VidCella — no local GPU, no setup.

Pay-as-you-go credits · No subscription required