Seedance 2.0 Prompt Guide: Master the @Tag Reference System
Most AI video models treat prompts as text instructions and reference images as separate, passive inputs. Seedance 2.0 works differently. It was built around explicit multi-reference control — you upload assets and then address them by name inside your prompt, telling the model exactly what role each file should play in the generation.
This changes how you write prompts entirely. A Seedance 2.0 prompt isn't just a description; it's closer to a director's brief that cites specific assets by reference number.
This guide covers the complete system: prompt structure, how to use each tag type, audio and lip-sync techniques, multi-shot storytelling, and ready-to-use examples.
What You Can Upload
Seedance 2.0 accepts up to 12 asset files in a single generation call:
| Asset type | Max per call | Size limit | Duration limit | Tag syntax |
|---|---|---|---|---|
| Images | 9 | 30 MB each | — | @Image1 … @Image9 |
| Video clips | 3 | 50 MB each | 2–15 seconds | @Video1 … @Video3 |
| Audio files | 3 | 15 MB each | Up to 15 seconds | @Audio1 … @Audio3 |
The tags are numbered in the order you upload the assets. Upload two images → they become @Image1 and @Image2. Upload one video → it becomes @Video1.
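The numbering and per-call limits above can be captured in a few lines of plain Python. This is an illustrative sketch only: the counts and size limits are taken from the table in this guide, and nothing here is an official Seedance API; the real upload endpoint may name or enforce these limits differently.

```python
# Sketch: assign @-tags by upload order and enforce the per-call limits
# from the table above. Limits are copied from this guide, not from any
# official API reference.

LIMITS = {
    "image": {"max_count": 9, "max_mb": 30},
    "video": {"max_count": 3, "max_mb": 50},
    "audio": {"max_count": 3, "max_mb": 15},
}

def assign_tags(uploads):
    """uploads: list of (kind, size_mb) in upload order -> list of tags."""
    if len(uploads) > 12:
        raise ValueError("at most 12 asset files per generation call")
    counts = {"image": 0, "video": 0, "audio": 0}
    tags = []
    for kind, size_mb in uploads:
        limit = LIMITS[kind]
        if counts[kind] >= limit["max_count"]:
            raise ValueError(f"too many {kind} files (max {limit['max_count']})")
        if size_mb > limit["max_mb"]:
            raise ValueError(f"{kind} file exceeds {limit['max_mb']} MB")
        counts[kind] += 1
        # Tags are numbered per asset type, in upload order.
        tags.append(f"@{kind.capitalize()}{counts[kind]}")
    return tags

print(assign_tags([("image", 4), ("image", 12), ("video", 30)]))
# -> ['@Image1', '@Image2', '@Video1']
```

The point of the sketch is the numbering rule: tags count up per asset type, not across all uploads, so the third file above becomes @Video1, not @Video3.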
The Prompt Structure
Every Seedance 2.0 prompt follows the same five-part order. Deviating from it — especially putting constraints or camera instructions before the subject — causes the model to misweight the instructions.
[Subject] [Action] [Camera] [Style] [Constraints & References]
| Slot | What to write | Notes |
|---|---|---|
| Subject | Main character or object with age, clothing, material, and distinguishing details | Include @Image tag here if referencing appearance |
| Action | One primary action per shot, in present tense | Avoid stacking multiple actions — use multi-shot for sequences |
| Camera | Lens size + movement + angle + lens type | Include @Video tag here if referencing a camera movement |
| Style | One strong visual reference + lighting + color grading | Film stock names, director references, and era descriptors work well |
| Constraints & References | Negative terms, duration, rhythm, audio sync, @Audio tag | Put audio instructions and @Audio tags last |
Using the @Tag System
@Image Tags — Character and Object Appearance
The most common use of @Image tags is locking a character's appearance. Without a reference image, Seedance 2.0 will generate a plausible character from your description alone — but with one, it preserves that specific person's features, clothing, and identity across the clip.
How to use it:
- Upload the image, note its number (e.g. @Image1)
- In the Subject slot, reference the tag explicitly: "@Image1, a woman in her 30s with short dark hair, wearing a navy blazer"
- Be explicit about what the tag is for: "Use @Image1 as the character's appearance" is clearer than just mentioning the tag
Multiple @Image tags:
You can use multiple images to define different elements. @Image1 for a character's face, @Image2 for an environment, @Image3 for a specific prop. Keep the roles distinct and name them clearly in the prompt.
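When several references play different roles, the role-naming clause can be generated mechanically. A small sketch, with phrasing that mirrors this guide's own advice (pure string assembly, no official API):

```python
# Sketch: spell out one role per reference tag, as the guide recommends
# ("Use @Image1 as the character's face, ..."). Tags and roles here are
# illustrative examples.

def declare_roles(roles):
    """roles: list of (tag, role) pairs in the order they should be named."""
    return " ".join(f"Use {tag} as {role}." for tag, role in roles)

clause = declare_roles([
    ("@Image1", "the character's face"),
    ("@Image2", "the background environment"),
    ("@Image3", "the prop she's holding"),
])
print(clause)
```

Appending a clause like this to the Constraints slot guarantees no uploaded asset is left unnamed, which is the failure mode described under Common Mistakes below.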
@Video Tags — Camera Movement and Motion Reference
@Video tags let you hand the model a clip and say: "move the camera like this." The model reads the motion, speed, and angle from your reference video and applies it to the new generation.
How to use it:
- Upload a video clip with the camera movement you want
- Reference it in the Camera slot: "Slow push-in following @Video1's camera path"
- This works for both camera movement and character motion: "Character moves with the rhythm and timing of @Video1"
What makes a good @Video reference:
- Short (2–6 seconds), focused clips with a single clear motion
- Clean footage without competing motion — a clean dolly shot, not a handheld crowd scene
- The motion in the reference should match the pace you want in the output
@Audio Tags — Rhythm and Lip Sync
@Audio tags drive Seedance 2.0's most distinctive capability: generating video that is rhythmically and phonemically synchronized to an audio track from the start of generation, not layered in afterward.
How to use it:
- Upload your audio file (dialogue, music, ambient sound)
- Reference it in the Constraints slot: "Sync the character's speech to @Audio1" or "Match the editing rhythm to @Audio1"
- For lip sync: the model generates phoneme-accurate mouth movements from the audio automatically — your prompt just needs to establish that the character is speaking
Lip sync prompt tips:
- Describe the character as speaking or delivering dialogue: "speaks directly to camera" or "delivers the line with a slight smile"
- Mention the emotional tone: "low-energy delivery, slightly shy", "confident, measured pacing"
- Don't describe the specific words — the audio file defines those
Writing for Native Audio
Seedance 2.0 generates audio and video simultaneously through a joint architecture. This means audio descriptors in your prompt shape the generation from the first frame — not as an afterthought.
Describing audio texture: Use specific acoustic terms rather than generic adjectives.
- ❌ "nice ambient sound"
- ✅ "soft reverb in a mid-size room, low ambient hum, distant traffic"
- ❌ "good music"
- ✅ "sparse piano melody, 80 BPM, melancholy, no percussion"
Describing environmental acoustics: Match the acoustic description to the visual environment you're generating. The model uses visual cues to infer acoustics, but explicit audio description overrides and sharpens this.
- "Cathedral interior — long reverb tail, 3–4 second decay"
- "Outdoor market — overlapping voices, dry acoustics, no reverb"
- "Small recording booth — tight, dead acoustics, close microphone presence"
Sound effects without @Audio: You can generate specific sound effects without uploading audio by describing them:
- "footsteps on wet pavement, echoing slightly"
- "paper rustling as she turns the page, close microphone"
- "coffee machine grinding in the background, muffled"
Multi-Shot Storytelling
Seedance 2.0 can generate coherent multi-shot sequences from a single prompt: the model creates multiple camera cuts within one generation, maintaining character and environment consistency across all of them.
How to structure a multi-shot prompt:
Describe each shot as a numbered beat within the same prompt, keeping the subject and environment consistent:
@Image1, a young chef in kitchen whites stands at a prep station.
Shot 1: Wide shot — she surveys ingredients laid out on the counter.
Shot 2: Close-up on her hands as she begins chopping herbs, precise and fast.
Shot 3: Medium shot — she looks up and smiles at someone off-camera.
Warm commercial kitchen lighting, golden tones. Upbeat background music that builds across the three shots. Sync atmosphere to @Audio1.
Multi-shot tips:
- Keep the subject description in one place (before the shot descriptions), not repeated in each shot
- Each shot should have a single clear action — the cut handles the transition
- Describe the audio arc across shots if using @Audio: "music builds from quiet to energetic across all three shots"
5 Ready-to-Copy Example Prompts
1. Product Commercial (with @Image reference)
@Image1, a premium wireless headphone in matte black with gold accents, placed on a clean white surface.
The product slowly rotates 90 degrees, catching soft key light from the left. Camera orbits smoothly in a wide arc.
High-end commercial photography style, soft box lighting, minimal shadow. Elegant, precise motion. No background music. Subtle product sound — gentle click as it completes the rotation.
2. Dialogue Scene (with @Audio lip sync)
@Image1, a woman in her late 20s with auburn hair, wearing a light grey turtleneck, sits at a café window. She speaks directly to camera with a calm, slightly wistful expression.
Medium close-up, slight rack focus from background to her face. Natural window light from the left, warm interior tones. Intimate, documentary feel. Sync speech to @Audio1, natural pauses between 200–400ms, mouth movement subtle and non-exaggerated.
3. Music Video Shot (with @Audio rhythm sync)
@Image1, a dancer in a red dress stands at the center of an empty rooftop at dusk. She moves through a slow, fluid contemporary dance sequence.
Wide establishing shot that slowly pushes in to medium over the course of the clip. Cinematic, golden hour light from behind, silhouette edges glowing. Sync all movement and editing rhythm to @Audio1. Motion builds from stillness to full expression as the music peaks.
4. Environmental Atmosphere (audio-only, no image reference)
A dense pine forest at dawn. Morning mist drifts between the trees. A deer steps carefully into a small clearing and pauses, alert.
Slow, smooth tracking shot from behind the deer. Soft, diffuse dawn light, blue-grey tones warming slightly. Documentary, natural history style.
Audio: distant birdsong, soft wind through pine needles, occasional branch creak. No music. All sounds generated from scene environment.
5. Architecture Walk-Through (with @Video camera reference)
A minimalist concrete and glass house set into a hillside, surrounded by olive trees.
The camera follows @Video1's movement path through the entrance and into the main living space. Afternoon light enters through floor-to-ceiling windows and casts long shadows.
Architectural photography style, no people present. Quiet ambient sound — wind outside, slight echo in the open space. Elegant, measured pacing.
Common Mistakes
Not naming what each @tag is for
Uploading three images and then writing @Image1 @Image2 @Image3 without context leaves the model guessing. Always specify: "Use @Image1 as the character's face, @Image2 as the background environment, @Image3 as the prop she's holding."
Stacking multiple actions in one shot
"She walks in, sits down, picks up a cup, and looks out the window" is four shots, not one. Seedance 2.0 will attempt to compress all of this into a single clip and the result is rushed or incoherent. Use multi-shot structure for sequences.
Over-describing audio when an @Audio file is present
If you've uploaded @Audio1, you don't need to describe the music in detail — the model reads the file directly. Describe the relationship instead: "Match movement intensity to @Audio1" rather than re-describing the music in words.
Using high-action video clips as @Video camera references
A handheld action sequence as a @Video reference confuses the model — it can't cleanly separate camera motion from subject motion. Use clean, isolated camera movement clips for reliable references.
Vague acoustic descriptions
"good sound" and "natural audio" give the model nothing to work with. Describe the acoustic space (room size, surfaces, reverb), the dominant sounds, and any specific effects you need.
