How to Use Grok Imagine 1.5: Setup, Prompts, and Sound
Grok Imagine 1.5 is fast and cheap, which means you get to be wrong a lot without it costing much. The catch is that it rewards a specific way of writing prompts, and most people write prompts as if they were talking to a chatbot. You're not talking to a chatbot. You're directing a 6-second shot with sound.
This guide covers how to get into the model, the prompt structure that actually works, and the one section everybody forgets — the audio.
How to Use Grok Imagine 1.5 (Quick Answer)
- Open Grok Imagine via the X app, the Grok mobile app, the xAI API, or an aggregator like VidCella.
- Upload or generate a still image — it becomes the first frame of your video.
- Tap Make a video and pick your mode (Fast for drafts, standard for quality).
- Write a short prompt: subject, a strong-verb action, an explicit camera move.
- Add a Sound: line describing audio, and put any spoken dialogue in quotes.
- Generate, then extend or regenerate from the result.
What You Need First
Grok Imagine 1.5 is image-to-video at its core: it animates a still as the opening frame. So before any prompt, you need a starting image — either one you upload or one you generate in Grok's image mode.
Access paths, and what each costs:
| Path | What you get | Cost |
|---|---|---|
| X app / Grok mobile app | Consumer UI, Spicy/Fun/Custom modes | Subscription tier ($8–$300/mo) |
| xAI API | Programmatic access, model id grok-imagine-video-1.5 | ~$4.20 per minute of 720p video with audio |
| VidCella | Hosted generator, no account juggling | Pay-as-you-go credits |
xAI closed the free video tier in March 2026, so direct generation through X now needs a paid plan. If you're just testing 1.5 and don't want a $30 SuperGrok month sitting on your card, run Grok Imagine on VidCella and pay per credit instead. Either way, the prompt mechanics below are identical.
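To put the API rate in per-clip terms, here is a minimal sketch of the arithmetic. It assumes the approximate $4.20 per minute figure from the table and simple pro-rata billing by the second, which the table does not confirm; the function names are made up for illustration.

```python
# Approximate API rate from the access table (assumption: billed pro rata by the second).
API_RATE_USD_PER_MIN = 4.20


def clip_cost(seconds: float, rate_per_min: float = API_RATE_USD_PER_MIN) -> float:
    """Estimated cost in USD of one clip of the given length."""
    return round(rate_per_min * seconds / 60, 2)


def clips_for_budget(budget_usd: float, seconds: float = 6,
                     rate_per_min: float = API_RATE_USD_PER_MIN) -> int:
    """Number of clips of that length a budget covers, rounded down."""
    per_clip = rate_per_min * seconds / 60
    return int(budget_usd // per_clip)


if __name__ == "__main__":
    print(clip_cost(6))          # 0.42
    print(clips_for_budget(30))  # 71
```

Under those assumptions a 6-second clip is about 42 cents, and $30 of API spend covers roughly 71 of them. Regenerations count too, so budget for more attempts than keepers.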
The Prompt Formula That Works
Grok Imagine 1.5 favors short, front-loaded prompts. Aim for 30 to 60 words. Longer than that and the model starts dropping instructions; shorter and it improvises the parts you cared about.
Put the most important thing first, because the first few words carry the most weight:
[Subject] + [strong-verb action] + [explicit camera move]
Sound: [ambience, effects, music]
"[any spoken dialogue, in quotes]"
Three rules do most of the work:
- Front-load the subject and action. "A surfer carves down a collapsing wave" before any styling. The model commits to whatever you open with.
- Use a strong verb, not a generic one. "Shatters," "surges," "lunges," "erupts" produce visibly more motion than "moves" or "goes." Weak verbs give you a near-static clip.
- Name an explicit camera move. Dolly in, orbit left, crane up, whip pan, push-in. If you don't name one, Grok picks for you, and it usually picks "barely moving."
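If you write prompts in bulk, the three rules are easy to check mechanically before you spend a generation. The sketch below is a rough linter, not anything Grok provides: the word lists are illustrative and drawn only from the examples in this guide, and it assumes the 30 to 60 word budget covers the whole prompt, Sound line included.

```python
import re

# Illustrative lists taken from this guide's examples; extend them for your own work.
WEAK_VERBS = {"moves", "goes"}
CAMERA_MOVES = ("dolly", "orbit", "crane", "whip pan", "push-in", "push in")


def lint_prompt(prompt: str, min_words: int = 30, max_words: int = 60) -> list[str]:
    """Return a list of warnings; an empty list means no rule was tripped."""
    warnings = []

    # Rule: stay inside the word budget.
    word_count = len(prompt.split())
    if word_count < min_words:
        warnings.append(f"too short: {word_count} words (target {min_words}-{max_words})")
    elif word_count > max_words:
        warnings.append(f"too long: {word_count} words (target {min_words}-{max_words})")

    # Rule: avoid generic verbs.
    lowered = prompt.lower()
    tokens = set(re.findall(r"[a-z'-]+", lowered))
    weak = sorted(WEAK_VERBS & tokens)
    if weak:
        warnings.append("weak verb(s): " + ", ".join(weak))

    # Rule: name at least one camera move.
    if not any(move in lowered for move in CAMERA_MOVES):
        warnings.append("no camera move named")

    return warnings


if __name__ == "__main__":
    for warning in lint_prompt("A dragon moves over the castle"):
        print(warning)
```

Treat the output as a nudge rather than a gate: a substring match can't tell whether the camera move is well placed, only that one is present.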
Why the Sound line matters
This is the part that separates a 1.5 prompt from a 1.0 habit. Grok Imagine 1.5 generates audio in the same pass as the picture, so if you don't describe sound, the model guesses — and its guesses are generic. An explicit Sound: line is the difference between a clip you can use and a clip you have to score later.
Write the audio the way you'd write a sound design note. For dialogue, put the exact words in quotes and the model lip-syncs to them:
Sound: heavy rain on a tin roof, distant thunder, a low cello drone
"You shouldn't have come back."
The model handles speech and impact-linked sound well. Layered orchestral mixing is where it's weaker, so keep the audio description to two or three elements rather than a full soundscape.
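When prompts are generated by a script, it helps to assemble the three parts the same way every time. This is a minimal sketch of one way to do that; `build_prompt` is a hypothetical helper, and the three-line layout simply mirrors the formula in this guide rather than any format the model requires.

```python
from typing import Optional, Sequence


def build_prompt(visual: str, sound: Sequence[str], dialogue: Optional[str] = None) -> str:
    """Assemble the visual line, a Sound: line, and optional quoted dialogue."""
    lines = [visual.strip()]

    if sound:
        # One comma-separated Sound: line, as in the examples in this guide.
        lines.append("Sound: " + ", ".join(item.strip() for item in sound))

    if dialogue:
        # Strip any quotes the caller added so the line is quoted exactly once.
        lines.append('"' + dialogue.strip().strip('"') + '"')

    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "A detective leans into the lamplight, eyes narrowing. Slow push-in to a tight close-up.",
        ["rain on a window", "a ticking clock"],
        "We both know you were there.",
    ))
```

Passing the sound elements as a list makes it obvious at a glance when the audio description is getting crowded.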
Step-by-Step: From Image to Finished Clip
Step 1: Get your first frame
Upload a still or generate one in Grok's image mode. Composition here decides everything — the video inherits this frame's framing, lighting, and subject placement. A weak starting image can't be rescued by a good prompt.
Step 2: Open the video mode
Tap Make a video on the image. Pick Fast mode for iteration (a 6-second 720p clip comes back in roughly 25 seconds) and the standard mode when you've locked the prompt and want the cleaner render.
Step 3: Write the prompt with a Sound line
Apply the formula above. Keep it to two or three sentences plus the Sound: block. Resist the urge to over-describe what's already visible in your first frame — the image has the subject covered, so spend your words on motion, camera, and audio.
Step 4: Generate and judge in motion
Watch for the two things 1.5 still gets wrong: physics glitches on fast motion (limbs warping, objects passing through each other) and lip-sync drift on longer dialogue. If either shows up, shorten the clip or simplify the action.
Step 5: Extend instead of restarting
If a clip is close, use the continuation feature to extend from its final frame rather than regenerating from scratch. It preserves motion, lighting, and character position, so you keep what worked instead of rolling the dice again.
Is Grok Imagine 1.5 Free?
No, not for video. xAI closed the free video tier in March 2026, so generating clips through X or the Grok app requires a paid subscription. Plans start at $8/month for X Premium, SuperGrok is $30/month, and the top tier runs up to $300/month. A limited free image tier still exists. If you want video without a monthly commitment, an aggregator that bills per generation is the cheaper route for low-volume use. We broke down all the access options in "Grok Imagine is no longer free."
Why Is My Output Blurred or "Moderated"?
If you're hitting blurred frames or a "content moderated" message, it's almost always one of three things: a content filter triggered by your prompt or image, a regional law block (UK and several EU countries restrict certain outputs), or the tightened moderation xAI rolled out in early 2026 under legal pressure. Photoreal content involving real people gets hit hardest. Rephrasing the prompt or changing the source image usually clears a false trigger; a genuine policy block won't budge.
How Does It Compare to the Last Version?
The short version: 1.5 is faster, adds usable one-pass audio, and is steadier in motion, but it kept the 720p ceiling. If you want the full breakdown of what changed and whether the upgrade earns its place in a workflow, read our Grok Imagine 1.5 review.
Copy-Ready Example Prompts
1. Action / impact
A boxer slams a heavy bag, sweat spraying off the leather. Orbit left around him, low angle. Sound: dull thuds, ragged breathing, a gym's distant hum.
2. Dialogue close-up
A detective leans into the lamplight, eyes narrowing. Slow push-in to a tight close-up. Sound: rain on a window, a ticking clock. "We both know you were there."
3. Nature / motion
A wave surges and shatters against black rocks, spray exploding upward. Crane up and back to reveal the empty coastline. Sound: crashing surf, gulls, wind.
4. Product (clean motion)
A sneaker rotates slowly on a pedestal, studio light raking across the texture. Smooth 360 orbit. Sound: a soft synth pad, one subtle whoosh on each rotation.
5. Character moment
A girl blows out a single candle, smoke curling upward in slow motion. Gentle dolly in. Sound: a quiet breath, a faint music-box melody.
Notice the pattern in all five: subject and a strong verb up front, one named camera move, a focused Sound: line, dialogue in quotes only when it's needed.
Common Mistakes and How to Fix Them
Mistake: No camera move named
- ❌ "A car on a highway at night"
- ✅ "A car tears down a wet highway at night. Low chase shot, dolly behind it."
Mistake: Weak verbs
- ❌ "A dragon moves over the castle"
- ✅ "A dragon dives over the castle, wings snapping the air"
Mistake: Skipping the Sound line
- The model fills silence with generic audio. Even one line — "Sound: wind and distant traffic" — gives you control over what other people leave to chance.
Mistake: Over-describing the first frame
- Your image already locks the subject. Re-describing its physical details wastes your 30–60 word budget and can fight the reference.
Mistake: Asking for one long take
- 1.5 holds up best on short clips. For anything longer, generate in segments and use Extend to chain them, rather than prompting a single 15-second shot and hoping the back half survives.
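Chaining segments is easier to plan if you work out the number of steps up front. The sketch below assumes each Extend adds about the same length as the base 6-second clip, which this guide does not confirm, so adjust `segment_seconds` to whatever you actually observe. `plan_segments` is a made-up helper name.

```python
import math


def plan_segments(target_seconds: float, segment_seconds: float = 6.0) -> list[str]:
    """Return the ordered steps needed to reach the target length."""
    if target_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")

    # Round up: a partial segment still costs a full generation.
    count = math.ceil(target_seconds / segment_seconds)

    # The first segment is a fresh generation; every later one extends the previous clip.
    return ["generate"] + ["extend"] * (count - 1)


if __name__ == "__main__":
    print(plan_segments(15))  # ['generate', 'extend', 'extend']
```

A 15-second shot becomes one generation plus two extends, and you get a checkpoint after each step instead of finding out at the end that the back half fell apart.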
Where to Go From Here
Grok Imagine 1.5 is a draft engine: fast, cheap, loud enough to be useful, capped at 720p. Lean on it for ideation and short social clips, then move keepers to a higher-resolution model when the work has to ship. For the full quality-and-price verdict, the 1.5 review covers when it's worth it. For the wider field of models worth knowing, the April 2026 roundup lays them side by side.
