Generate AI Videos with Happyhorse 1.0 on VidCella

Text, image, reference, edit & extend · Native audio-video · 720p / 1080p · 3–15 seconds


Happyhorse 1.0 Features on VidCella

Why Happyhorse 1.0 sits at the top of the Artificial Analysis Video Arena, and what you get when you generate with it on VidCella:

Native Audio-Video Generation

Audio and video are generated in one forward pass through a 15B single-stream Transformer, so sound arrives synchronized to the frame rather than being warped into alignment after the fact, as in cascade systems.

Top of the Artificial Analysis Leaderboard

Entered the Video Arena anonymously and took #1 in both T2V (Elo 1,347) and I2V (Elo 1,406). The 74-point margin over Seedance 2.0 is the largest gap ever recorded on that benchmark.

Industry-Leading Lip Sync

Word Error Rate of 14.60% across seven languages (Mandarin, Cantonese, English, Japanese, Korean, German, French). Phoneme and mouth-shape tokens share the same embedding space, so accuracy doesn't depend on post-hoc warping.

Five Video Creation Modes

Create from text, a first-frame image, one or more visual references, an existing clip to edit, or a source video to extend. Text-to-video supports five aspect ratios; image, reference, edit, and extend modes inherit the aspect ratio of the source media.

720p / 1080p With Mode-Aware Billing

720p (40 credits/s) or 1080p (80 credits/s). Text, image, reference, and extend modes bill the selected 3-to-15-second duration; video edit bills the uploaded clip's duration with a 3-second minimum (see the pricing sketch below).
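
As a concrete illustration of how these rules compose, here is a minimal TypeScript sketch of the billing math. The per-second rates, the 3-to-15-second range, and the edit-mode minimum come from this page; the helper function and the mode names are illustrative assumptions, not VidCella's actual API.

```ts
// Minimal sketch of Happyhorse 1.0's per-second billing on VidCella.
// Rates (40/80 credits per second) come from this page; the helper
// itself and the mode names are illustrative, not VidCella's API.

type Mode = "text" | "image" | "reference" | "extend" | "edit";
type Resolution = "720p" | "1080p";

const CREDITS_PER_SECOND: Record<Resolution, number> = {
  "720p": 40,
  "1080p": 80,
};

function estimateCredits(mode: Mode, resolution: Resolution, seconds: number): number {
  const billable =
    mode === "edit"
      ? Math.max(seconds, 3) // edit: billed from the clip, 3 s minimum
      : Math.min(Math.max(seconds, 3), 15); // other modes: 3–15 s selection
  return billable * CREDITS_PER_SECOND[resolution];
}

console.log(estimateCredits("text", "720p", 5));  // 200 credits
console.log(estimateCredits("text", "1080p", 5)); // 400 credits
console.log(estimateCredits("edit", "720p", 2));  // 120 credits (3 s floor)
```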

Built by Alibaba's ATH-AI Lab

From Alibaba's Future Life Laboratory (ATH-AI), led by Zhang Di, formerly a VP at Kuaishou and a principal architect on Kling AI. Happyhorse is the first product in Alibaba's "Happy Universe" line, followed nine days later by the Happy Oyster 3D world model.


Happyhorse 1.0 Frequently Asked Questions

What you need to know before generating with Happyhorse 1.0 on VidCella:

1. What is Happyhorse 1.0?

Alibaba's flagship AI video model, released April 2026. On VidCella it supports text-to-video, image-to-video, reference-to-video, video edit, and video extend, all powered by a 15B single-stream Transformer that generates audio and video in one pass.

2. Who built Happyhorse 1.0, and why was it launched anonymously?

Built by Alibaba's ATH-AI lab (Future Life Laboratory, Taobao and Tmall Group), led by Zhang Di, formerly a VP at Kuaishou and a principal architect on Kling AI. It launched anonymously so users would judge it on the footage, not the brand; by the time its origins were traced, it was already #1.

3. What makes Happyhorse 1.0's native audio different?

Most AI video systems are cascades: a video model runs first, then a separate audio model, then an alignment step. Happyhorse does everything in one forward pass: text, image, video, and audio tokens all enter the same 15B Transformer. Phonemes and mouth shapes are learned jointly, which is why its 14.60% WER is the lowest in the category. The sketch below contrasts the two pipeline shapes.
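
Here is a deliberately simplified TypeScript sketch of that contrast. Every name in it is an illustrative stub; it does not depict Happyhorse's real architecture or any VidCella API.

```ts
// Conceptual contrast: cascade pipeline vs. single-stream generation.
// All types and functions are illustrative stubs.

type VideoTokens = string[];
type AudioTokens = string[];

// Cascade: two separate models, then a post-hoc alignment step.
function cascade(prompt: string): [VideoTokens, AudioTokens] {
  const video: VideoTokens = [`frames(${prompt})`]; // stage 1: video model
  const audio: AudioTokens = [`sound(${prompt})`];  // stage 2: audio model
  // Stage 3 would warp `audio` onto frames it never saw during
  // generation; that warping is where lip-sync drift creeps in.
  return [video, audio];
}

// Single-stream: one forward pass over one interleaved token sequence,
// so video and audio tokens are emitted side by side and synchronization
// is learned rather than imposed afterwards.
function singleStream(prompt: string): [VideoTokens, AudioTokens] {
  const sequence = [`text(${prompt})`, "video-token", "audio-token"];
  const video = sequence.filter((t) => t.startsWith("video"));
  const audio = sequence.filter((t) => t.startsWith("audio"));
  return [video, audio];
}

console.log(cascade("a talking horse"));
console.log(singleStream("a talking horse"));
```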

4. How does Happyhorse 1.0 compare to Seedance 2.0?

Happyhorse leads on visual quality and speech-heavy lip sync, and now covers reference, edit, and extend workflows on VidCella. Seedance 2.0 still wins on 4K output, longer shots, richer ambient audio, and broader @tag multi-asset control. Pick Happyhorse for short cinematic clips and quick edits; pick Seedance for longer 4K work.

5. What resolutions and durations does Happyhorse 1.0 support?

720p or 1080p. Text, image, reference, and extend modes support 3 to 15 seconds; video edit follows the uploaded clip's duration for billing. T2V offers five aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4), while media-based modes inherit the source aspect ratio. A hypothetical request shape collecting these options appears below.
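
Collected in one place, the options above might look something like the following. The option values (modes, resolutions, durations, aspect ratios) come from this FAQ; the field names and structure are assumptions, not VidCella's documented API.

```ts
// Hypothetical shape of a Happyhorse 1.0 generation request on VidCella.
// Values come from this FAQ; field names and structure are assumed.

type AspectRatio = "16:9" | "9:16" | "1:1" | "4:3" | "3:4";

interface GenerationRequest {
  mode: "text" | "image" | "reference" | "edit" | "extend";
  resolution: "720p" | "1080p";
  /** 3–15 s; ignored in edit mode, which bills the uploaded clip. */
  durationSeconds?: number;
  /** Honored only in text-to-video; other modes inherit the source media. */
  aspectRatio?: AspectRatio;
  prompt: string;
}

// Example: a 5-second vertical 1080p text-to-video clip (400 credits).
const request: GenerationRequest = {
  mode: "text",
  resolution: "1080p",
  durationSeconds: 5,
  aspectRatio: "9:16",
  prompt: "A horse galloping along a beach at golden hour",
};

console.log(request);
```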

6. How much does Happyhorse 1.0 cost on VidCella?

40 credits per second at 720p, 80 credits per second at 1080p. A 5-second 720p clip is 200 credits; a 5-second 1080p clip is 400 credits. Video edit uses the server-verified input duration with a 3-second minimum. There is no subscription; credits come from your normal VidCella balance.