Happy Horse 1.1

Alibaba · AI video

Happy Horse 1.1 is Alibaba's updated video generation model. It supports three modes: text-to-video from a prompt, image-to-video from a single starting frame, and reference-to-video from up to nine reference images referenced in the prompt as [Image 1], [Image 2], and so on.

Compared with Happy Horse 1.0, version 1.1 unifies inputs around an images array, adds reference-to-video for multi-image subject and scene consistency, and exposes 720p and 1080p output tiers with flexible aspect ratios for text-driven generations.

On VidMachine, scene generation uses image-to-video at 720p, animating each scene's starting frame with its motion prompt. Requests are routed through OpenRouter and billed at 20 credits per second of output.

Key features and benefits

Text-to-video, image-to-video, or reference-to-video

Provide a prompt alone for text-to-video, one image to animate as the first frame, or up to nine reference images for reference-to-video. With a single image, an optional prompt steers motion.
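The mode follows from how many entries the images array holds. The sketch below expresses that mapping as a small Python function. The function name and mode strings are hypothetical, and treating exactly one image as image-to-video (rather than as a one-image reference job) is an assumption drawn from the specification table, not confirmed provider behavior.

```python
from typing import Sequence

MAX_REFERENCE_IMAGES = 9  # upper bound stated for reference-to-video


def infer_mode(images: Sequence[str]) -> str:
    """Map the size of the images array to a generation mode (hypothetical labels).

    Assumption: 0 images -> text-to-video, exactly 1 -> image-to-video,
    2..9 -> reference-to-video. The real service may decide differently.
    """
    count = len(images)
    if count == 0:
        return "text-to-video"
    if count == 1:
        return "image-to-video"
    if count <= MAX_REFERENCE_IMAGES:
        return "reference-to-video"
    raise ValueError(f"at most {MAX_REFERENCE_IMAGES} images are supported, got {count}")


print(infer_mode([]))
print(infer_mode(["start.png"]))
print(infer_mode(["hero.png", "market.png"]))
```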

720p and 1080p output

Choose 720p or 1080p resolution. Image-to-video output follows the input image's aspect ratio; text-to-video and reference-to-video accept aspect_ratio presets such as 16:9, 9:16, and 1:1.
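A client can check output settings before submitting a job. This sketch assumes the resolutions and aspect-ratio presets listed in the specification table. The helper `check_output_settings` and its mode strings are hypothetical, and requiring an explicit aspect ratio for text-to-video and reference-to-video is a choice made by this sketch, not a documented provider rule.

```python
from typing import Optional

# Presets taken from the specification table; treat them as assumptions.
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def check_output_settings(mode: str, resolution: str, aspect_ratio: Optional[str] = None) -> Optional[str]:
    """Validate output settings for a mode (hypothetical mode labels).

    Returns the aspect ratio to send, or None for image-to-video, whose
    output follows the input image's aspect ratio.
    """
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if mode == "image-to-video":
        # Any requested aspect ratio is dropped: the start frame decides it.
        return None
    if aspect_ratio not in ASPECT_RATIOS:
        # Also rejects None, because this sketch requires an explicit preset.
        raise ValueError(f"{mode} needs one of {sorted(ASPECT_RATIOS)}, got {aspect_ratio!r}")
    return aspect_ratio


print(check_output_settings("image-to-video", "720p", "16:9"))
print(check_output_settings("text-to-video", "1080p", "9:16"))
```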

Reference image guidance

Reference-to-video keeps subjects and scenes from multiple stills. Refer to each image in the prompt as [Image 1], [Image 2], etc., for character or style consistency across the clip.
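Because the prompt addresses reference images by 1-based position, a quick check can catch a token such as [Image 4] when only three images were supplied. The regular expression and the `check_image_references` helper are illustrative assumptions; the provider may parse tokens differently (for example, tolerating extra spaces or other casing).

```python
import re
from typing import Sequence

# Matches tokens such as "[Image 1]" or "[Image 9]" in a prompt.
IMAGE_TOKEN = re.compile(r"\[Image (\d+)\]")


def check_image_references(prompt: str, images: Sequence[str]) -> tuple[list[int], list[int]]:
    """Return (cited, uncited) image numbers, both sorted and 1-based.

    Raises ValueError if a token points past the end of the images array.
    Uncited images are reported rather than rejected.
    """
    cited = sorted({int(n) for n in IMAGE_TOKEN.findall(prompt)})
    out_of_range = [n for n in cited if n < 1 or n > len(images)]
    if out_of_range:
        raise ValueError(f"prompt cites images that were not supplied: {out_of_range}")
    uncited = [i for i in range(1, len(images) + 1) if i not in cited]
    return cited, uncited


prompt = "[Image 1] walks through the market from [Image 2] at dusk"
cited, uncited = check_image_references(prompt, ["hero.png", "market.png", "logo.png"])
print(cited, uncited)
```

Reporting uncited images instead of failing keeps the check permissive, since nothing here says every reference image must be named in the prompt.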

Duration

Clips run 3–15 seconds per generation. Shorter lengths often yield tighter temporal coherence.

Technical specifications

Provider: Alibaba (Happy Horse 1.1)

Primary inputs: Text prompt; optional images array (0 images for text-to-video, 1 for image-to-video, up to 9 for reference-to-video)

Resolutions: 720p or 1080p

Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4 for text-to-video and reference-to-video; image-to-video follows the source image

Duration: 3–15 seconds

Use cases and applications

Use Happy Horse 1.1 for social-ready clips, motion tests from still key art, and rapid iteration when Alibaba's Happy Horse motion profile fits your creative direction.

Reference-to-video suits character-consistent shorts when you have multiple anchor stills and want them reflected in a single generated clip.

Pair image-to-video with start frames from any upstream image generator or from photography. Motion quality depends on the quality of the start frame and the prompt.

Why this model

Pick Happy Horse 1.1 for the latest Alibaba Happy Horse stack with reference-to-video and flexible resolution tiers.

If you need WAN 2.7-specific controls (such as structured last-frame conditioning or documented lip-sync audio lanes), compare Alibaba's WAN 2.7 image-to-video offerings side by side with Happy Horse on paper before committing.

How VidMachine uses it

VidMachine generates scene clips via image-to-video at 720p, using each scene's starting frame and motion prompt. Duration is clamped to the model's 3–15 second range. Generation is routed through OpenRouter.

Billing is 20 credits per second of output video at 720p.
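The clamping and billing rules above reduce to simple arithmetic. This sketch assumes whole-second durations and that VidMachine bills the clamped length; `clamp_duration` and `estimate_credits` are hypothetical helpers, not part of any published API.

```python
MIN_SECONDS = 3
MAX_SECONDS = 15
CREDITS_PER_SECOND_720P = 20  # VidMachine's stated 720p rate


def clamp_duration(seconds: int) -> int:
    """Clamp a requested scene length (whole seconds) into the 3-15 s range."""
    return max(MIN_SECONDS, min(MAX_SECONDS, seconds))


def estimate_credits(requested_seconds: int) -> int:
    """Estimate credits for one 720p scene, assuming billing on the clamped length."""
    return clamp_duration(requested_seconds) * CREDITS_PER_SECOND_720P


for requested in (2, 10, 30):
    print(requested, "->", clamp_duration(requested), "s,", estimate_credits(requested), "credits")
```

Under these assumptions, a 10-second scene costs 10 × 20 = 200 credits, a 2-second request is raised to 3 seconds (60 credits), and a 30-second request is capped at 15 seconds (300 credits).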


What you should know

How is Happy Horse 1.1 different from 1.0?

Version 1.1 adds reference-to-video (up to nine images), uses an images array instead of a single image field, and documents 720p and 1080p tiers with broader aspect-ratio options for text-driven modes.

Is Happy Horse 1.1 the same as WAN 2.7 I2V?

No. They are different model products in Alibaba's ecosystem with different capabilities and API surfaces. Treat naming overlap from older community posts as unreliable; read the provider's model card for each.

What resolution does VidMachine use?

Scene generation runs at 720p (20 credits per second) via OpenRouter. If you call the model directly through the API outside VidMachine, 1080p is also available.

Does VidMachine use reference-to-video?

Not in the standard scene pipeline. VidMachine uses image-to-video with each scene's starting frame; reference-to-video is available through the OpenRouter API for custom workflows.
