VidMachine

WAN 2.7 Image

Alibaba · AI image
WAN 2.7 Image is Alibaba's Wan 2.7 image model for text-to-image, image editing, and multi-reference fusion. It supports resolutions up to about 2K (for example 2048×2048), flexible aspect ratios via preset sizes or custom dimensions, and optional coherent image-set generation for the same character or product across multiple shots. A thinking mode is available for stronger reasoning on text-to-image requests (as described on the model provider page).
On VidMachine, WAN 2.7 Image is available as an image option in your project's image model priority. We run it through Replicate as documented at replicate.com/wan-video/wan-2.7-image. Each generated image uses a fixed credit cost (see Pricing). You can use it for start frames, scene edits, thumbnails, and AI Images workflows alongside other models in your fallback chain.
Alibaba also documents a separate WAN 2.7 Image Pro variant on Replicate, with higher-quality output and 4K support; on VidMachine the integrated model is the standard WAN 2.7 Image variant.

Key features and benefits

Text-to-image

Describe the scene in natural language and get a high-quality image. Prompts can be long (the upstream model supports up to thousands of characters), so you can specify composition, lighting, style, and subject detail in one go. Size can be set with presets such as 1K or 2K, or with explicit width-by-height strings for exact aspect ratios—useful for vertical shorts, landscape thumbnails, or square assets.
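As a rough sketch of what a direct call looks like, the snippet below uses Replicate's Python client with the model slug from this page. The input field names (prompt, size) are assumptions for illustration; check the actual schema on replicate.com/wan-video/wan-2.7-image before relying on them.

    import replicate  # requires REPLICATE_API_TOKEN in the environment

    # Text-to-image: one prompt, one explicit width-by-height size string.
    # "prompt" and "size" are assumed field names -- verify against the model schema.
    output = replicate.run(
        "wan-video/wan-2.7-image",
        input={
            "prompt": "Golden-hour portrait of a hiker on a ridge, soft rim light, 35mm look",
            "size": "1080*1920",  # assumed format for a vertical (9:16) output
        },
    )
    print(output)  # typically a URL, or a list of URLs, for the generated image(s)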

Image editing and multi-reference fusion

You can supply reference images together with a text instruction to edit, restyle, or fuse content. The model accepts multiple inputs for style transfer, element swaps, and blending several references into one output. This fits workflows where you already have a product shot, a character reference, or a mood board and want a new composite or variant without starting from a blank canvas.
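A hedged sketch of the same client used for an edit or fusion call would pass reference URLs alongside the instruction. The "images" field name is an assumption, and the documented cap is nine references.

    import replicate

    # Multi-reference fusion: reference images plus a text instruction.
    # "images" is an assumed parameter name for reference inputs (up to 9 per the docs).
    output = replicate.run(
        "wan-video/wan-2.7-image",
        input={
            "prompt": "Place the product from the first image into the cafe scene "
                      "from the second image, matching its warm window light",
            "images": [
                "https://example.com/product.png",     # hypothetical reference URLs
                "https://example.com/cafe-scene.png",
            ],
        },
    )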

Image sets and consistency

Image-set mode is designed for coherent batches from one prompt—for example the same character in different seasons, product angles, or storyboard beats. The provider documents generating multiple related images in one request. On VidMachine we currently call the model for a single output per generation step, but you can still rely on the model's strength for consistency when you use it repeatedly with clear prompts and references.
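One way to approximate image-set consistency under VidMachine's single-output-per-call integration is to loop over story beats while reusing the same reference. This is a sketch under the same assumed field names as above, not a documented pattern.

    import replicate

    CHARACTER_REF = "https://example.com/character.png"  # hypothetical reference URL
    beats = ["walking through autumn leaves", "standing in fresh snow", "crossing a spring meadow"]

    frames = []
    for beat in beats:
        # One call per output; the shared reference and stable prompt prefix
        # lean on the model's consistency rather than image-set mode.
        frames.append(replicate.run(
            "wan-video/wan-2.7-image",
            input={
                "prompt": f"Same character as the reference, {beat}, consistent face and outfit",
                "images": [CHARACTER_REF],  # assumed field name, as above
            },
        ))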

Thinking mode and reproducibility

Thinking mode is aimed at improved quality for text-to-image by doing more internal reasoning before rendering. When the API exposes a seed, you can set it for reproducible runs. Together, these help when you need to iterate on small prompt tweaks or lock down a look for a series of images.
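For example, pinning a seed lets a small prompt tweak be compared against an otherwise identical render. "seed" as a field name is a guess to verify on the Replicate schema.

    import replicate

    base = {
        "prompt": "Neon-lit alley in the rain, cinematic wide shot",
        "seed": 42,  # assumed field; same seed + same inputs should reproduce the image
    }
    original = replicate.run("wan-video/wan-2.7-image", input=base)
    variant = replicate.run(
        "wan-video/wan-2.7-image",
        input={**base, "prompt": base["prompt"] + ", red umbrella in the foreground"},
    )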

Technical specifications

Provider: Alibaba Wan (via Replicate)
Modes: Text-to-image; image edit / fusion; optional image sets
Reference images: Up to 9 (when using image inputs)
Resolution: Up to ~2K; 1K / 2K presets or custom WxH
Outputs per call (upstream): 1–4 (or more in image-set mode)
VidMachine integration: Single image per call; portrait-friendly defaults

Use cases and applications

WAN 2.7 Image suits social and marketing content where you need sharp 2K-class imagery, vertical or custom aspect ratios, and optional reference-driven edits. Use it for Shorts/TikTok start frames, stylized thumbnails, storyboard stills, and e-commerce or brand visuals when you want to fuse or restyle existing shots.
When you maintain a reference image in the project or edit an existing scene frame, the model can treat that image as input for edits—similar to other multi-modal image APIs. Multi-reference fusion helps campaigns that need the same product in several environments or the same character with consistent identity across frames.
On VidMachine, add WAN 2.7 Image to your image model priority as primary or fallback. It sits at a moderate per-image credit cost relative to some premium options, so it is a practical choice when you want Alibaba's latest image stack without always paying the top tier.

Why this model

Choose WAN 2.7 Image when you want Alibaba's current general-purpose image stack with strong support for both pure text-to-image and reference-heavy edits, up to roughly 2K resolution. It is a good fit if you already use WAN 2.5 for video on VidMachine and want a related image option in the same ecosystem.
If you need maximum text-in-image fidelity or 4K stills, compare with Seedream 4.5 or Flux 2 Pro on VidMachine. If you want the lowest cost per image for photorealistic stills, Grok Image or Nano Banana may be enough. WAN 2.7 Image targets the middle ground: flexible sizing, multi-image workflows, and modern Wan 2.7 quality at a clear per-image price.

How VidMachine uses it

Select WAN 2.7 Image in your project's image model priority on VidMachine. We call the Replicate model wan-video/wan-2.7-image with your prompt, optional reference URLs (up to nine when the pipeline supplies them), and a size appropriate to your aspect ratio (for example vertical shorts). Each successful image generation deducts credits according to Pricing.
For AI Video, generated images feed start frames and scene imagery; for AI Images projects, images are produced the same way. You can stack WAN 2.7 Image with other models in priority order so that if one provider is slow or errors, the next model is tried automatically.
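Illustratively (this is not VidMachine's actual code), a priority chain reduces to trying each model in order and returning the first success. The fallback slug below is hypothetical.

    import replicate

    PRIORITY = [
        "wan-video/wan-2.7-image",       # primary, from this page
        "some-org/fallback-image-model", # hypothetical fallback slug
    ]

    def generate(prompt: str):
        last_error = None
        for model in PRIORITY:
            try:
                return replicate.run(model, input={"prompt": prompt})
            except Exception as exc:  # slow or failing provider: fall through to next
                last_error = exc
        raise RuntimeError("all models in the priority chain failed") from last_error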

What you should know

What is the difference between WAN 2.7 Image and WAN 2.7 Image Pro?
The Pro variant on Replicate emphasizes higher quality, 4K support, and additional options. VidMachine integrates the standard WAN 2.7 Image model at replicate.com/wan-video/wan-2.7-image.
How many reference images can I use?
The upstream model documents up to nine images for editing, style transfer, or fusion. VidMachine passes through references when your workflow provides them, capped to that limit.
How are WAN 2.7 Image credits charged on VidMachine?
Each generated image uses a fixed number of credits per image. See the Pricing page for the current rate (WAN 2.7 Image is priced per image, not per second).
Does VidMachine use image-set mode for multiple images at once?
We currently request a single output per generation call for pipeline simplicity. Image-set mode remains a capability of the underlying model for future product use.