Kling Avatar V2

KwaiVGI · AI video

Kling Avatar V2 on Replicate (kwaivgi/kling-avatar-v2) turns a portrait-style image and an audio clip into a talking-head video with lip sync and facial motion. The upstream model supports optional text guidance and different quality modes; on VidMachine we always call it in standard mode ("std") for predictable cost and behavior.

This integration is available only for AI Influencer projects. We pass each scene’s starting frame image, the scene’s generated speech audio URL, and your scene video prompt as optional style guidance. Output length follows the audio; each scene costs 11 credits per second of scene target duration, rounded up the same way as other per-second video charges.

Kling Avatar V2 still uses the same scene start-frame workflow as the other AI Influencer video models. Generate the scene starting frame first, then use Kling to animate that specific image with scene speech.

Key features and benefits

Scene frame + speech driven

You provide a scene-specific starting frame and clean scene audio from ElevenLabs. The model preserves the appearance from that frame while syncing mouth and expression to the speech.

Optional prompt

The scene video prompt is sent when present to nudge tone, energy, or camera feel. Audio remains the primary driver for timing and lip sync.

Standard mode only

VidMachine pins mode to std so pricing and quality stay consistent. Pro tier is not exposed in the product.

Technical specifications

Provider: KwaiVGI (via Replicate kwaivgi/kling-avatar-v2)

VidMachine project type: AI Influencer only

Inputs: Scene starting frame image URL; scene audio URL; optional text prompt

Mode: std (fixed)

Credits: 11 credits per second of target scene duration

Use cases and applications

Consistent presenter-style clips where each scene's generated starting frame becomes the talking avatar source image.

Fast talking-head variants when you already have per-scene dialogue audio and generated scene frames.

Why this model

Choose Kling Avatar V2 when you want a dedicated talking-avatar path from a scene starting frame plus audio rather than general-purpose scene motion via WAN or Seedance.

Prefer Seedance or WAN when your creative depends on richer scene motion beyond a talking head. All three models animate the scene's generated starting frame, so the choice comes down to the kind of motion you need.

How VidMachine uses it

Select Kling Avatar V2 in video model priority for an AI Influencer project. For each scene we call Replicate with mode std, that scene's starting frame as image, scene speech URL as audio, and the scene video prompt when set.
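The per-scene call described above can be sketched as a small input builder. This is an illustration only, not VidMachine's actual code: the function name build_kling_input and the exact input keys (mode, image, audio, prompt) are assumptions based on the wording on this page, and the real Replicate schema may differ. The sketch only assembles the input and makes no network call.

```python
from typing import Optional

MODEL_REF = "kwaivgi/kling-avatar-v2"  # Replicate model named on this page
FIXED_MODE = "std"  # VidMachine pins standard mode


def build_kling_input(
    project_type: str,
    start_frame_url: str,
    audio_url: str,
    prompt: Optional[str] = None,
) -> dict:
    """Assemble the per-scene input. Key names are assumptions, not a confirmed schema."""
    # Kling Avatar V2 is only selectable for AI Influencer projects.
    if project_type != "ai_influencer":
        raise ValueError("Kling Avatar V2 is only available for ai_influencer projects")

    payload = {
        "mode": FIXED_MODE,        # always std, never pro
        "image": start_frame_url,  # the scene's starting frame
        "audio": audio_url,        # the scene's generated speech
    }
    # The scene video prompt is optional and sent only when it has content.
    if prompt and prompt.strip():
        payload["prompt"] = prompt.strip()
    return payload
```

Leaving the prompt key out entirely when no prompt is set mirrors the "when set" behavior described above, so audio stays the only driver for those scenes.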

Scene target duration follows the influencer lip-sync limits (2–15 seconds), as with other influencer models; credits use the Kling rate defined in lib/credits/calculations.js.
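As a rough illustration of the 2–15 second window, the sketch below clamps a scene's audio length into that range. This page does not say how VidMachine treats audio outside the window, so clamping is an assumption, and the name clamp_scene_duration is hypothetical.

```python
MIN_SCENE_SECONDS = 2.0   # lower lip-sync limit stated on this page
MAX_SCENE_SECONDS = 15.0  # upper lip-sync limit stated on this page


def clamp_scene_duration(audio_seconds: float) -> float:
    """Keep a target duration inside the 2-15 s window (clamping is an assumption)."""
    return max(MIN_SCENE_SECONDS, min(MAX_SCENE_SECONDS, audio_seconds))
```

A real pipeline might instead reject or split scenes that fall outside the limits; the sketch only shows where the bounds apply.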


What you should know

Does Kling need a start frame?

Yes. On VidMachine, Kling Avatar V2 uses the current scene's starting frame image plus scene speech audio.

Can I use Kling on regular AI Video projects?

No. It is selectable only for ai_influencer projects, so general video pipelines keep their existing start-frame workflow.

How are credits calculated?

Video generation bills credits per second of the scene target duration at 11 credits per second for this model, rounded up like other per-second video charges.
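A minimal sketch of that arithmetic follows. It assumes "rounded up" means the target duration is rounded up to whole seconds before the rate is applied; the actual logic lives in lib/credits/calculations.js and may round differently. The name kling_credits is hypothetical.

```python
import math

KLING_CREDITS_PER_SECOND = 11  # rate stated on this page


def kling_credits(target_seconds: float) -> int:
    """Estimate credits for one scene (rounding rule is an assumption)."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    # Assumed rule: round the duration up to whole seconds, then apply the rate.
    billed_seconds = math.ceil(target_seconds)
    return billed_seconds * KLING_CREDITS_PER_SECOND
```

Under that assumption, a 4.2-second scene bills as 5 seconds, or 55 credits, and a 10-second scene costs 110 credits.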
