New AI model generates 45-minute lip-synced video from one photo and runs in real time
What Happened
A single image becomes a talking character: LPM 1.0 generates real-time video with lip sync, facial expressions, and emotional reactions. For now, it remains a research project.
Our Take
LPM 1.0 generates real-time lip-synced video from a single static image — up to 45 minutes of output with facial expressions and emotional reactions. No public API yet; research only.
If you're building avatar pipelines on HeyGen or D-ID, you're paying per-minute generation costs that compound fast at any meaningful scale. Most developers overbuild video-generation infrastructure on the assumption that rendering demands heavy async job queues; LPM's real-time output removes that assumption entirely. When this ships, the infrastructure bet changes (see the sketch below).
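To make the infrastructure point concrete, here's a minimal sketch. Every function and object in it (submit_job, get_status, start_session, next_frames) is hypothetical, not HeyGen's, D-ID's, or LPM's actual API; the contrast is the point. Per-minute services force a submit-and-poll job loop, while a real-time model collapses generation into a streaming call.

```python
# Hypothetical sketch: neither API below is real. It contrasts the
# queued workflow today's per-minute services impose with the loop a
# real-time model like LPM would allow.
import time

# --- Today: async queued generation (HeyGen / D-ID style) ---
def generate_avatar_video_queued(client, photo_url: str, script: str) -> str:
    """Submit a render job, then poll until the video is ready."""
    job = client.submit_job(photo=photo_url, text=script)  # hypothetical call
    while True:
        status = client.get_status(job.id)                 # hypothetical call
        if status.state == "done":
            return status.video_url
        if status.state == "failed":
            raise RuntimeError(status.error)
        time.sleep(5)  # minutes-long renders make polling (or webhooks) mandatory

# --- LPM-class: real-time streaming (what the article implies) ---
def stream_avatar_video(model, photo_bytes: bytes, audio_chunks):
    """Yield video frames as audio arrives; no queue, no polling."""
    session = model.start_session(reference_image=photo_bytes)  # hypothetical
    for chunk in audio_chunks:
        yield session.next_frames(chunk)  # frames keep pace with input audio
```

The second shape is why the queue infrastructure becomes dead weight: if frames arrive as fast as audio does, there is no long-running job to track.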
Avatar-heavy products (onboarding, e-learning, agent personas) should watch the release. Everyone else can ignore it for now.
What To Do
In new avatar pipelines, avoid locking into HeyGen's per-minute pricing: LPM-class real-time generation would turn async queued workflows into unnecessary overhead. The back-of-envelope math below shows how the metered bill compounds.
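A quick illustrative cost model; the rate and usage numbers are placeholders I'm assuming for the exercise, not quoted prices from any vendor.

```python
# Illustrative cost model; the rate below is a placeholder, not a
# quoted HeyGen or D-ID price. The point is the linear scaling.
PER_MINUTE_RATE = 0.50   # assumed $/minute of generated video
users = 10_000           # monthly active users of an avatar feature
minutes_per_user = 3     # e.g. a short onboarding walkthrough each

monthly_cost = users * minutes_per_user * PER_MINUTE_RATE
print(f"${monthly_cost:,.0f}/month at {users:,} users")  # $15,000/month

# Self-hosted real-time generation trades that metered bill for fixed
# GPU capacity, so cost stops scaling with minutes of output.
```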
What Skeptics Say
A research demo with no disclosed latency or hardware specs is not a capability claim; it is a preview of what regulators and platform trust-and-safety teams will have to contain before anything like it reaches production.