
LPM 1.0: AI Turns Single Photo into 45-Min Video

LPM 1.0 creates 45‑minute lip‑synced video from a single photo in real time


A single portrait can now become three-quarters of an hour of moving speech without a studio or a render farm. Researchers unveiled LPM 1.0, a model that takes one still image and produces a continuously streaming video that stays in sync with a spoken script. The claim isn’t just about length; it’s about flexibility.

From realistic human faces to stylised anime avatars and even three‑dimensional game characters, the system allegedly handles the full spectrum of visual styles without extra fine‑tuning. Real‑time generation means the output appears frame by frame, sidestepping the batch‑processing pipelines that typically lock users into hours‑long post‑production. If the model truly keeps a 45‑minute clip stable, it could reshape how creators think about low‑cost, on‑the‑fly content.

The following details lay out exactly how LPM 1.0 achieves that breadth and speed.

According to the researchers, LPM 1.0 works across different image styles (photorealistic faces, anime, and 3D game characters) without any additional training. The entire video generation runs as a streaming process in real time rather than rendering a finished video all at once; a minimal sketch of that idea follows below. The team says videos up to 45 minutes long remain stable.
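To make the streaming claim concrete, here is a minimal hypothetical sketch of frame-by-frame generation: each audio chunk yields one frame, playback can begin immediately, and memory use stays flat regardless of clip length. Every name in it (FrameGenerator, next_frame, stream_video) is illustrative; LPM 1.0's actual interface has not been published.

```python
# A minimal sketch of streaming generation, not LPM 1.0's real API:
# frames are yielded one at a time instead of rendering the whole clip first.

import time

FRAME_SECONDS = 1 / 25  # 25 fps target; one audio chunk per frame


class FrameGenerator:
    """Stand-in for the animation model; returns one frame per audio chunk."""

    def __init__(self, portrait: str):
        self.portrait = portrait  # the single source photo

    def next_frame(self, audio_chunk: bytes) -> dict:
        # A real model would predict facial motion from the audio here.
        return {"source": self.portrait, "audio_bytes": len(audio_chunk)}


def stream_video(portrait: str, audio_chunks):
    """Yield frames as they are generated, so playback can start immediately."""
    generator = FrameGenerator(portrait)
    for chunk in audio_chunks:
        yield generator.next_frame(chunk)  # no batch render, no final-cut wait


# Usage: a 45-minute clip is just a longer chunk iterator; memory stays flat.
for frame in stream_video("portrait.png", [b"\x00" * 640] * 3):
    time.sleep(FRAME_SECONDS)  # pace output at real time
    print("emitted frame:", frame)
```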

LPM 1.0 relies on what the researchers call "multi-granularity identity conditioning": alongside a main image, the model also receives reference images from different angles and with varying facial expressions. This means it doesn't have to invent details such as teeth, emotion-specific wrinkles, or profile views on its own; it can pull them directly from the reference material. When listening, it generates reactive facial expressions, such as nodding or gaze shifts, based on incoming audio.
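A rough way to picture that conditioning: identity features from the main portrait and from each reference shot are encoded and handed to the generator together. The toy encoder and stacking below are assumptions made for illustration; the article does not describe the actual architecture.

```python
# Illustrative sketch of multi-view identity conditioning; the encoder and
# fusion here are toy placeholders, not the model's published design.

import numpy as np


def embed_identity(image: np.ndarray) -> np.ndarray:
    """Placeholder encoder: a real system would use a trained identity network."""
    return image.mean(axis=(0, 1))  # toy per-channel global feature


def build_condition(main_image: np.ndarray, references: list) -> np.ndarray:
    """Stack features from the main photo and every reference view."""
    features = [embed_identity(main_image)]
    features += [embed_identity(ref) for ref in references]
    return np.stack(features)  # shape: (1 + num_references, feature_dim)


main = np.random.rand(256, 256, 3)                      # the single source photo
refs = [np.random.rand(256, 256, 3) for _ in range(3)]  # e.g. profile, smile, frown
condition = build_condition(main, refs)
print(condition.shape)  # (4, 3): one feature row per identity view
```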

When speaking, the response audio drives lip movements and body language. During pauses, LPM generates natural idle behavior based on text instructions. Beyond real-time conversation, LPM 1.0 also supports offline video generation from existing audio, useful for podcasts or movie dialogue, according to project manager Ailing Zeng.
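Taken together, the description amounts to three driving modes: speaking (response audio drives the lips), listening (incoming audio drives reactions), and idle (text instructions drive filler motion). The dispatch below is a hypothetical illustration of that control flow, not LPM 1.0's internal logic.

```python
# Hypothetical dispatch over the three behavioral modes the article describes.

from enum import Enum, auto


class Mode(Enum):
    SPEAKING = auto()
    LISTENING = auto()
    IDLE = auto()


def animate_step(mode: Mode, response_audio: bytes = b"",
                 incoming_audio: bytes = b"", idle_prompt: str = "") -> str:
    if mode is Mode.SPEAKING:
        return f"lip-sync and gestures from {len(response_audio)} response bytes"
    if mode is Mode.LISTENING:
        return f"reactive nod/gaze from {len(incoming_audio)} incoming bytes"
    return f"idle behavior from prompt: {idle_prompt!r}"


print(animate_step(Mode.SPEAKING, response_audio=b"\x01" * 320))
print(animate_step(Mode.LISTENING, incoming_audio=b"\x02" * 320))
print(animate_step(Mode.IDLE, idle_prompt="wait calmly, blink occasionally"))
```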

One image, a full conversation. LPM 1.0 claims to turn that still into a 45‑minute, lip‑synced video in real time. The model hooks directly into voice AIs such as ChatGPT, producing speaking, listening, or singing avatars that display hesitation, gaze shifts, and smooth emotional changes.

It reportedly handles photorealistic faces, anime art, and 3D game characters without additional training, and the generation proceeds as a streaming process rather than a batch render. However, the claim that videos up to 45 minutes “should remain stable” lacks published benchmarks, leaving durability under extended use unclear. The ability to operate across diverse visual styles is noteworthy, yet the article does not detail computational requirements or latency figures, which are critical for real‑time deployment.

Consequently, while LPM 1.0 demonstrates a promising integration of image‑to‑video synthesis and voice interaction, further evidence is needed to assess its practical limits and consistency across varied content. Future studies that publish quantitative performance data would help clarify its suitability for commercial or research applications.


Common Questions Answered

How does LPM 1.0 generate video from a single photo across different visual styles?

LPM 1.0 uses a 'multi-granularity identity conditioning' technique that lets it generate video from a single image across photorealistic faces, anime, and 3D game characters without additional training. The model can create videos up to 45 minutes long that preserve the original image's identity while synchronizing lip movements and displaying natural emotional variation.

What makes LPM 1.0's video generation process unique compared to traditional rendering methods?

Unlike traditional video rendering, which requires batch processing and extensive computational resources, LPM 1.0 generates video as a streaming process in real time. This approach allows continuous generation with smooth transitions and clips up to 45 minutes long without a render farm.

How does LPM 1.0 integrate with voice AI technologies like ChatGPT?

LPM 1.0 can directly hook into voice AI systems like ChatGPT to produce speaking avatars that display nuanced behaviors such as hesitation, gaze shifts, and emotional changes. The model can generate avatars that not only lip-sync with spoken content but also provide a more natural and dynamic conversational experience across various visual styles.