OpenCV founders launch CraftStory AI video startup using proprietary footage
The new venture, CraftStory, arrives with a pedigree that reads like a résumé for computer‑vision royalty: the two engineers who built OpenCV have now turned their attention to generative video. Their pitch positions the startup against heavyweight AI labs such as OpenAI and Google, but the real differentiator isn’t the headline‑grabbing funding round. It’s the data pipeline they’ve built from the ground up.
Rather than feeding a model a sea of publicly scraped clips, the founders assembled a series of studio shoots, hiring actors and deploying high‑frame‑rate rigs that freeze even the subtlest finger movements. By controlling lighting, angles and motion capture, they sidestep the blur that typically plagues internet‑sourced footage. That decision shapes everything from the model’s fidelity to the kinds of edits it can reliably reproduce.
Crucially, CraftStory trained its model on proprietary footage rather than relying solely on internet-scraped videos. The company hired studios to shoot actors using high-frame-rate camera systems that capture crisp detail even in fast-moving elements like fingers -- avoiding the motion blur inherent in standard 30-frames-per-second YouTube clips. "What we showed is that you don't need a lot of data and you don't need a lot of training budget to create high quality videos," Erukhimov said.
"You just need high quality data."

Model 2.0 currently operates as a video-to-video system: users upload a still image to animate and a "driving video" containing a person whose movements the AI will replicate. CraftStory provides preset driving videos shot with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage.
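The video-to-video workflow described above can be sketched in a few lines. This is a conceptual illustration only, not CraftStory's implementation: the `Pose` class, `extract_poses`, and `animate_still` are hypothetical stand-ins for whatever motion representation and renderer the model actually uses.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    """Simplified stand-in for per-frame motion data (e.g. joint positions)."""
    joints: dict

def extract_poses(driving_frames):
    """Pull a motion description from each frame of the driving video.
    Here the frames already carry pose data; a real system would run a
    pose or motion estimator over raw pixels."""
    return [frame["pose"] for frame in driving_frames]

def animate_still(still_image, driving_frames):
    """Re-render the still subject once per driving frame, following the
    actor's motion. The output length therefore matches the driving video."""
    poses = extract_poses(driving_frames)
    return [{"source": still_image, "pose": p} for p in poses]

# A 3-frame driving clip yields a 3-frame animated result.
driving = [{"pose": Pose({"wrist": i})} for i in range(3)]
result = animate_still("portrait.png", driving)
print(len(result))  # 3
```

The key property the sketch captures is that the driving video, not the still image, determines the output's length and motion, which is why CraftStory's preset actor clips matter.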
Can a startup outpace the giants? CraftStory thinks so. Its Model 2.0 produces human‑centric videos up to five minutes long, a length that current competitors rarely reach.
The founders, who built OpenCV, emerged from stealth this week with a $2 million seed round, signaling confidence from early investors. Unlike OpenAI’s Sora or Google’s Veo, CraftStory leans on its studio‑shot, high‑frame‑rate footage, sidestepping the motion‑blur problem that plagues internet‑scraped data. The approach promises clearer motion rendering, but whether the proprietary pipeline can scale to the diversity of real‑world scenarios remains unclear.
Moreover, the claim of a “dramatic leap” over rivals is difficult to verify without side‑by‑side benchmarks. Still, the combination of seasoned computer‑vision experts and a focused data‑collection strategy marks a notable entry into AI‑generated video. Time will reveal if the technical advantages translate into broader adoption.
The company’s decision to fund studio shoots suggests a willingness to invest heavily in data quality, yet the cost implications for future customers are not detailed. Observers will watch how the model performs on content outside the curated set.
Further Reading
- CraftStory Unveils First AI Model to Create 5-Minute, Studio-Quality Human Videos - PR Newswire
- OpenCV launches CraftStory: video AI to take on OpenAI and Google - Ecosistema Startup
- About us - CraftStory
Common Questions Answered
What type of footage did CraftStory use to train its generative video model?
CraftStory trained its model on proprietary footage shot in controlled studios, rather than relying on publicly scraped videos. The founders hired studios to capture actors with high-frame-rate camera systems, ensuring crisp detail even in fast‑moving elements like fingers.
How does CraftStory's Model 2.0 differentiate itself from competitors like OpenAI's Sora and Google's Veo?
Model 2.0 can generate human‑centric videos up to five minutes long, a length current competitors rarely reach. It leverages high‑frame‑rate, studio‑captured footage to produce clearer motion without the blur typical of standard 30‑fps internet clips.
What advantage does using high‑frame‑rate cameras provide CraftStory's video generation?
High‑frame‑rate cameras capture more frames per second, so fast‑moving subjects such as fingers travel a shorter distance within each frame, which reduces motion blur. This yields higher‑quality training data than standard 30‑fps YouTube videos provide.
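A back-of-the-envelope calculation shows why frame rate matters here. The speeds and shutter assumption below are illustrative, not CraftStory's figures: a fingertip in a fast gesture can move on the order of 1 m/s, and in the worst case the shutter stays open for the whole frame interval.

```python
def blur_extent_mm(speed_m_per_s, fps, shutter_fraction=1.0):
    """Distance in mm a subject moves while one frame is exposed.
    shutter_fraction scales exposure time relative to the frame
    interval (1.0 = shutter open for the entire frame, a worst case)."""
    frame_interval_s = 1.0 / fps
    return speed_m_per_s * frame_interval_s * shutter_fraction * 1000

# Fingertip moving at ~1 m/s (an illustrative assumption):
print(round(blur_extent_mm(1.0, 30), 1))   # 33.3 mm smeared per frame at 30 fps
print(round(blur_extent_mm(1.0, 120), 1))  # 8.3 mm per frame at 120 fps
```

Quadrupling the frame rate cuts the worst-case smear per frame by a factor of four, which is the crisp-fingers advantage the company describes.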
How much funding did CraftStory raise and what does this indicate about investor confidence?
CraftStory emerged from stealth with a $2 million seed round, reflecting strong early investor confidence in the startup's approach. The funding supports further development of its proprietary data pipeline and scaling of Model 2.0.
Why do the founders claim they don't need large amounts of data or training budget to create high‑quality videos?
The founders argue that by using carefully curated, high‑quality proprietary footage captured with high‑frame‑rate cameras, they can achieve superior video quality without massive datasets or expensive training resources. This focused data strategy reduces the need for extensive data collection and costly compute.