OpenCV founders launch CraftStory AI video startup using proprietary footage
When the two engineers who built OpenCV announced CraftStory, they framed it as a generative-video shop that might give OpenAI or Google a run for their money. The real twist, though, isn't the headline-making seed round; it's the data pipeline they built themselves. Instead of pulling millions of clips off the web, they booked studio time, hired actors and rigged high-frame-rate cameras that can freeze even the tiniest finger flick.
By controlling lighting, angles and motion-capture setups, they dodge the blur that usually haunts internet-sourced footage. That choice seems to shape everything from how sharp the model's output looks to what kinds of edits it can actually pull off.
Crucially, the model was trained on that in-house footage rather than on a dump of scraped videos. The company hired studios to shoot actors with high-frame-rate camera systems that capture crisp detail even in fast-moving elements like fingers -- avoiding the motion blur inherent in standard 30-frames-per-second YouTube clips. It's unclear whether this approach will scale, but the early results suggest the lack of motion blur could give the system a noticeable edge. "What we showed is that you don't need a lot of data and you don't need a lot of training budget to create high quality videos," Erukhimov said.
"You just need high quality data." Model 2.0 currently operates as a video-to-video system: users upload a still image to animate and a "driving video" containing a person whose movements the AI will replicate. CraftStory provides preset driving videos shot with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage.
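To make that input contract concrete, here is a minimal sketch of the two inputs the article describes: a still image to animate and a "driving video" supplying the motion. Every name here (`GenerationRequest`, `DrivingVideo`) is a hypothetical illustration; CraftStory has not published an SDK or API.

```python
# Hypothetical sketch of a video-to-video input contract like the one
# described above. Names are illustrative, not a real CraftStory API.
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class DrivingVideo:
    path: str
    # Presets are shot with professional actors who get a revenue share;
    # alternatively, users can upload their own motion footage.
    source: Literal["preset", "user_upload"]

@dataclass(frozen=True)
class GenerationRequest:
    still_image: str        # the subject to animate
    driving: DrivingVideo   # the motion the model will replicate

req = GenerationRequest(
    still_image="portrait.png",
    driving=DrivingVideo(path="presets/walk_cycle.mp4", source="preset"),
)
print(req.driving.source)
```

The split mirrors the workflow the company describes: appearance comes from the still, motion comes entirely from the driving clip.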
Can a startup actually outpace the giants? CraftStory seems to think it can. Its Model 2.0 spits out human-centric videos that can stretch to five minutes, a length most rivals barely touch.
The founders just stepped out of stealth with a $2 million seed round, which hints at early investor confidence. And unlike OpenAI's Sora or Google's Veo, which lean on footage pulled from the web, CraftStory shoots everything in controlled studios, sidestepping the motion-blur mess that haunts internet-scraped data.
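The frame-rate argument is easy to sanity-check with back-of-envelope arithmetic: the motion smear in a single frame is roughly subject speed times exposure time. The numbers below are illustrative assumptions (a 2 m/s finger flick, a common 180-degree shutter), not CraftStory's published capture settings.

```python
# Back-of-envelope motion-blur estimate. Assumed values, not
# CraftStory's actual capture settings (which aren't public).
def blur_length_mm(subject_speed_m_s: float, fps: float,
                   shutter_fraction: float = 0.5) -> float:
    """Approximate motion smear = speed * per-frame exposure time.
    shutter_fraction=0.5 models a conventional 180-degree shutter."""
    exposure_s = shutter_fraction / fps
    return subject_speed_m_s * exposure_s * 1000.0  # metres -> millimetres

finger_speed = 2.0  # m/s, a quick finger flick (assumed)
for fps in (30, 120, 240):
    print(f"{fps:>3} fps: ~{blur_length_mm(finger_speed, fps):.1f} mm of smear")
```

Under these assumptions, a fingertip smears about 33 mm across a 30-fps frame but only about 4 mm at 240 fps, which is the gap the studio pipeline is built to exploit.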
The result should be clearer motion, but it’s still unclear whether their proprietary pipeline can handle the messiness of real-world scenarios. Their claim of a “dramatic leap” over rivals is hard to verify without side-by-side benchmarks. Still, a squad of seasoned computer-vision experts paired with a tight data-collection plan feels like a solid entry into AI-generated video.
Whether the technical edge turns into wider adoption remains to be seen. Funding studio shoots shows they're willing to pour money into data quality, yet it's unclear how those costs will be passed on to customers. We'll be watching how the model deals with content that falls outside the curated set.
Further Reading
- CraftStory Unveils First AI Model to Create 5-Minute, Studio-Quality Human Videos - PR Newswire
- OpenCV launches CraftStory: video AI to take on OpenAI and Google - Ecosistema Startup
- About us - CraftStory
Common Questions Answered
What type of footage did CraftStory use to train its generative video model?
CraftStory trained its model on proprietary footage shot in controlled studios, rather than relying on publicly scraped videos. The founders hired studios to capture actors with high-frame-rate camera systems, ensuring crisp detail even in fast‑moving elements like fingers.
How does CraftStory's Model 2.0 differentiate itself from competitors like OpenAI's Sora and Google's Veo?
Model 2.0 can generate human‑centric videos up to five minutes long, a duration most competitors rarely achieve. It leverages the high‑frame‑rate, studio‑captured footage to produce clearer motion without the blur typical of standard 30‑fps internet clips.
What advantage does using high‑frame‑rate cameras provide CraftStory's video generation?
High-frame-rate cameras capture more frames per second, which typically means a shorter exposure per frame, preserving crisp detail in fast-moving subjects such as fingers and reducing motion blur. This gives higher-quality outputs compared with models trained on standard 30-fps YouTube videos.
How much funding did CraftStory raise and what does this indicate about investor confidence?
CraftStory emerged from stealth with a $2 million seed round, reflecting strong early investor confidence in the startup's approach. The funding supports further development of its proprietary data pipeline and scaling of Model 2.0.
Why do the founders claim they don't need large amounts of data or training budget to create high‑quality videos?
The founders argue that by using carefully curated, high‑quality proprietary footage captured with high‑frame‑rate cameras, they can achieve superior video quality without massive datasets or expensive training resources. This focused data strategy reduces the need for extensive data collection and costly compute.