Replit CEO says using more tokens yields higher-quality inputs; a testing agent then vets the apps
Replit chief executive Amjad Masad has been vocal about a problem cropping up across generative-code platforms: the output often feels bland, as if it's missing a sense of "taste." In a recent discussion, he traced the issue back to how models are prompted and evaluated, arguing that the shortcut of cramming minimal context into a request can leave the resulting app under-cooked. He points to the practice of trimming token counts to keep costs low, even though the trade-off may be a less nuanced solution. To counter that, his team has begun treating the token budget as a quality lever rather than a constraint, feeding richer prompts into the model.
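To make that idea concrete, here is a minimal sketch of what treating the token budget as a quality lever might look like; the function, field names, and example values are hypothetical illustrations, not Replit's actual prompting pipeline.

```python
# Hypothetical sketch: spend tokens on context (design intent, constraints)
# instead of trimming the request to a one-liner. Not Replit's actual pipeline.
def build_rich_prompt(task: str, design_notes: str, constraints: list[str]) -> str:
    """Assemble a context-heavy prompt rather than a minimal one."""
    lines = [
        "You are generating a production-quality web app.",
        "",
        f"Task: {task}",
        "",
        f"Design notes (audience, tone, layout): {design_notes}",
        "",
        "Hard constraints:",
        *[f"- {c}" for c in constraints],
        "",
        "Avoid generic scaffolding; make concrete design choices.",
    ]
    return "\n".join(lines)


prompt = build_rich_prompt(
    task="A habit tracker with streaks and weekly summaries",
    design_notes="Calm palette, large touch targets, no onboarding wizard",
    constraints=["No external auth provider", "Works offline", "Single-page app"],
)
print(prompt)
```

The extra tokens are the point: the richer the context going in, the less interchangeable the app coming out.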
The team also isn't hesitant to use more tokens; this results in higher-quality inputs, Masad notes. And once an initial version of an app is generated, it doesn't sit idle: Masad's team kicks the result off to a testing agent, which analyzes all of its features, then reports back to a coding agent about what worked (and what didn't). "If you introduce testing in the loop, you can give the model feedback and have the model reflect on its work," Masad says. The goal, he adds, is to move beyond generic scaffolding toward outputs that feel more purposeful.
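The shape of that loop might look something like the sketch below. The llm() stub, model names, and prompt wording are placeholders assuming a generic chat-completion call, not Replit's actual agents or API.

```python
# Hypothetical "testing in the loop" sketch: a testing agent critiques each
# feature of the generated app and the coding agent revises based on the report.

def llm(model: str, prompt: str) -> str:
    """Stand-in for a chat-completion call to the named model."""
    raise NotImplementedError("wire up a real model client here")


def refine(spec: str, rounds: int = 3) -> str:
    # Coding agent produces the first version of the app.
    app = llm("coder-model", f"Write the full app for this spec:\n{spec}")
    for _ in range(rounds):
        # Testing agent exercises every feature and reports what worked and what didn't.
        report = llm(
            "tester-model",
            f"Exercise every feature of this app and report failures:\n{app}",
        )
        # Coding agent reflects on the report and revises its own work.
        app = llm(
            "coder-model",
            f"Spec:\n{spec}\n\nCurrent app:\n{app}\n\n"
            f"Test report:\n{report}\n\nRevise the app to fix what failed.",
        )
    return app
```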
Pitting models against one another is another of Replit's strategies: Testing agents may be built on one LLM, coding agents on another. "That way the product you're giving to the customer is high effort and less sloppy," Masad says. "You generate more variety." Ultimately, he describes a "push and pull" between what the model can actually do and what teams need to build on top of it to add value.
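In a sketch like the one above, the tester and coder roles could be pointed at different base models. A hypothetical role-to-model mapping, with placeholder names, might be as simple as:

```python
# Hypothetical configuration: build the testing agent on a different base LLM
# than the coding agent so one model checks the other's work.
AGENT_MODELS = {
    "coder": "provider-a/large-code-model",
    "tester": "provider-b/large-reasoning-model",
}


def model_for(role: str) -> str:
    return AGENT_MODELS[role]
```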
Also, "if you wanna move fast and you wanna ship things, you need to throw away a lot of code," he says. Why vibe coding is the future There's still a lot of frustration around AI because, Masad acknowledges, it isn't living up to the intense hype. Chatbots are well-established but they offer a "marginal improvement" in workflows.
Vibe coding is beginning to take off partly because it's the best way for companies to adopt AI in an impactful way, he notes. It can "make everyone in the enterprise the software engineer," he says, allowing employees to solve problems and improve efficiency through automation, thus requiring less reliance on traditional SaaS tools. "I would say that the population of professional developers who studied computer science and trained as developers will shrink over time," Masad says.
On the flip side, the population of vibe coders who can solve problems with software and agents will grow "tremendously" over time.
Masad's assessment cuts through the hype. He points to a flood of similar-looking outputs: images that echo one another, code that feels interchangeable. "There's a lot of sameness out there," he says, framing the problem as more than lazy prompting; what's missing is "flavor." Replit's response is pragmatic: feed the model longer, context-rich prompts and put a testing agent in the loop.
The approach shows promise, but the broader claim is still unproven. Higher-token prompts do yield higher-quality inputs, yet whether that, combined with the testing pipeline, consistently produces better applications will need further validation.
Further Reading
- Replit CEO Says I Don't Want My Tesla Autopilot to Be Vibe Coded - FinalRoundAI
- No Longer Think You Should Learn To Code, Says CEO of AI Coding Startup - Slashdot
- Inside Replit's path to $100M ARR - Growth Unhinged
Common Questions Answered
Why does Replit CEO advocate using more tokens in prompts?
He says longer prompts provide higher-quality inputs, allowing the model to capture nuance and avoid bland, interchangeable outputs. Using more tokens trades higher cost for richer, more flavorful code generation.
How does Replit incorporate a testing agent after the first generation of an app?
After the initial code is generated, the testing agent examines every feature, identifies what works and what fails, and then reports its findings to a coding agent. This feedback loop lets the model reflect on its work and improve the final output.
What role does the coding agent play in Replit’s testing loop?
The coding agent receives the testing agent’s analysis and uses that information to adjust or rewrite parts of the app, effectively iterating on the code. This collaborative process helps eliminate the “sameness” that Masad observes across generative‑code platforms.
According to Masad, what is the underlying problem causing “bland” outputs on generative‑code platforms?
Masad says developers often trim token counts to cut costs, leaving so little context that the output becomes interchangeable, flavorless code and images. He argues that the lack of "taste" is a symptom of insufficient prompting rather than of model limitations.