AI Learns to Optimize GPU Kernels During Inference
TTT-Discover uses inference-time RL to double GPU kernel speed vs experts
Why does a GPU kernel that runs twice as fast matter? Because in high‑performance computing, shaving even a few milliseconds off a routine can translate into massive cost savings at scale. TTT‑Discover claims to achieve exactly that, not through longer pre‑deployment training but by continuing to train the model while it is actually working on the problem.
The system reportedly produces kernels that run roughly twice as fast as those hand‑tuned by seasoned engineers, all without changing the underlying hardware. Here’s the thing: most reinforcement‑learning pipelines aim for a one‑size‑fits‑all policy, hoping it will perform adequately across a broad suite of tasks. TTT‑Discover flips that script, using inference‑time feedback to fine‑tune the model for each kernel on the fly.
The result is specialized, high‑performance code that outpaces conventional expert‑crafted kernels. This shift raises questions about how we evaluate model training objectives and whether “generalist” policies should still be the default target.
A different approach to reinforcement learning
TTT-Discover provides a fundamental shift in how reasoning models are trained. In standard reinforcement learning (RL) training, the goal is a generalist policy that performs well on average across many tasks. In TTT-Discover, the goal is to find the best solution to one very specific problem, and the policy is "a means towards this end," according to the authors.
Once the model discovers the artifact (i.e., the optimized code, the proof, or the molecule), the neural network that produced it can be discarded. To achieve this, the researchers engineered two specific components that differentiate TTT-Discover from standard reinforcement learning:
- Entropic objective: where standard RL optimizes for the average expected reward, TTT-Discover's objective prioritizes the single best solution found (see the sketch below).
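To make that contrast concrete, here is a minimal numeric sketch. It uses a temperature-scaled log-sum-exp as a stand-in for an entropic objective; the paper's exact formulation may differ, and the reward values are hypothetical.

```python
# Illustrative sketch (not the authors' exact formulation): a temperature-scaled
# log-sum-exp interpolates between the mean reward (what standard RL averages)
# and the best reward (what a discovery-oriented objective cares about).
import numpy as np

rewards = np.array([0.2, 0.3, 0.35, 0.9])  # hypothetical scores for sampled kernels

def entropic_objective(r, tau):
    # tau * log E[exp(r / tau)]: close to the mean for large tau,
    # approaches max(r) as tau -> 0.
    return tau * np.log(np.mean(np.exp(r / tau)))

for tau in [10.0, 1.0, 0.1, 0.01]:
    print(f"tau={tau:5.2f}  entropic objective = {entropic_objective(rewards, tau):.3f}")

print(f"mean reward = {rewards.mean():.3f}, best reward = {rewards.max():.3f}")
```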
TTT-Discover shows that training a model at inference can produce concrete performance gains. In the reported experiment, a critical GPU kernel ran twice as fast as the previous hand‑tuned version written by experts. The method was developed by researchers at Stanford, Nvidia, and Together AI, and it departs from the usual “think longer” mindset that dominates many reasoning systems.
By allowing reinforcement‑learning updates to continue during test time, the approach seeks a task‑specific optimum rather than a broad, average‑case policy. This focus on discovery at run‑time marks a clear departure from standard RL, where the objective is a generalist policy. Whether the same speedups can be achieved on other kernels, or on workloads beyond GPU code, remains unclear.
The authors note that the technique “provides a fundamental shift” in how reasoning models are trained, but they do not yet present evidence of broader applicability. As such, the results are promising yet limited to the demonstrated case, and further validation will be needed to assess the method’s general usefulness.
Further Reading
- Learning to Discover at Test Time - arXiv
- Learning to Discover at Test Time - Project Website
- Learning to Discover at Test Time - ArXivIQ
- [Quick Review] Learning to Discover at Test Time - Liner
Common Questions Answered
How does TTT-Discover differ from traditional reinforcement learning approaches?
Unlike standard reinforcement learning that aims to create a generalist policy performing well on average across tasks, TTT-Discover focuses on finding the absolute best solution to a specific problem. The method treats the model as a means to discover an optimal artifact, such as an ultra-efficient GPU kernel, by continuously learning and updating during the test phase itself.
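As an illustration of the loop shape this implies, here is a minimal sketch of test-time RL for a single task, assuming each candidate solution can be scored with a task-specific reward (such as measured kernel latency). The toy softmax policy, learning rate, and reward table are stand-ins, not the authors' setup, where the policy is a large language model.

```python
# Minimal sketch of test-time training for one task: sample candidates, score them,
# update the policy during inference, and keep the best artifact ever found.
import numpy as np

rng = np.random.default_rng(0)
num_actions = 8                                    # hypothetical discrete space of candidate edits
logits = np.zeros(num_actions)                     # toy policy parameters, updated at inference time
true_reward = rng.uniform(0.0, 1.0, num_actions)   # stand-in for compile-and-benchmark scores

best_artifact, best_reward = None, -np.inf
baseline, lr = 0.0, 0.5

for step in range(200):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    action = rng.choice(num_actions, p=probs)      # sample a candidate solution
    reward = true_reward[action]                   # evaluate it against the task

    # The real output of the search is the best artifact found, not the policy.
    if reward > best_reward:
        best_reward, best_artifact = reward, action

    # REINFORCE-style update: shift probability mass toward higher-reward candidates.
    grad = -probs
    grad[action] += 1.0
    baseline = 0.9 * baseline + 0.1 * reward       # running baseline to reduce variance
    logits += lr * (reward - baseline) * grad

print("best candidate:", best_artifact, "reward:", round(best_reward, 3))
# Once the best artifact is in hand, the adapted policy (logits) can be discarded.
```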
What specific performance improvements did TTT-Discover achieve in GPU kernel engineering?
TTT-Discover demonstrated significant speed improvements in GPU kernel performance over human expert implementations, in the headline case roughly doubling kernel speed. On the H100 GPU, for instance, the method generated a kernel running at 1161 μs, compared with 1371 μs for the previous best human-created kernel.
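For context on where microsecond figures like these come from, the snippet below shows a common way to time a GPU operation with CUDA events in PyTorch. The matmul is only a placeholder workload; this is not the authors' benchmark harness.

```python
# Illustrative GPU timing harness: measure average latency of an operation in microseconds.
import torch

def benchmark_us(fn, warmup=10, iters=100):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):                 # warm up to exclude one-time compilation/caching costs
        fn()
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) * 1000.0 / iters   # elapsed_time() returns milliseconds

if torch.cuda.is_available():
    a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    print(f"matmul latency: {benchmark_us(lambda: a @ b):.1f} us")
```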
What makes the test-time training approach of TTT-Discover unique?
TTT-Discover introduces a novel approach called 'test-time training' where the model continues to learn and refine its solution during the inference phase for a specific problem. By using an entropic objective that prioritizes finding the single best solution rather than average performance, the method allows the model to dynamically update its weights and internalize the specific structure of the task at hand.
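One way to picture how that entropic objective shapes the weight updates: instead of averaging the learning signal uniformly over sampled solutions, the update can be weighted so it concentrates on the best-scoring samples. The temperature and reward values below are illustrative assumptions, not taken from the paper.

```python
# Sketch: softmax-over-reward weights concentrate the policy update on the best rollouts,
# in contrast to the uniform weighting implied by an average-reward objective.
import numpy as np

rewards = np.array([0.2, 0.3, 0.35, 0.9])   # hypothetical scores for 4 sampled solutions
tau = 0.1                                   # illustrative temperature

uniform_weights = np.full_like(rewards, 1.0 / len(rewards))   # standard averaging
entropic_weights = np.exp(rewards / tau)
entropic_weights /= entropic_weights.sum()                     # mass piles onto the best sample

print("uniform :", np.round(uniform_weights, 3))
print("entropic:", np.round(entropic_weights, 3))
# A gradient step weighted by `entropic_weights` mostly reinforces the best rollout,
# which is the behavior a discovery-oriented objective is after.
```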