TurboQuant and OSCAR vie in KV cache compression race at...

By AI Daily Post Edited by Brian Petersen, Editor-in-Chief

June 18, 2026 • Updated: July 15, 2026 • 4 min read

The KV cache is the bottleneck eating large language models alive. At ICLR 2026, three teams (Google and NYU with TurboQuant, Together AI with OSCAR, and Apple with EpiCache) are fighting over how to shrink it. The enemy is the same: outlier channels, those few coordinates with wild magnitudes that hijack the quantization range and leave the rest of the signal gasping for bits.
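To see how a single outlier hijacks the range, here is a minimal sketch of plain min-max quantization at 2 bits. The function names and the numbers are made up for illustration; they are not taken from any of the three papers.

```python
def quantize_uniform(values, bits):
    """Min-max uniform quantization: snap each value to one of 2**bits evenly spaced levels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / step) * step for v in values]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Illustrative values only: eight ordinary coordinates, then the same
# vector with its last coordinate replaced by one large outlier.
normal = [-0.9, -0.5, -0.2, 0.1, 0.3, 0.6, 0.8, 1.0]
with_outlier = normal[:-1] + [40.0]

err_normal = mse(normal, quantize_uniform(normal, bits=2))

deq = quantize_uniform(with_outlier, bits=2)
# Error measured on the seven ordinary coordinates only.
err_small = mse(with_outlier[:-1], deq[:-1])

print("distinct levels used by the ordinary values:", len(set(deq[:-1])))
print("error without outlier:", err_normal)
print("error on ordinary values with outlier:", err_small)
```

With the outlier present, the four available levels are stretched across the whole range, so every ordinary value lands on the same level and its information is lost.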

Naive INT2 quantization collapses to rubble. KIVI set the standard by treating keys and values differently, but now TurboQuant and OSCAR attack from opposite flanks.

TurboQuant never looks at your data. It rotates each vector randomly, transforming coordinates into near-independent Gaussians, then applies a precomputed optimal quantizer. A second stage, a 1-bit Quantized Johnson-Lindenstrauss transform, encodes the residual so that the resulting estimates stay provably unbiased. The result is distortion within 2.7× of the theoretical lower bound, near-perfect recall at 4× compression, and quality that holds steady down to 2.5 bits.
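The two stages can be sketched in a few dozen lines of plain Python. This is a toy under stated assumptions, not the authors' implementation: the rotation is built by Gram-Schmidt on Gaussian rows, the "precomputed quantizer" is the textbook 2-bit Lloyd-Max codebook for a unit Gaussian, and every function name, the dimension `d`, and the projection count `m` are hypothetical choices for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def random_rotation(d, rng):
    """Random orthogonal d x d matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # strip components along rows already accepted
            c = dot(v, u)
            v = [a - c * b for a, b in zip(v, u)]
        n = norm(v)
        if n > 1e-9:
            rows.append([a / n for a in v])
    return rows


def matvec(m, x):
    return [dot(row, x) for row in m]


# Textbook Lloyd-Max levels for a unit-variance Gaussian at 2 bits.
LEVELS_2BIT = (-1.510, -0.4528, 0.4528, 1.510)


def quantize_gaussian(y, sigma):
    """Snap each coordinate to the nearest fixed level, scaled by sigma."""
    codebook = [sigma * level for level in LEVELS_2BIT]
    return [min(codebook, key=lambda c: abs(c - v)) for v in y]


def qjl_encode(r, proj):
    """Stage two: keep one sign bit per random projection, plus the norm of r."""
    return [1.0 if dot(row, r) >= 0 else -1.0 for row in proj], norm(r)


def qjl_inner(q, signs, r_norm, proj):
    """Unbiased estimate of <q, r> from the stored sign bits."""
    acc = sum(dot(row, q) * s for row, s in zip(proj, signs))
    return math.sqrt(math.pi / 2) * r_norm * acc / len(proj)


rng = random.Random(0)
d, m = 32, 4096  # hypothetical sizes
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
x[5] = 40.0  # one outlier channel
q = [rng.gauss(0.0, 1.0) for _ in range(d)]

rot = random_rotation(d, rng)
y = matvec(rot, x)                 # outlier energy is now spread out
sigma = norm(x) / math.sqrt(d)     # per-vector scale, stored as one scalar
y_hat = quantize_gaussian(y, sigma)
residual = [a - b for a, b in zip(y, y_hat)]

proj = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
signs, r_norm = qjl_encode(residual, proj)

q_rot = matvec(rot, q)  # rotating the query too preserves inner products
estimate = dot(q_rot, y_hat) + qjl_inner(q_rot, signs, r_norm, proj)
exact = dot(q, x)

print("largest |coordinate| before rotation:", max(abs(v) for v in x))
print("largest |coordinate| after rotation: ", max(abs(v) for v in y))
print("exact <q, x>:", exact, " estimate:", estimate)
```

Nothing here is fitted to the data: the codebook is fixed in advance and the rotation and projections are random. The only per-vector values stored are two scalars, the scale and the residual norm.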

Meanwhile, OSCAR takes a completely different path, and EpiCache solves a problem neither of them touches. The race is on.

The post The KV Cache Compression Race: TurboQuant vs OSCAR vs EpiCache appeared first on MarkTechPost.

The race between TurboQuant and OSCAR is a study in elegant extremes. TurboQuant arrives with a proof, not a prayer. It doesn’t need to see your data to know where the outliers hide; it scrambles them into submission with a random rotation, then mops up the residual with a mathematically bulletproof bit.

OSCAR, by contrast, learns its way out of the same trap, adapting to the specific shape of your attention landscape. Both are valid. Both are powerful.

But neither touches the deeper silence EpiCache exploits. That is the real lesson of this ICLR. The field has been so focused on squeezing the same old KV vectors into smaller boxes that it forgot to ask whether those vectors should exist at all.

EpiCache doesn’t compress the cache; it sidesteps it. That is not a refinement. It is a reframing.

So where does this leave us? TurboQuant offers a guarantee. OSCAR offers adaptability. EpiCache offers an escape. The smartest systems will not choose one. They will layer them: use EpiCache’s logic to decide what to cache, TurboQuant’s rotation to pack it tight, and OSCAR’s learned sensitivity to know where the bits matter most.

The race is not about who wins at ICLR. It is about who builds the bridge between theory and deployment. The winner will be the one that makes the others obsolete.

Common Questions Answered

What is the main problem that TurboQuant, OSCAR, and EpiCache are trying to solve at ICLR 2026?

These three teams are competing to address the KV cache bottleneck in large language models by developing methods to compress it effectively. The core challenge they face is handling outlier channels: coordinates with extreme magnitudes that stretch the quantization range and leave too few bits for the rest of the signal.

How does TurboQuant's approach to KV cache compression differ from OSCAR's method?

TurboQuant is data-oblivious: it applies a random rotation to spread outliers across coordinates, quantizes the result with a precomputed quantizer, and then encodes the residual with a 1-bit Quantized Johnson-Lindenstrauss transform that comes with a proof of unbiasedness. OSCAR, in contrast, learns from the specific attention landscape of the model it compresses.

Why is naive INT2 quantization insufficient for KV cache compression according to the article?

Naive INT2 quantization collapses because outlier channels with wild magnitudes hijack the quantization range, leaving insufficient bits for the rest of the signal. KIVI responded by treating keys and values differently, and the approaches competing at ICLR 2026 go further with more sophisticated techniques.

What was the KIVI standard's contribution to KV cache compression before TurboQuant and OSCAR?

KIVI established a baseline approach by treating keys and values differently during quantization, moving beyond one-size-fits-all methods. This differentiated treatment helped address some of the outlier channel problems, though the newer methods like TurboQuant and OSCAR represent significant improvements over this standard.
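The article does not spell out what "differently" means. As we understand the KIVI paper, keys are quantized per channel and values per token. The sketch below shows only why the grouping axis matters when a key cache has an outlier channel. The toy matrix and helper names are made up for illustration.

```python
def quantize_uniform(values, bits=2):
    """Min-max uniform quantization of one group of values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / step) * step for v in values]


def quantize_per_token(matrix, bits=2):
    """Each row (one token) gets its own quantization range."""
    return [quantize_uniform(row, bits) for row in matrix]


def quantize_per_channel(matrix, bits=2):
    """Each column (one channel) gets its own quantization range."""
    cols = [quantize_uniform(list(col), bits) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]


def mse(a, b):
    """Mean squared error over all entries of two equal-shape matrices."""
    pairs = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


# Toy key cache: 4 tokens x 4 channels, where channel 2 is an outlier
# channel. The numbers are illustrative, not measured.
keys = [
    [0.2, -0.4, 30.0, 0.5],
    [-0.3, 0.1, 31.0, -0.6],
    [0.6, -0.2, 29.5, 0.3],
    [-0.5, 0.4, 30.5, -0.1],
]

per_token_err = mse(keys, quantize_per_token(keys))
per_channel_err = mse(keys, quantize_per_channel(keys))

print("per-token error:  ", per_token_err)
print("per-channel error:", per_channel_err)
```

Grouping by channel confines the outlier to its own range, so the ordinary channels keep their resolution. Grouping by token puts the outlier into every row's range.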
