Google's DiffusionGemma: open diffusion model for faster text generation
Why does text generation feel sluggish on a single‑GPU machine? Most large language models write one token at a time. That method favors quality, but every step has to read the model's weights from GPU memory just to produce a single token, so the GPU spends far more time moving weights than crunching numbers.
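To make the imbalance concrete, here is a back-of-envelope sketch in Python. It assumes a dense model that reads every weight once per generated token and performs roughly two floating-point operations per parameter, a common rule of thumb. The function name and all hardware and model figures are illustrative placeholders, not measurements of DiffusionGemma or of any particular GPU, and the estimate ignores the KV cache and activations.

```python
def decode_step_times(n_params, bytes_per_param, mem_bandwidth_bytes_s, peak_flops):
    """Estimate (memory_time_s, compute_time_s) for one single-token decode step.

    Assumptions (not from the article): every weight is read once per token,
    and each parameter costs about 2 FLOPs (one multiply, one add).
    """
    weight_bytes = n_params * bytes_per_param   # data moved per token
    flops = 2 * n_params                        # arithmetic done per token
    memory_time = weight_bytes / mem_bandwidth_bytes_s
    compute_time = flops / peak_flops
    return memory_time, compute_time


# Hypothetical setup: 8B parameters at 2 bytes each,
# 1 TB/s memory bandwidth, 100 TFLOP/s peak compute.
mem_t, comp_t = decode_step_times(8e9, 2, 1e12, 100e12)
print(f"memory-bound time per token:  {mem_t * 1e3:.2f} ms")
print(f"compute-bound time per token: {comp_t * 1e3:.2f} ms")
print(f"memory/compute ratio: {mem_t / comp_t:.0f}x")
```

Under these placeholder numbers, moving the weights takes about a hundred times longer than the arithmetic. That gap is the headroom that generating several tokens per pass could, in principle, put to use.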