Mamba‑3 halves Mamba‑2's state size, matches its perplexity, and posts a ~4% language‑modeling gain at lower latency
Why does a half‑sized state matter for today's language models? Mamba‑3 arrives with a headline‑grabbing claim: it trims the internal state to 50% of what Mamba‑2 required, yet still posts a roughly 4% gain on standard language‑modeling benchmarks.
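To see why the state size is the number to watch, recall that in an SSD‑style layer (Mamba‑2's formulation) each layer carries a fixed‑size recurrent state of shape (heads × head_dim × state_dim), so per‑sequence inference memory scales linearly with the state dimension. The back‑of‑the‑envelope sketch below makes that concrete; all dimensions (`n_layers`, `n_heads`, `d_head`, `d_state`) are illustrative assumptions, not published Mamba‑3 configurations.

```python
# Back-of-the-envelope sketch: why recurrent-state size matters at inference.
# All dimensions here are illustrative assumptions, not Mamba-3's actual config.

def ssm_state_bytes(n_layers: int, n_heads: int, d_head: int, d_state: int,
                    bytes_per_elem: int = 2) -> int:
    """Memory held per sequence by the recurrent state of an SSD-style model.

    Each layer keeps an (n_heads, d_head, d_state) state matrix; unlike a
    Transformer KV cache, this footprint does not grow with sequence length.
    """
    return n_layers * n_heads * d_head * d_state * bytes_per_elem

# Hypothetical ~1B-parameter config (assumed values, for illustration only).
base = ssm_state_bytes(n_layers=48, n_heads=32, d_head=64, d_state=128)
half = ssm_state_bytes(n_layers=48, n_heads=32, d_head=64, d_state=64)

print(f"d_state=128: {base / 2**20:.1f} MiB per sequence")   # 24.0 MiB
print(f"d_state= 64: {half / 2**20:.1f} MiB per sequence")   # 12.0 MiB, exactly half
```

Because this state is what a server must hold per concurrent sequence, halving it translates directly into higher batch sizes or lower memory traffic per decoded token, which is where the latency claim comes from.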