Mistral Small 4: Tiny Model Matches Large AI Rivals
Mistral Small 4 matches Medium 3.1 and Large 3 on MMLU Pro, cuts inference cost
Mistral’s newest offering, Small 4, arrives with a promise that could shift how businesses allocate compute resources. The model packs reasoning, vision and coding capabilities into a single 7‑billion‑parameter architecture, yet its hardware footprint stays markedly lower than the 13‑billion‑parameter Medium 3.1 and the 30‑billion‑parameter Large 3. In practice, that means firms can run more queries per dollar without sacrificing the depth of understanding required for complex tasks.
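To make the "more queries per dollar" claim concrete, here is a minimal back-of-the-envelope sketch. All prices and token counts below are made-up placeholders for illustration, not Mistral's actual rates:

```python
# Hypothetical illustration: queries-per-dollar at different price points.
# Prices and token counts are placeholder assumptions, NOT Mistral's rates.

def queries_per_dollar(price_per_mtok_in, price_per_mtok_out,
                       tokens_in=1_000, tokens_out=300):
    """Number of requests one dollar buys at the given $/million-token prices."""
    cost_per_request = (tokens_in * price_per_mtok_in
                        + tokens_out * price_per_mtok_out) / 1_000_000
    return 1 / cost_per_request

# Assumed placeholder prices per million tokens (input, output).
small = queries_per_dollar(0.10, 0.30)  # hypothetical small-model pricing
large = queries_per_dollar(2.00, 6.00)  # hypothetical large-model pricing

print(f"small-class model: ~{small:,.0f} queries per dollar")
print(f"large-class model: ~{large:,.0f} queries per dollar")
```

Under these assumed numbers the smaller model handles roughly 20x more requests per dollar, which is the kind of gap that matters for high-volume pipelines.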
While the tech is impressive on paper, the real test is whether it can hold its own on established benchmarks that gauge real‑world utility. Mistral’s internal testing suggests it does—especially on the MMLU Pro suite, which stresses broad knowledge and nuanced instruction following. If those numbers hold up, enterprises handling massive document‑processing pipelines might finally get a model that balances cost and capability.
Benchmark performances
According to Mistral's benchmarks, Small 4 performs close to the level of Mistral Medium 3.1 and Mistral Large 3, particularly in MMLU Pro. Mistral said the instruction-following performance makes Small 4 suited for high-volume enterprise tasks such as document understanding. While competitive with small models from other companies, Small 4 still performs below other popular open-source models, especially in reasoning-intensive tasks.
Qwen 3.5 122B and Qwen 3-next 80B outperform Small 4 on LiveCodeBench, as does Claude Haiku in instruct mode. Mistral Small 4 did, however, beat OpenAI's GPT-OSS 120B in the LCR. Mistral argues that Small 4 achieves these scores with "significantly shorter outputs" that translate to lower inference costs and latency than the other models.
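The "shorter outputs" argument can be sketched in a few lines: at identical per-token pricing and decode throughput, a model that answers in fewer tokens costs less and finishes sooner. The price and throughput figures below are placeholder assumptions, not measured values:

```python
# Hypothetical illustration of the shorter-outputs argument.
# Price and throughput below are assumptions, not measured numbers.

PRICE_PER_MTOK_OUT = 0.60    # placeholder output price, $/million tokens
DECODE_TOKENS_PER_SEC = 50   # placeholder decoding throughput

def request_cost_and_latency(output_tokens):
    """Per-request output cost ($) and generation time (s) for a given length."""
    cost = output_tokens * PRICE_PER_MTOK_OUT / 1_000_000
    latency = output_tokens / DECODE_TOKENS_PER_SEC
    return cost, latency

concise_cost, concise_latency = request_cost_and_latency(400)    # terse answer
verbose_cost, verbose_latency = request_cost_and_latency(1_200)  # verbose answer

print(f"concise: ${concise_cost:.6f}, {concise_latency:.1f}s")
print(f"verbose: ${verbose_cost:.6f}, {verbose_latency:.1f}s")
```

Under these assumptions, an answer one third as long costs one third as much and arrives in one third the time, independent of headline per-token pricing.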
While Mistral Small 4 promises to merge reasoning, vision and coding into a single open‑source model, its real‑world impact is still uncertain. The benchmark data show Small 4 hitting performance levels close to Medium 3.1 and Large 3 on MMLU Pro, a result that could make it attractive for high‑volume enterprise tasks such as document understanding. Its design emphasizes shorter outputs, which translates to lower latency and cheaper token usage—a clear pitch against competing small models like Qwen and Claude Haiku.
Yet the claim that enterprises can drop separate models for each capability hinges on whether the consolidated approach meets the nuanced demands of varied workloads. The reported inference cost advantage is compelling, but it remains unclear whether the trade‑off in model size will affect more complex multimodal or coding scenarios. Overall, Small 4 adds a noteworthy option to the crowded field of cost‑focused models, though its ability to replace specialized stacks will depend on further testing beyond the presented benchmarks.
Further Reading
- Introducing Mistral Small 4 - Mistral AI
- Mistral AI makes enterprise push with two new launches - Silicon Republic
- Mistral Small 4 Review — Pricing, Benchmarks & Capabilities (2026) - Design for Online
- Introducing Mistral Small 4 - Simon Willison's Weblog
- Mistral Small 4 is Here: One Model That Does it All - YouTube
Common Questions Answered
How does Mistral Small 4 compare to other models in the Mistral lineup?
Mistral Small 4 performs close to the level of Mistral Medium 3.1 and Mistral Large 3, particularly in MMLU Pro benchmarks. Despite being a 7-billion-parameter model, it approaches the performance of those larger models while maintaining a much smaller hardware footprint.
What enterprise tasks is Mistral Small 4 well-suited for?
Mistral Small 4 is particularly suited for high-volume enterprise tasks such as document understanding, thanks to its strong instruction-following performance. The model combines reasoning, vision, and coding capabilities in a compact architecture that allows for more efficient query processing.
What are the key advantages of Mistral Small 4's design?
Mistral Small 4 offers lower inference costs and reduced hardware requirements compared to larger models, enabling businesses to run more queries per dollar. Its design emphasizes shorter outputs, which translates to lower latency and more cost-effective token usage.