Mistral AI's new TTS model runs in real time on older chips, supports nine languages, and outperforms ElevenLabs in the company's own evaluations.


Mistral AI's TTS Model Beats ElevenLabs, Runs on Old Chips



Mistral AI just dropped a new text‑to‑speech model that it claims outperforms ElevenLabs on every benchmark it’s been tested against. The surprise isn’t just the audio quality; the model is built to run on hardware that many developers consider obsolete. It supports nine languages—English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic—so multilingual projects can stay in‑house without juggling separate services.

What's more, a custom voice can be cloned from as little as five seconds of reference audio, a threshold that would have seemed unrealistic a year ago. For teams that can't afford the latest GPUs, real-time synthesis on older chips could meaningfully lower hardware budgets.
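"Real time" in speech synthesis is commonly judged by the real-time factor (RTF): the time spent generating audio divided by the duration of the audio produced, with values below 1.0 meaning the model keeps ahead of playback. The article gives no measured figures, so the sketch below only shows how a team might check the claim on its own hardware. The function names and the `fake_synthesize` stand-in are illustrative assumptions, not part of any Mistral API.

```python
import time
from typing import Callable


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """An RTF below 1.0 means audio is generated faster than it plays back."""
    return rtf < 1.0


def measure_rtf(synthesize: Callable[[str], float], text: str) -> float:
    """Time one call to `synthesize`, which must return audio duration in seconds."""
    start = time.perf_counter()
    audio_seconds = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


# Hypothetical stand-in for a real TTS call: pretends it produced 3.0 s of audio.
def fake_synthesize(text: str) -> float:
    return 3.0


if __name__ == "__main__":
    rtf = measure_rtf(fake_synthesize, "Bonjour tout le monde")
    print(f"RTF={rtf:.6f}, real time: {is_real_time(rtf)}")
```

Swapping `fake_synthesize` for a wrapper around an actual model call, and running it on the target chip, would give a like-for-like number to compare against other engines.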


"And you can run it on super old chips -- it's still going to be real time," the company said in its launch remarks. The model supports nine languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic) and can adapt to a custom voice with as little as five seconds of reference audio. Perhaps more remarkably, it demonstrates zero-shot cross-lingual voice adaptation without explicit training for that task. Stock illustrated this with a personal example: he can feed the model 10 seconds of his own French-accented voice, type a prompt in German, and the model will generate German speech that sounds like him, complete with his natural accent and vocal characteristics.

For enterprises operating across borders, this capability unlocks cascaded speech-to-speech translation that preserves speaker identity, a feature that has obvious applications in customer support, sales, and internal communications for multinational organizations.

Human evaluators preferred Voxtral over ElevenLabs nearly 70 percent of the time on voice customization

Mistral is not being coy about which competitor it intends to displace. In human evaluations conducted by the company, Voxtral TTS achieved a 62.8 percent listener preference rate against ElevenLabs Flash v2.5 on flagship voices and a 69.9 percent preference rate in voice customization tasks.

Mistral also claims the model performs at parity with ElevenLabs v3 -- the company's premium, higher-latency tier -- on emotional expressiveness, while maintaining similar latency to the much faster Flash model.
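To make the cascaded speech-to-speech translation idea mentioned above more concrete, here is a minimal sketch of how the three stages could be chained: transcribe the source audio, translate the text, then synthesize the translation conditioned on a short reference clip of the speaker. Every name here (`CascadedTranslator` and the three stage signatures) is a hypothetical placeholder rather than a real Mistral or ElevenLabs API, and the lambdas in the demo are toy stubs that only show the data flow.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these correspond to a real API.
Transcribe = Callable[[bytes], str]         # source audio -> source-language text
Translate = Callable[[str, str], str]       # (text, target language) -> translated text
Synthesize = Callable[[str, bytes], bytes]  # (text, reference audio) -> target audio


@dataclass
class CascadedTranslator:
    """Chains speech recognition, text translation, and voice-cloned TTS."""

    transcribe: Transcribe
    translate: Translate
    synthesize: Synthesize

    def run(self, source_audio: bytes, target_lang: str, reference_audio: bytes) -> bytes:
        text = self.transcribe(source_audio)
        translated = self.translate(text, target_lang)
        # The reference clip is what carries the speaker's identity into the output.
        return self.synthesize(translated, reference_audio)


if __name__ == "__main__":
    # Toy stubs standing in for real models: bytes are treated as UTF-8 text.
    pipeline = CascadedTranslator(
        transcribe=lambda audio: audio.decode("utf-8"),
        translate=lambda text, lang: f"[{lang}] {text}",
        synthesize=lambda text, ref: text.encode("utf-8"),
    )
    print(pipeline.run(b"bonjour", "de", reference_audio=b"ten-second sample"))
    # prints b'[de] bonjour'
```

Keeping each stage behind a plain callable means any one of them can be replaced independently, which is the main practical appeal of a cascaded design over a single end-to-end model.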

Mistral AI's new text-to-speech model arrives with a bold claim: it outperforms ElevenLabs while offering the weights for free. The model runs in real time even on "super old chips," a point the company highlighted in its launch remarks. It supports nine languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi and Arabic) and can be adapted to a custom voice with as little as five seconds of reference audio.

The release lands amid an intensifying enterprise voice‑AI race, with ElevenLabs partnering with IBM, Google Cloud expanding its Chirp 3 HD voices, and OpenAI continuing its own iterations. Voice‑AI revenue hit $22 billion globally in 2026, and the voice‑AI agents segment is projected to reach $47.5 billion by 2030.

Whether Mistral’s model can sustain its performance edge across all supported languages remains uncertain, as does its ability to attract developers away from entrenched platforms. Nonetheless, the free‑weight offering could lower entry barriers for smaller teams seeking high‑quality speech synthesis. The market’s rapid expansion suggests that additional options will continue to emerge, each vying for a share of the growing demand.

Common Questions Answered

How does Mistral AI's new text-to-speech model compare to ElevenLabs in performance?

Mistral AI claims its new TTS model outperforms ElevenLabs on every benchmark tested. In human evaluations conducted by the company, listeners preferred it over ElevenLabs Flash v2.5 62.8 percent of the time on flagship voices and 69.9 percent of the time on voice customization tasks.

What unique hardware capabilities does the Mistral AI text-to-speech model offer?

According to Mistral, the model runs in real time on older chips that many developers consider obsolete, making it accessible to teams with limited hardware resources. That would set it apart from advanced voice models that require more powerful computing infrastructure.

What languages are supported by Mistral AI's new text-to-speech model?

The model supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. This broad language support enables multilingual projects to use a single TTS solution without needing multiple services.

How quickly can the Mistral AI model adapt to a custom voice?

The model can clone a custom voice from as little as five seconds of reference audio. It even demonstrates zero-shot cross-lingual voice adaptation without explicit training for that specific task.