Open Source

Xiaomi launches MiMo-V2-Flash AI model: 150 tokens/s, $0.1–$0.3 per million tokens

3 min read

Xiaomi's latest foray into generative AI arrives as an open-source model named MiMo-V2-Flash, a move that signals the company's intent to compete beyond hardware. The release comes amid a crowded field where cost and latency often dictate adoption, especially for developers building chatbots or content-generation pipelines. While the specs are impressive on paper, the real test lies in how the model measures up against established offerings from firms like Moonshot AI.

Xiaomi has positioned the model as a low-budget alternative, aiming to attract startups and researchers who balk at the expense of larger commercial APIs. The company also highlights benchmark results that put MiMo-V2-Flash in the same performance bracket as Moonshot's Kimi K2 Thinking. If those numbers hold up in real-world workloads, the model could shift how quickly and cheaply AI services are deployed.

The details, according to Xiaomi, are as follows.

According to Xiaomi, the model delivers inference speeds of up to 150 tokens per second at a cost of $0.1 per million input tokens and $0.3 per million output tokens. On benchmarks, Xiaomi says MiMo-V2-Flash performs comparably to Moonshot AI's Kimi K2 Thinking and DeepSeek V3.2 across most reasoning tests, while surpassing Kimi K2 in long-context evaluations. In agentic tasks, the model scored 73.4% on SWE-Bench Verified, outperforming all open-source peers and approaching OpenAI's GPT-5-High.
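Taken at face value, those quoted rates make per-request cost easy to estimate. A minimal sketch, assuming Xiaomi's claimed prices; the request sizes below are purely illustrative:

```python
# Xiaomi's quoted rates (claimed, not independently verified).
INPUT_PRICE_PER_M = 0.1   # USD per million input tokens
OUTPUT_PRICE_PER_M = 0.3  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Illustrative chat turn: 2,000-token prompt, 500-token reply.
cost = request_cost(2_000, 500)
print(f"${cost:.6f}")  # $0.000350 per request
```

Even a million such requests would come to roughly $350 at the quoted rates, which is the kind of arithmetic behind Xiaomi's low-budget pitch.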

Xiaomi also says the model matches Anthropic's Claude 4.5 Sonnet on coding tasks at a fraction of the cost. MiMo-V2-Flash is a 309-billion-parameter Mixture-of-Experts model: the network is split into expert sub-layers, only a few of which are activated for each token, letting it balance performance and efficiency. The architecture also gave Xiaomi's engineers room for optimisations that significantly reduce the cost of processing long prompts by limiting how much past context the model needs to re-evaluate.
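The Mixture-of-Experts idea is that a learned router activates only a few expert sub-networks per token, so compute scales with the active subset rather than the full parameter count. A toy sketch of top-k routing; the expert count, scores, and expert functions here are made up for illustration, not MiMo-V2-Flash's actual configuration:

```python
import heapq

def route_top_k(router_scores: list[float], k: int = 2) -> list[int]:
    """Pick the indices of the k experts with the highest router scores."""
    return heapq.nlargest(k, range(len(router_scores)),
                          key=lambda i: router_scores[i])

def moe_forward(x: float, experts, router_scores, k: int = 2) -> float:
    """Run only the selected experts; combine outputs weighted by
    their normalised router scores (illustrative, not softmax)."""
    chosen = route_top_k(router_scores, k)
    total = sum(router_scores[i] for i in chosen)
    return sum(router_scores[i] / total * experts[i](x) for i in chosen)

# Eight tiny stand-in 'experts'; only two run per token, as in sparse MoE.
experts = [lambda x, m=m: m * x for m in range(1, 9)]
scores = [0.05, 0.1, 0.02, 0.4, 0.03, 0.1, 0.25, 0.05]
print(moe_forward(2.0, experts, scores))  # only experts 3 and 6 execute
```

The point of the sketch is the scaling behaviour: with 8 experts and k=2, only a quarter of the expert compute runs per token, which is how a 309B-parameter model can keep inference cost closer to that of a much smaller dense model.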

Luo Fuli, a former DeepSeek researcher who recently joined Xiaomi's MiMo team, described the release as "step two on our AGI roadmap" in a post on X, referring to artificial general intelligence.


Will MiMo‑V2‑Flash reshape the market? Xiaomi says the model can generate 150 tokens per second while charging just $0.1 per million input tokens and $0.3 per million output tokens. The open‑source release is hosted on MiMo Studio, Hugging Face, and an API platform, making it globally accessible.

Designed for reasoning, coding and agentic tasks, it is also pitched as a general‑purpose assistant. Xiaomi positions it alongside offerings from DeepSeek, Moonshot AI, Anthropic and OpenAI, claiming performance comparable to Moonshot’s Kimi K2 Thinking. Yet independent benchmarks are not yet public, so the extent of parity remains uncertain.

The low‑cost pricing model could attract developers looking for affordable inference, but real‑world adoption will depend on ecosystem support and stability. In short, Xiaomi has entered the AI race with a model that promises speed and affordability, though its practical impact is still to be measured. Developers can download the weights immediately, but integration details are sparse.

Moreover, the claim of reasoning strength hasn't been validated against standard suites beyond the company's own tests.


Common Questions Answered

What inference speed does the MiMo‑V2‑Flash model claim to achieve?

Xiaomi states that MiMo-V2-Flash can generate up to 150 tokens per second during inference. This speed is highlighted as a competitive advantage for real-time applications such as chatbots and content-generation pipelines.
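At the claimed throughput, generation latency for a streamed response is simple division. A quick sketch, taking Xiaomi's 150 tokens/s figure at face value; the response lengths are illustrative:

```python
CLAIMED_TOKENS_PER_SEC = 150  # Xiaomi's claimed decode throughput

def generation_seconds(output_tokens: int,
                       tokens_per_sec: float = CLAIMED_TOKENS_PER_SEC) -> float:
    """Seconds to stream a response of the given length."""
    return output_tokens / tokens_per_sec

# A short chat reply vs. a long generated document.
print(f"{generation_seconds(300):.1f}s")    # 2.0s
print(f"{generation_seconds(4_000):.1f}s")  # 26.7s
```

For interactive use, a sub-3-second reply to a typical chat turn is the kind of latency that claim implies, assuming the figure holds under real load.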

How does the pricing of MiMo‑V2‑Flash compare to other generative AI models?

MiMo‑V2‑Flash is priced at $0.1 per million input tokens and $0.3 per million output tokens, which Xiaomi markets as a low‑cost option. These rates are positioned to be cheaper than many proprietary models, making it attractive for developers with high‑volume workloads.

Against which models does Xiaomi benchmark MiMo‑V2‑Flash, and what were the results?

Xiaomi benchmarks MiMo‑V2‑Flash against Moonshot AI's Kimi K2 Thinking and DeepSeek V3.2, reporting comparable performance on most reasoning tests. In long‑context evaluations, MiMo‑V2‑Flash surpasses Kimi K2, and it achieved a 73.4% score on the SWE‑Bench Verified agentic task, outperforming all open‑source competitors mentioned.

Where can developers access the open‑source MiMo‑V2‑Flash model?

The model is publicly available on MiMo Studio, Hugging Face, and through Xiaomi's own API platform. This multi‑platform distribution ensures global accessibility for developers looking to integrate the model into reasoning, coding, or general‑purpose assistant applications.