Liquid AI releases LFM2.5-230M, adds llama.cpp, MLX, vLLM, SGLang, ONNX

By AI Daily Post. Edited by Brian Petersen, Editor-in-Chief

June 28, 2026 • 3 min read

Why does this matter? Because getting AI to run on a phone or a tiny robot has been a long‑standing hurdle. Liquid AI just shipped LFM2.5‑230M, its smallest model yet—a 230‑million‑parameter, text‑only checkpoint built on the LFM2 architecture.

Both the base and an instruction-tuned version are available as open weights on Hugging Face under the lfm1.0 license. The company is clear that this is not a general-purpose reasoning engine; it is tuned for data extraction and tool use on edge hardware. On a Galaxy S25 Ultra the model runs at 213 tokens per second, and on a Raspberry Pi 5 it still manages 42 tokens per second.
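To put those throughput figures in perspective, here is a quick Python calculation of per-token latency and the time needed to emit one hypothetical 256-token answer. It assumes the reported rates are steady-state generation speeds; the article does not say whether prompt processing is included, so treat the results as lower bounds on end-to-end latency.

```python
# Back-of-envelope latency from the reported throughput figures.
# Assumption: tokens/sec is a steady-state generation rate; prompt
# processing (prefill) time is ignored, so real latency will be higher.

REPORTED_TOKENS_PER_SECOND = {
    "Galaxy S25 Ultra": 213.0,  # as reported in the article
    "Raspberry Pi 5": 42.0,     # as reported in the article
}


def ms_per_token(tokens_per_second: float) -> float:
    """Average time to emit one token, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant rate, in seconds."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    output_tokens = 256  # hypothetical length of one structured answer
    for device, tps in REPORTED_TOKENS_PER_SECOND.items():
        print(
            f"{device}: {ms_per_token(tps):.1f} ms/token, "
            f"{seconds_to_generate(output_tokens, tps):.2f} s for {output_tokens} tokens"
        )
```

That works out to about 4.7 ms per token and 1.2 s per answer on the phone, versus about 23.8 ms per token and 6.1 s per answer on the Raspberry Pi 5.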

Beyond raw speed, it also outscores larger rivals such as Qwen3.5-0.8B and Gemma 3 1B on instruction-following and extraction benchmarks. Day-one support spans llama.cpp, MLX, vLLM, SGLang and ONNX, with a modest 293–375 MB footprint. The 14-layer hybrid layout (eight double-gated LIV convolution blocks plus six grouped-query attention blocks) is built for fast CPU inference. The model supports a 32,768-token context window and ten languages, with a knowledge cutoff of mid-2024.
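The hybrid layout matters for memory because only attention layers keep a key/value (KV) cache that grows with every token of context. The Python sketch below estimates weight memory at a few precisions and the KV cache of the six grouped-query attention blocks at the full 32,768-token context. The parameter count, layer counts and context length come from the article; the number of KV heads and the head dimension are hypothetical placeholders, and we assume the LIV convolution blocks hold only a small fixed-size state, which we leave out of the calculation.

```python
# Rough memory arithmetic for a small hybrid model.
# Reported figures: parameter count, layer counts, context length.
# HYPOTHETICAL placeholders: N_KV_HEADS and HEAD_DIM (not published here).
# Assumption: convolution blocks keep a fixed-size state, so they are not modeled.

PARAMS = 230_000_000     # reported parameter count
ATTENTION_LAYERS = 6     # reported grouped-query attention blocks
TOTAL_LAYERS = 14        # reported total blocks (8 convolution + 6 attention)
CONTEXT = 32_768         # reported context window, in tokens

N_KV_HEADS = 8           # hypothetical: KV heads shared by groups of query heads
HEAD_DIM = 64            # hypothetical: per-head dimension
MiB = 1024 * 1024


def weight_bytes(num_params: int, bits_per_param: float) -> float:
    """Weights only; ignores quantization scales and runtime buffers."""
    return num_params * bits_per_param / 8


def kv_cache_bytes(layers: int, tokens: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Keys plus values (factor of 2) cached for every token in every attention layer."""
    return 2 * layers * tokens * kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"weights @ {bits}-bit: {weight_bytes(PARAMS, bits) / MiB:.0f} MiB")

    hybrid = kv_cache_bytes(ATTENTION_LAYERS, CONTEXT, N_KV_HEADS, HEAD_DIM)
    all_attention = kv_cache_bytes(TOTAL_LAYERS, CONTEXT, N_KV_HEADS, HEAD_DIM)
    print(f"KV cache, 6 attention layers: {hybrid / MiB:.0f} MiB")
    print(f"KV cache, if all 14 blocks were attention: {all_attention / MiB:.0f} MiB")
```

With these placeholder head settings, the six attention layers need 384 MiB of 16-bit KV cache at full context, versus 896 MiB if all 14 blocks were attention. Because the head settings are invented, the absolute numbers say nothing about the real model; the 6-to-14 ratio is the point. The article does not say which precision or runtime the 293–375 MB figure refers to.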

Post-training runs in three stages. The first is supervised fine-tuning with distillation from the larger LFM2.5-350M, which is meant to preserve flexibility for downstream specialization.

The distillation step is what keeps a 230M model competitive with larger checkpoints. It inherits behavior from the bigger LFM2.5-350M on targeted tasks.
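The article does not describe how Liquid AI's distillation is implemented. For readers new to the idea, here is the textbook logit-matching variant in plain Python: the student is pushed toward the teacher's temperature-softened next-token distribution, blended with ordinary cross-entropy on the true token. This is a generic illustration, not Liquid AI's recipe; the temperature and the `alpha` weighting are illustrative choices.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2


def combined_loss(student_logits: List[float], teacher_logits: List[float],
                  target_index: int, alpha: float = 0.5,
                  temperature: float = 2.0) -> float:
    """Blend hard-label cross-entropy with the soft distillation term."""
    hard = -math.log(softmax(student_logits)[target_index])
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * hard + (1 - alpha) * soft


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1, -1.0]  # toy logits over a 4-token vocabulary
    student = [1.5, 1.2, 0.0, -0.5]
    print(distillation_loss(student, teacher))
    print(combined_loss(student, teacher, target_index=0))
```

The distillation term is zero when the student's softened distribution matches the teacher's and grows as they diverge, which is how a smaller model "inherits behavior" from a larger one.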

Benchmarks

The Liquid AI team evaluated LFM2.5-230M across ten benchmarks spanning knowledge, instruction following, data extraction, and tool use. On IFEval it scores 71.71, ahead of Qwen3.5-0.8B (59.94) and Gemma 3 1B IT (63.49). On IFBench it scores 38.40, again ahead of both. On CaseReportBench, a clinical data-extraction test, it scores 22.51.

Model                     Params  IFEval  IFBench  CaseReportBench  BFCLv4  MMLU-Pro
LFM2.5-230M               230M    71.71   38.40    22.51            21.03   20.25
LFM2.5-350M               350M    76.96   40.69    32.45            21.86   20.01
Granite 4.0-H-350M        350M    61.27   17.22    12.44            13.28   13.14
Qwen3.5-0.8B (Instruct)   800M    59.94   22.87    13.83            18.70   37.42
Gemma 3 1B IT             1B      63.49   20.33    2.28             7.17    14.04

Among the non-Liquid models in the table, LFM2.5-230M leads on instruction following and data extraction; only its larger LFM2.5-350M sibling scores higher. It trails on broad knowledge: its MMLU-Pro score of 20.25 sits well behind Qwen3.5-0.8B's 37.42. It is also weak on some agentic tool use, scoring just 5.26 on τ²-Bench Telecom.
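Those comparisons can be checked directly against the table. The short Python script below encodes the five reported columns and computes, per benchmark, the overall leader and LFM2.5-230M's margin over the best model from outside the LFM family. The scores are copied from the article; the grouping into "rivals" is our own.

```python
# Scores as reported in the article's benchmark table (higher is better).
SCORES = {
    "LFM2.5-230M":             {"IFEval": 71.71, "IFBench": 38.40, "CaseReportBench": 22.51, "BFCLv4": 21.03, "MMLU-Pro": 20.25},
    "LFM2.5-350M":             {"IFEval": 76.96, "IFBench": 40.69, "CaseReportBench": 32.45, "BFCLv4": 21.86, "MMLU-Pro": 20.01},
    "Granite 4.0-H-350M":      {"IFEval": 61.27, "IFBench": 17.22, "CaseReportBench": 12.44, "BFCLv4": 13.28, "MMLU-Pro": 13.14},
    "Qwen3.5-0.8B (Instruct)": {"IFEval": 59.94, "IFBench": 22.87, "CaseReportBench": 13.83, "BFCLv4": 18.70, "MMLU-Pro": 37.42},
    "Gemma 3 1B IT":           {"IFEval": 63.49, "IFBench": 20.33, "CaseReportBench": 2.28,  "BFCLv4": 7.17,  "MMLU-Pro": 14.04},
}
BENCHMARKS = ["IFEval", "IFBench", "CaseReportBench", "BFCLv4", "MMLU-Pro"]
# Models from outside Liquid AI's own LFM family.
RIVALS = [m for m in SCORES if not m.startswith("LFM")]


def leader(benchmark, models):
    """Model with the highest score on one benchmark."""
    return max(models, key=lambda m: SCORES[m][benchmark])


def margin_over_best_rival(model, benchmark, rivals=RIVALS):
    """Positive: the model beats every rival; negative: at least one rival is ahead."""
    return SCORES[model][benchmark] - max(SCORES[r][benchmark] for r in rivals)


if __name__ == "__main__":
    for b in BENCHMARKS:
        delta = margin_over_best_rival("LFM2.5-230M", b)
        print(f"{b:16s} leader: {leader(b, SCORES):24s} 230M vs best rival: {delta:+.2f}")
```

Run as-is, it shows LFM2.5-230M ahead of the best non-Liquid model on four of the five columns (by 8.22 points on IFEval and 15.53 on IFBench) and 17.17 points behind Qwen3.5-0.8B on MMLU-Pro, while LFM2.5-350M tops the first four columns outright.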

Liquid AI is direct about the limits: it does not recommend the model for reasoning-heavy workloads, including advanced math, code generation, and creative writing.

Use Cases With Examples

The model fits two jobs well.

The first is large-scale data extraction pipelines. Picture a pipeline parsing 100,000 clinical reports into structured fields.
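The article does not show how such a pipeline is wired up. Below is a minimal Python sketch, assuming the model is reachable through some text-in, text-out function: the `generate` callable stands in for whichever runtime serves it, and we do not reproduce any runtime's API. The field names, prompt wording and retry policy are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical schema for illustration; not taken from Liquid AI's materials.
REQUIRED_FIELDS = ("patient_age", "diagnosis", "medications")

PROMPT_TEMPLATE = (
    "Extract the following fields from the clinical report and answer with "
    "a single JSON object with keys {keys}. Use null for missing values.\n\n"
    "Report:\n{report}"
)


def build_prompt(report: str) -> str:
    """Fill the hypothetical extraction prompt for one report."""
    return PROMPT_TEMPLATE.format(keys=", ".join(REQUIRED_FIELDS), report=report)


def parse_record(raw: str) -> Optional[Dict]:
    """Return the parsed object if it is valid JSON with every required key, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or any(k not in obj for k in REQUIRED_FIELDS):
        return None
    return {k: obj[k] for k in REQUIRED_FIELDS}


def extract_all(reports: Iterable[str], generate: Callable[[str], str],
                max_attempts: int = 2) -> Tuple[List[Dict], List[int]]:
    """Run each report through the model; keep valid records, log indices that never parse."""
    records: List[Dict] = []
    failures: List[int] = []
    for i, report in enumerate(reports):
        prompt = build_prompt(report)
        for _ in range(max_attempts):
            record = parse_record(generate(prompt))
            if record is not None:
                records.append(record)
                break
        else:  # no attempt produced a valid record
            failures.append(i)
    return records, failures


if __name__ == "__main__":
    # Stand-in for a real model call, so the sketch runs without any runtime.
    def fake_generate(prompt: str) -> str:
        if "aspirin" in prompt:
            return '{"patient_age": 64, "diagnosis": "angina", "medications": ["aspirin"]}'
        return "not json"

    records, failures = extract_all(["64-year-old with angina, on aspirin.", "illegible scan"], fake_generate)
    print(records)
    print(failures)
```

Validating every output against a fixed schema and routing failures to a separate list keeps a 100,000-report run from silently dropping or corrupting rows; the failed indices can be retried or reviewed by hand.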

Why this matters

We see Liquid AI's LFM2.5-230M as a modest but deliberate step toward on-device agentic workloads. At 230M parameters it is the company's smallest model yet, and its open-weight checkpoints on Hugging Face lower the barrier for developers who want to experiment on phones, robots, or other automation hardware. With day-one support for llama.cpp, MLX, vLLM, SGLang and ONNX, the release promises broad compatibility with existing inference stacks.

The three-stage post-training recipe, which opens with supervised fine-tuning distilled from the 350M sibling, aims to preserve flexibility for downstream tasks. Yet the brief description leaves open whether the distilled model can meet latency and accuracy expectations on constrained devices. Can it deliver?

Moreover, the narrow pitch suggests the team is not targeting general-purpose use cases, so its relevance may stay limited to niche applications. We remain cautious: the open weights are encouraging, but practical adoption will depend on independent, real-world performance data beyond the launch-day figures.