
Liquid AI's 450M Model: Ultra-Fast Vision-Language Tech

Liquid AI's LFM2.5-VL-450M: model with bounding boxes, sub‑250 ms inference


Liquid AI’s latest release, the LFM2.5‑VL‑450M, packs 450 million parameters into a vision‑language model that can predict bounding boxes and handle multiple languages—all while keeping inference under 250 ms on edge devices. That speed figure isn’t just a brag; it’s the result of a design that lets developers adjust how the model processes images on the fly. While the architecture already supports sub‑250 ms latency, the real flexibility comes from being able to balance speed against output fidelity without having to retrain the network.

This matters for anyone deploying the model across a range of hardware, from low‑power IoT units to more capable edge servers. The company even supplies a baseline set of generation settings—temperature, min‑p, and repetition penalties—to help users get consistent results out of the box. Below, Liquid AI explains how those knobs work and why they’re useful for tailoring performance to different compute budgets.

At inference time, users can tune the maximum image tokens and tile count for a speed/quality tradeoff without retraining, which is useful when deploying across hardware with different compute budgets. The recommended generation parameters from Liquid AI are temperature=0.1, min_p=0.15, and repetition_penalty=1.05 for text, and min_image_tokens=32, max_image_tokens=256, and do_image_splitting=True for vision inputs. On the training side, Liquid AI scaled pre-training from 10T to 28T tokens compared to LFM2-VL-450M, followed by post-training using preference optimization and reinforcement learning to improve grounding, instruction following, and overall reliability across vision-language tasks.

New Capabilities Over LFM2-VL-450M

The most significant addition is bounding box prediction.
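For readers wiring these settings into a runtime, they can be collected into plain configuration mappings. This is a minimal sketch: the keyword names mirror common Hugging Face-style generation arguments, but the exact integration point depends on the library you use and is not confirmed here.

```python
# Liquid AI's recommended settings, expressed as plain dicts.
# How these are passed to the model (e.g. via a generate() call or a
# processor) depends on your runtime; the values are from the article.

text_generation_config = {
    "temperature": 0.1,          # low temperature favors deterministic decoding
    "min_p": 0.15,               # min-p sampling floor
    "repetition_penalty": 1.05,  # mild penalty to discourage repetition loops
}

vision_config = {
    "min_image_tokens": 32,      # lower bound on tokens spent per image
    "max_image_tokens": 256,     # cap that bounds per-image encode work
    "do_image_splitting": True,  # tile large images before encoding
}
```

Keeping these in one place makes it easy to swap in a higher max_image_tokens value on more capable hardware without touching the rest of the pipeline.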

What does LFM2.5‑VL‑450M actually deliver? A 450 million‑parameter vision‑language model that now predicts bounding boxes, follows instructions more closely, understands more languages and can invoke functions, all while fitting on edge chips such as NVIDIA’s Jetson Orin, AMD’s Ryzen AI Max+ 395 and Qualcomm’s Snapdragon 8 Elite. The claim of sub‑250 ms inference suggests it could handle real‑time tasks, yet the article provides no benchmark data beyond the speed target.

Users can adjust maximum image tokens and tile counts at inference time, trading quality for speed without retraining—a flexibility that may ease deployment across devices with differing compute budgets. Liquid AI recommends a low temperature of 0.1, a min‑p of 0.15 and a repetition penalty of 1.05, implying a focus on deterministic outputs, though the impact of these parameters on diverse multilingual prompts remains unclear. The model’s expanded multilingual support and function‑calling ability hint at broader applicability, but without third‑party validation it is uncertain how it will perform in complex, real‑world scenarios.

Ultimately, the release adds notable capabilities to the LFM line, while practical effectiveness will depend on further testing and integration experience.


Common Questions Answered

How does the LFM2.5-VL-450M model achieve sub-250 ms inference speed?

The model allows developers to dynamically adjust image processing parameters at inference time, such as maximum image tokens and tile count. This flexibility enables a speed/quality tradeoff without requiring retraining, making it adaptable to different hardware compute budgets.
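The tradeoff can be illustrated with a back-of-the-envelope model. Everything numeric below is an invented assumption (per-tile token counts and per-token costs are illustrative, and real encoder latency is not strictly linear); the point is only that capping image tokens bounds the per-image work regardless of how many tiles the splitter produces.

```python
def estimate_image_cost(num_tiles: int, tokens_per_tile: int,
                        max_image_tokens: int, cost_per_token_ms: float) -> float:
    """Rough per-image encode-latency estimate under a linear-cost assumption.

    All figures are illustrative, not published benchmarks: the cap on
    image tokens limits work even when image splitting yields many tiles.
    """
    tokens = min(num_tiles * tokens_per_tile, max_image_tokens)
    return tokens * cost_per_token_ms

# Halving the token cap roughly halves the (assumed linear) encode cost.
fast = estimate_image_cost(num_tiles=4, tokens_per_tile=64,
                           max_image_tokens=128, cost_per_token_ms=0.5)
quality = estimate_image_cost(num_tiles=4, tokens_per_tile=64,
                              max_image_tokens=256, cost_per_token_ms=0.5)
```

In this toy model, `fast` comes out at half of `quality`, which is the shape of the tradeoff the configurable token cap is meant to expose.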

What recommended generation parameters does Liquid AI suggest for the LFM2.5-VL-450M model?

For text generation, Liquid AI recommends using a temperature of 0.1, min_p of 0.15, and a repetition penalty of 1.05. For vision inputs, they suggest setting min_image_tokens to 32, max_image_tokens to 256, and enabling image splitting.

What are the key capabilities of the LFM2.5-VL-450M vision-language model?

The model can predict bounding boxes, follow instructions more closely, understand multiple languages, and invoke functions. It is designed to fit on edge devices like NVIDIA's Jetson Orin, AMD's Ryzen AI Max+ 395, and Qualcomm's Snapdragon 8 Elite, with a target inference speed of under 250 milliseconds.