LFM2.5-VL-1.6B is a multimodal generative model developed by Liquid AI and released as part of the LFM 2.5 family in January 2026. Designed for efficient on-device execution, it combines a 1.2-billion-parameter language backbone with a 400-million-parameter SigLIP2 vision encoder. The model is built on Liquid AI's proprietary Liquid Foundation Model (LFM) architecture, whose hybrid design delivers higher throughput and lower memory usage than standard Transformer architectures of similar scale.
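The pairing of a vision encoder with a language backbone described above typically follows the encoder-projector-LM pattern common to multimodal models. The sketch below illustrates that data flow with dummy tensors; the hidden widths, patch size, and random projections are illustrative assumptions, not published hyperparameters of this model.

```python
import numpy as np

# Assumed, illustrative dimensions -- the source states only total
# parameter counts (1.2B backbone, 400M encoder), not hidden sizes.
VISION_DIM = 768   # assumed SigLIP2 embedding width
LM_DIM = 2048      # assumed language-backbone hidden width

rng = np.random.default_rng(0)

def encode_image(patches: np.ndarray) -> np.ndarray:
    """Stand-in for the SigLIP2 encoder: maps flattened image patches
    to vision embeddings (here, a fixed random projection)."""
    w = rng.normal(0, 0.02, (patches.shape[-1], VISION_DIM))
    return patches @ w

def project_to_lm(vision_tokens: np.ndarray) -> np.ndarray:
    """Connector that maps vision embeddings into the LM embedding space."""
    w = rng.normal(0, 0.02, (VISION_DIM, LM_DIM))
    return vision_tokens @ w

# A 512x512 RGB image at an assumed patch size of 16 -> 32*32 = 1024 patches,
# each flattened to 16*16*3 = 768 values.
patches = rng.normal(size=(1024, 16 * 16 * 3))
vision_tokens = project_to_lm(encode_image(patches))

# Vision tokens are concatenated with text token embeddings before
# entering the language backbone.
text_tokens = rng.normal(size=(12, LM_DIM))
lm_input = np.concatenate([vision_tokens, text_tokens], axis=0)
print(lm_input.shape)  # (1036, 2048)
```

The key design point is that the backbone never sees pixels: the connector turns the encoder's output into token embeddings, so image content and text share one sequence.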
Capabilities and Performance
The model is tuned for real-world visual reasoning tasks, including document understanding, high-resolution OCR, and multi-image comprehension. It processes images up to 512x512 pixels at native resolution without distortion, and tiles larger images into patches within that limit. Its architecture supports a context window of 32,768 tokens, enabling long-form multimodal interactions.
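The source does not specify the exact tiling algorithm, so the following is a minimal sketch of one plausible scheme, assuming a non-overlapping grid of tiles capped at 512x512: images within the limit pass through as a single native-resolution tile, and larger ones are split.

```python
import math

TILE = 512  # native-resolution limit stated for the model

def plan_tiles(width: int, height: int) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes covering an image
    with tiles of at most TILE x TILE pixels. This grid scheme is an
    assumption for illustration, not the model's documented algorithm."""
    if width <= TILE and height <= TILE:
        # Fits the native-resolution path: one tile, no resizing.
        return [(0, 0, width, height)]
    cols = math.ceil(width / TILE)
    rows = math.ceil(height / TILE)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * TILE, r * TILE
            boxes.append((left, top,
                          min(left + TILE, width),
                          min(top + TILE, height)))
    return boxes

print(len(plan_tiles(500, 400)))   # 1 tile: fits natively
print(len(plan_tiles(1024, 768)))  # 2 cols x 2 rows = 4 tiles
```

Each box could then be cropped and encoded separately, with the resulting vision tokens concatenated in reading order before reaching the language backbone.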
LFM2.5-VL-1.6B offers robust multilingual support, handling vision-language prompts in languages including Arabic, Chinese, French, German, Japanese, Korean, and Spanish. Thanks to these architectural optimizations, it is intended for deployment on resource-constrained hardware such as mobile devices, laptops, and automotive systems.