
Introduction
Liquid AI has recently unveiled its latest breakthrough in compact language models: the LFM2.5-1.2B-Thinking. As part of the LFM2.5 family, this model represents a significant step forward in bringing sophisticated reasoning capabilities to edge devices. Unlike traditional Transformers, which struggle with memory and compute constraints on smaller hardware, Liquid's models leverage a unique architecture designed for efficiency and speed.
The LFM2.5-1.2B-Thinking is specifically tuned for precision and logic, making it a powerful tool for developers looking to build local agents or fast RAG pipelines without the need for massive GPU clusters.
Key Features and Architecture
The "1.2B" in its name refers to its 1.17 billion parameters, yet its performance punches well above its weight class. Here are the core specifications:
- Liquid Architecture: The model consists of 16 layers: 10 double-gated LIV (Linear Input-Varying) convolution blocks paired with 6 GQA (Grouped Query Attention) blocks.
- Large Context Window: Despite its size, it supports a 32,768 token context length, which is essential for complex RAG tasks and long-form data extraction.
- Massive Training Data: Liquid AI trained this model on a staggering 28 trillion tokens, ensuring a high degree of "world knowledge" and linguistic fluency for its size.
- Multilingual Support: Out of the box, it supports English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
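
If you want to try it yourself, Liquid AI distributes its models through Hugging Face. Below is a minimal loading-and-generation sketch; the repo id `LiquidAI/LFM2.5-1.2B-Thinking` and native transformers support for the LFM2 family are assumptions worth verifying against the official model card:

```python
# Minimal sketch: load the model and run one chat turn with transformers.
# Assumes the repo id "LiquidAI/LFM2.5-1.2B-Thinking" (verify on Hugging Face)
# and a recent transformers release that supports the LFM2 family.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "LiquidAI/LFM2.5-1.2B-Thinking"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 1.17B weights around 2.3 GB
    device_map="auto",           # falls back to CPU if no accelerator is found
)

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Since this is a "thinking" model, expect an explicit reasoning trace before the final answer, and budget `max_new_tokens` accordingly.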
Performance & Optimization
One of the most impressive aspects of the LFM2.5-1.2B-Thinking is its inference speed. Liquid AI has partnered with industry leaders like AMD and Qualcomm to optimize the LFM family for NPUs (Neural Processing Units).
- CPU & NPU Efficiency: The model offers extremely fast inference on standard CPUs with low memory footprints.
- Edge Performance: On AMD Ryzen NPUs using FastFlowLM, the model can sustain ~52 tokens per second at 16K context and ~46 tokens per second even at its full 32K context.
- Compact Thinking: Compared to models like Qwen3-1.7B, the LFM2.5-1.2B-Thinking achieves comparable or better results while requiring fewer output tokens to reach the same conclusion.
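
The headline numbers above come from NPU-optimized runtimes, so a plain CPU run through transformers will land lower. Still, a quick sanity check of your own hardware is straightforward. A minimal sketch, not a rigorous benchmark, reusing the `model` and `tokenizer` from the loading example above:

```python
# Rough throughput check: measures generated tokens per second for one prompt.
# Reuses `model` and `tokenizer` from the loading sketch above.
import time

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the benefits of on-device AI."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

start = time.perf_counter()
out = model.generate(prompt, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = out.shape[-1] - prompt.shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.1f} tok/s")
```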
Use Cases: Agents and RAG
Liquid AI recommends this model for specific scenarios where speed and reliability are paramount:
- Agentic Tasks: Its ability to handle "thinking" steps and function calling makes it ideal for autonomous agents that need to run locally.
- Data Extraction: Its reasoning capabilities allow it to parse complex documents and extract structured information with high accuracy.
- Local RAG: With a 32K context window and fast inference, it's perfect for searching through local knowledge bases and summarizing information on the fly (see the sketch below).
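
To make the RAG use case concrete, here is a deliberately tiny sketch: a naive keyword retriever (a stand-in for a real embedding index) that stuffs the top-matching chunks into the prompt. The documents and question are invented for illustration, and it reuses the `model` and `tokenizer` from the loading sketch:

```python
# Toy local-RAG sketch: naive keyword retrieval feeding the model's context.
# The docs and question are made up; real pipelines would use an embedding index.

docs = {
    "battery.md": "The device battery lasts 12 hours under typical load.",
    "warranty.md": "The warranty covers manufacturing defects for 24 months.",
    "specs.md": "The unit ships with 8 GB RAM and an NPU rated at 45 TOPS.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Score each doc by word overlap with the query; return the top-k texts."""
    q = set(query.lower().split())
    ranked = sorted(docs.items(), key=lambda kv: -len(q & set(kv[1].lower().split())))
    return [text for _, text in ranked[:k]]

question = "How long does the warranty last?"
context = "\n".join(retrieve(question))

messages = [{
    "role": "user",
    "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same prompt-assembly pattern scales to real knowledge bases; the 32K window leaves plenty of room for retrieved chunks plus the model's reasoning trace.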
> [!IMPORTANT]
> While powerful, Liquid AI notes that this model is not recommended for knowledge-intensive "Jeopardy-style" tasks or deep programming work, where larger-scale models still hold the edge.
Conclusion
The release of LFM2.5-1.2B-Thinking signals a shift in the AI industry toward specialized, compact models. By focusing on architecture-level efficiency rather than just raw size, Liquid AI is making "thinking" AI accessible for vehicles, mobile devices, and IoT hardware. As on-device AI continues to evolve, models like this will be the backbone of the next generation of privacy-focused, always-on digital assistants.
Explore more about AI models in our models catalog, learn about AI agents in our glossary, or discover AI development tools in our AI tools directory.