Liquid AI LFM2.5-1.2B-Thinking: Compact Power

Exploring Liquid AI's newest 1.2B reasoning model optimized for agentic tasks, RAG, and high-speed edge inference with LIV convolution architecture.

by HowAIWorks Team
Liquid AI · LFM2.5 · Edge AI · On-device AI · Compact Models · Agentic AI · RAG · Machine Learning

Liquid AI LFM2.5-1.2B-Thinking Model

Introduction

Liquid AI has recently unveiled its latest breakthrough in compact language models: the LFM2.5-1.2B-Thinking. As part of the LFM2.5 family, this model represents a significant step forward in bringing sophisticated reasoning capabilities to edge devices. Unlike traditional Transformers, which struggle with memory and compute constraints on smaller hardware, Liquid's models leverage a unique architecture designed for efficiency and speed.

The LFM2.5-1.2B-Thinking is specifically tuned for precision and logic, making it a powerful tool for developers looking to build local agents or fast RAG pipelines without the need for massive GPU clusters.

Key Features and Architecture

The "1.2B" in its name refers to its 1.17 billion parameters, yet its performance punches well above its weight class. Here are the core specifications:

  • Liquid Architecture: The model consists of 16 layers, combining 10 double-gated LIV (linear input-varying) convolution blocks with 6 GQA (Grouped Query Attention) blocks.
  • Large Context Window: Despite its size, it supports a 32,768 token context length, which is essential for complex RAG tasks and long-form data extraction.
  • Massive Training Data: Liquid AI trained this model on a staggering 28 trillion tokens, ensuring a high degree of "world knowledge" and linguistic fluency for its size.
  • Multilingual Support: Out of the box, it supports English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
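To make the hybrid layout concrete, here is a small illustrative sketch of a 16-layer schedule with 10 convolution blocks and 6 attention blocks. Note that the exact interleaving order is an assumption for illustration only; the article above specifies the block counts, not their arrangement.

```python
# Toy sketch of a 16-layer hybrid schedule: 10 double-gated LIV
# convolution blocks and 6 GQA attention blocks. The even-spacing
# pattern below is an assumption, not Liquid AI's published layout.

CONV, ATTN = "liv_conv", "gqa_attn"

def build_layer_schedule(n_conv: int = 10, n_attn: int = 6) -> list[str]:
    """Spread attention blocks roughly evenly among conv blocks."""
    total = n_conv + n_attn
    schedule = []
    attn_used = 0
    for i in range(total):
        # Insert an attention block whenever the running ratio of
        # attention layers falls behind the target n_attn / total.
        if attn_used < n_attn and (i + 1) * n_attn >= (attn_used + 1) * total:
            schedule.append(ATTN)
            attn_used += 1
        else:
            schedule.append(CONV)
    return schedule

layers = build_layer_schedule()
print(layers)
print(layers.count(CONV), layers.count(ATTN))  # → 10 6
```

The takeaway is simply that most layers are cheap convolutions, with a minority of attention layers handling global mixing, which is where the memory savings over a pure-Transformer stack come from.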

Performance & Optimization

One of the most impressive aspects of the LFM2.5-1.2B-Thinking is its inference speed. Liquid AI has partnered with industry leaders like AMD and Qualcomm to optimize the LFM family for NPUs (Neural Processing Units).

  • CPU & NPU Efficiency: The model offers extremely fast inference on standard CPUs with low memory footprints.
  • Edge Performance: On AMD Ryzen NPUs using FastFlowLM, the model can sustain ~52 tokens per second at 16K context and ~46 tokens per second even at its full 32K context.
  • Compact Thinking: Compared to models like Qwen3-1.7B, the LFM2.5-1.2B-Thinking achieves comparable or better results while requiring fewer output tokens to reach the same conclusion.
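A quick back-of-the-envelope calculation shows what those throughput numbers mean in practice. This sketch only covers decode speed; prefill time and runtime overhead are ignored, and the 500-token trace length is a hypothetical workload, not a figure from Liquid AI.

```python
# Decode-latency estimate from the edge figures above:
# ~52 tok/s at 16K context, ~46 tok/s at 32K context
# (AMD Ryzen NPUs via FastFlowLM). Prefill is ignored.

def generation_time_s(n_tokens: int, tokens_per_s: float) -> float:
    return n_tokens / tokens_per_s

# Hypothetical 500-token "thinking" trace:
t_32k = generation_time_s(500, 46.0)
t_16k = generation_time_s(500, 52.0)

print(f"~{t_32k:.1f}s at 32K context, ~{t_16k:.1f}s at 16K context")
# → ~10.9s at 32K context, ~9.6s at 16K context
```

Sustaining roughly the same decode rate at full 32K context as at 16K is the notable part: many small models degrade much more sharply as the window fills.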

Use Cases: Agents and RAG

Liquid AI recommends this model for specific scenarios where speed and reliability are paramount:

  • Agentic Tasks: Its ability to handle "thinking" steps and function calling makes it ideal for autonomous agents that need to run locally.
  • Data Extraction: Its reasoning capabilities allow it to parse complex documents and extract structured information with high accuracy.
  • Local RAG: With a 32K context window and fast inference, it's perfect for searching through local knowledge bases and summarizing information on-the-fly.
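One practical concern in a local RAG pipeline is fitting retrieved chunks into the model's 32,768-token window. The sketch below shows a greedy packing step under stated assumptions: token counting uses a naive whitespace split (a real pipeline would use the model's tokenizer), and the `pack_context` helper and its budget values are hypothetical, not part of Liquid AI's tooling.

```python
# Minimal sketch: greedily pack ranked chunks into a 32,768-token
# context window, reserving room for the model's answer.
# Whitespace token counting is a crude stand-in for a real tokenizer.

def pack_context(chunks: list[str], budget: int = 32_768,
                 reserve_for_answer: int = 1_024) -> list[str]:
    """Keep the highest-ranked chunks that fit the remaining budget."""
    remaining = budget - reserve_for_answer
    packed = []
    for chunk in chunks:  # assumed pre-sorted by retrieval score
        cost = len(chunk.split())  # crude token estimate
        if cost <= remaining:
            packed.append(chunk)
            remaining -= cost
    return packed

ranked = [
    "short relevant chunk",
    "another chunk " * 10,
    "low-priority filler " * 40_000,  # too large to fit
]
ctx = pack_context(ranked)
print(len(ctx))  # → 2 (the oversized last chunk is dropped)
```

Reserving a slice of the budget for generation matters for a "thinking" model in particular, since its reasoning trace consumes output tokens on top of the final answer.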

Important: While powerful, Liquid AI notes that this model is not recommended for knowledge-intensive "Jeopardy-style" tasks or deep programming work, where larger-scale models still hold the edge.

Conclusion

The release of LFM2.5-1.2B-Thinking signals a shift in the AI industry toward specialized, compact models. By focusing on architecture-level efficiency rather than just raw size, Liquid AI is making "thinking" AI accessible for vehicles, mobile devices, and IoT hardware. As on-device AI continues to evolve, models like this will be the backbone of the next generation of privacy-focused, always-on digital assistants.

Explore more about AI models in our models catalog, learn about AI agents in our glossary, or discover AI development tools in our AI tools directory.

Frequently Asked Questions

What architecture does LFM2.5-1.2B-Thinking use?
It uses a hybrid architecture combining double-gated LIV convolution blocks with GQA, specifically optimized for edge devices and reasoning tasks.

What hardware is the model optimized for?
It is designed for CPUs and NPUs, with specific optimizations for AMD Ryzen and Qualcomm platforms, making it ideal for laptops and mobile devices.

What are its main use cases?
The model excels at agentic tasks, data extraction, and Retrieval-Augmented Generation (RAG) due to its specialized reasoning capabilities.

Continue Your AI Journey

Explore our lessons and glossary to deepen your understanding.