Liquid AI LFM2.5-1.2B-Thinking: On-Device Reasoning Under 1GB

Liquid AI LFM2.5-1.2B-Thinking is a breakthrough 1.2B-parameter reasoning model that fits in roughly 900 MB, delivering strong reasoning performance on phones and laptops.

by HowAIWorks Team
Liquid AI · LFM 2.5 · On-device AI · Edge AI · Reasoning Models · Machine Learning · NPU · AMD Ryzen · Qualcomm · Thinking Models


Introduction

The promise of true "thinking" AI—models that don't just predict the next word but internalize logical steps before speaking—has traditionally been the domain of massive data centers. Running these reasoning-heavy workloads usually requires high-end GPUs and significant power consumption. However, Liquid AI has just shattered this barrier with the release of LFM2.5-1.2B-Thinking.

This new addition to the LFM 2.5 family is a 1.2 billion parameter model specifically engineered for sophisticated reasoning tasks on edge devices. What required a cluster of servers two years ago now runs offline, entirely on-device, within a mere 900 MB of memory. This release signals a critical shift: reasoning capabilities that once lived behind cloud APIs can now run on hardware as small as a smartphone or a laptop.
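A quick back-of-the-envelope calculation shows why that 900 MB figure implies aggressive quantization. The exact storage format Liquid AI ships is not stated here; this sketch only works out what the numbers imply:

```python
# How many bits per parameter does a 1.2B-parameter model
# stored in ~900 MB imply? (Illustrative arithmetic only.)

PARAMS = 1.2e9                   # model parameters
BUDGET_BYTES = 900 * 1024**2     # ~900 MB memory footprint

bits_per_param = BUDGET_BYTES * 8 / PARAMS
print(f"{bits_per_param:.1f} bits per parameter")  # 6.3 bits per parameter

# For comparison, common storage formats:
for name, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    mb = PARAMS * bits / 8 / 1024**2
    print(f"{name:>5}: {mb:,.0f} MB")
```

At roughly 6 bits per parameter, the footprint sits between int8 (~1.1 GB) and 4-bit (~570 MB) storage, which is consistent with a sub-byte quantized deployment.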

The Power of Internal Reasoning Traces

Unlike standard instruction-following models that provide immediate answers, LFM2.5-1.2B-Thinking generates thinking traces. This process allows the model to work through problems systematically, verifying intermediate results and adjusting its approach before producing the final output.
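In practice, an application consuming such a model usually strips the trace and surfaces only the final answer. A minimal sketch, assuming the trace is wrapped in `<think>...</think>` tags, a convention several open reasoning models use; the exact delimiter LFM2.5-1.2B-Thinking emits may differ:

```python
import re

def split_trace(output: str) -> tuple[str, str]:
    """Return (thinking_trace, final_answer) from raw model output.
    Assumes a <think>...</think> delimiter, which is a convention,
    not a documented guarantee for this model."""
    match = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    if not match:
        return "", output.strip()
    trace = match.group(1).strip()
    answer = output[match.end():].strip()
    return trace, answer

raw = "<think>2 * 21 = 42, check: 42 / 2 = 21.</think>The answer is 42."
trace, answer = split_trace(raw)
print(answer)  # The answer is 42.
```

Keeping the trace around (rather than discarding it) is also useful for debugging agentic pipelines, since it shows where the model's verification steps went wrong.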

The impact of this "reason-first" architecture is most visible in complex, objective benchmarks:

  • Mathematical Reasoning: On the MATH-500 benchmark, the model's score skyrocketed from 63 to 88 compared to the LFM2.5-1.2B-Instruct version.
  • Instruction Following: Improved from 61 to 69 on the Multi-IF benchmark.
  • Agentic Tool Use: Jumped from 49 to 57 on BFCLv3 (Berkeley Function Calling Leaderboard).

These metrics demonstrate that even at a compact 1.2B scale, a model can exhibit high-level logic if trained with the right "thinking" recipe.

Technical Breakthroughs: The Training Recipe

Building a capable reasoning model at such a small scale presented unique challenges, particularly the "knowledge capacity" limit. Liquid AI addressed this through a sophisticated training and alignment pipeline:

Eliminating the "Doom Loop"

A common failure mode for small reasoning models is the "doom loop"—getting stuck in repetitive text patterns instead of reaching a conclusion. Liquid AI mitigated this during preference alignment by sampling multiple candidates (5 temperature-sampled and 1 greedy) and using an LLM judge to favor non-looping, high-quality responses.
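The selection step above can be sketched as a best-of-N filter. The real pipeline uses an LLM judge; here a simple repeated-n-gram heuristic stands in for the "loopiness" part of that judgment, purely as an illustration:

```python
from collections import Counter

def loop_score(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that are repeats (higher = loopier)."""
    words = text.split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(grams)

def pick_best(candidates: list[str]) -> str:
    # Prefer the least loopy candidate; a real LLM judge would also
    # weigh correctness and style, not just repetition.
    return min(candidates, key=loop_score)

candidates = [
    "Wait, let me check. Wait, let me check. Wait, let me check. Wait, let me check.",
    "First compute 6 * 7 = 42, then verify 42 / 7 = 6. Final answer: 42.",
]
print(pick_best(candidates))
```

In Liquid AI's setup the candidate pool would be the 5 temperature-sampled plus 1 greedy generation, and the winner becomes the preferred response in the alignment data.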

RLVR and GRPO-style Optimization

The model's reinforcement learning (RL) pipeline is built on a fork of verl, focusing on critic-free, group-relative policy-gradient optimization (GRPO-style). By applying techniques like asymmetric ratio clipping and dynamic filtering, the team reduced the occurrence of doom loops from 15.74% in mid-training to just 0.36% in the final RLVR (Reinforcement Learning from Verifiable Rewards) stage.
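The core of a GRPO-style objective is easy to state: advantages are computed relative to a group of samples for the same prompt (no learned critic), and the policy ratio is clipped asymmetrically. A minimal sketch with placeholder clip bounds, not Liquid AI's actual hyperparameters:

```python
import statistics

def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within a group of samples for one prompt
    (the critic-free, group-relative part of GRPO)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

def clipped_objective(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """PPO-style surrogate with an asymmetric clip range
    [1 - eps_low, 1 + eps_high]; the eps values are illustrative."""
    clipped = max(1 - eps_low, min(1 + eps_high, ratio))
    return min(ratio * advantage, clipped * advantage)

rewards = [1.0, 0.0, 1.0, 0.0]      # verifiable 0/1 rewards for 4 samples
advs = group_advantages(rewards)
print([round(a, 2) for a in advs])  # [1.0, -1.0, 1.0, -1.0]
print(round(clipped_objective(1.5, advs[0]), 2))  # 1.28 (ratio clipped)
```

Verifiable rewards (a math answer either checks out or it doesn't) are what make the "V" in RLVR tractable at this scale: the 0/1 reward needs no learned reward model.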

Curriculum RL and Model Merging

Rather than training a single monolithic model for all tasks, Liquid AI used a Curriculum RL approach. They created domain-specific checkpoints for instruction following, math, and tool use, and then used iterative model merging to combine these specialized capabilities. This flexibility allowed them to navigate trade-offs more effectively than sequential training.
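The merging step can be pictured as a weighted average over checkpoint parameters. Real checkpoints are tensors and Liquid AI's merge recipe (weights, iteration schedule) is not public, so this is a schematic sketch with plain dicts of floats:

```python
def merge(checkpoints: list[dict[str, float]],
          weights: list[float]) -> dict[str, float]:
    """Weighted average of same-shaped checkpoints, key by key."""
    total = sum(weights)
    merged = {}
    for key in checkpoints[0]:
        merged[key] = sum(w * ckpt[key]
                          for w, ckpt in zip(weights, checkpoints)) / total
    return merged

# Hypothetical domain-specialist checkpoints (toy values):
math_ckpt = {"layer.w": 1.0, "layer.b": 0.0}
tool_ckpt = {"layer.w": 3.0, "layer.b": 1.0}
ifeval_ckpt = {"layer.w": 2.0, "layer.b": 0.5}

merged = merge([math_ckpt, tool_ckpt, ifeval_ckpt], weights=[1.0, 1.0, 2.0])
print(merged)  # {'layer.w': 2.0, 'layer.b': 0.5}
```

The appeal of this approach is that each specialist can be tuned aggressively on its own domain, with the merge weights navigating the trade-off afterward instead of one long sequential training run baking it in.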

Performance and Hardware Ecosystem

The LFM2.5-1.2B-Thinking model isn't just a research experiment; it's ready for deployment. Liquid AI has expanded its partner ecosystem to ensure high-performance execution across diverse hardware:

  • AMD Ryzen NPUs: Using the specialized FastFlowLM runtime, the model sustains 52 tokens per second at 16K context and 46 tokens per second even at its full 32K context.
  • Qualcomm NPUs: Optimized through a partnership with Nexa AI, bringing fast inference to mobile and PC devices.
  • Cross-Platform Support: "Day-zero" support is available for llama.cpp, MLX, vLLM, and ONNX Runtime, covering Apple, AMD, Qualcomm, and NVIDIA hardware.

Tip: While the Thinking model excels at logic and agentic tasks, Liquid AI recommends using LFM2.5-1.2B-Instruct for standard chat and creative writing, where reasoning traces might add unnecessary latency.

Conclusion

The release of LFM2.5-1.2B-Thinking proves that size is not the only factor in intelligence. By optimizing architecture and training recipes for reasoning rather than just raw scale, Liquid AI is enabling a new generation of AI agents that can think, plan, and use tools entirely in your pocket. As on-device AI matures, these compact "thinking" models will become the backbone of privacy-first, ultra-fast digital assistants.

To stay updated on the latest AI developments, explore our AI models catalog, check out our glossary for key terms, or browse our blog for more insights.

Frequently Asked Questions

  • How does the Thinking model differ from the Instruct version? The Thinking version generates internal reasoning traces before answering, significantly improving performance on math, logic, and tool-use tasks (e.g., the MATH-500 score jumped from 63 to 88).
  • Can it really run on a phone? The model is designed to run entirely on-device, fitting within approximately 900 MB of memory, making it ideal for mobile phones and laptops.
  • What hardware and frameworks are supported? It supports Apple, AMD, Qualcomm, and NVIDIA hardware through frameworks like llama.cpp, MLX, vLLM, and ONNX Runtime, with specific optimizations for AMD Ryzen and Qualcomm NPUs.
  • How were repetitive "doom loops" addressed? Liquid AI used a specialized RLVR (Reinforcement Learning from Verifiable Rewards) pipeline to reduce doom loops (repetitive text patterns) from 15.74% down to 0.36%.
  • How does it compare to larger models? Benchmarks show it matches or exceeds Qwen3-1.7B on most reasoning tasks despite having roughly 30% fewer parameters, offering better efficiency in terms of output tokens and speed.

Continue Your AI Journey

Explore our lessons and glossary to deepen your understanding.