Introduction
The Cursor team has announced Composer 2, a major update to its flagship coding model, now available to all users. The team presents it as more than an incremental improvement over the previous version: Composer 2 combines frontier-level intelligence with high efficiency, and Cursor reports record scores for the model on industry-standard evaluations.
Following the success of the first Composer version introduced in Cursor 2.0, the developers focused on creating a model capable of handling truly complex, multi-stage tasks. Composer 2 is the result of deep research into Reinforcement Learning (RL) and large-scale pretraining on specialized datasets.
Breakthrough in Coding Intelligence
Composer 2 demonstrates significant performance gains across all key benchmarks used by the Cursor team, which points to a model that is better at understanding code structure and executing commands.
Two benchmarks deserve special attention:
- Terminal-Bench 2.0: This is the modern standard for evaluating AI agents in terminal environments. Cursor with the Composer 2 model achieved the highest scores here, outperforming other specialized solutions. This means the model is now much more reliable when executing build, test, and deployment commands.
- SWE-bench Multilingual: This test measures an AI's ability to fix real-world software issues (GitHub issues) across different programming languages. Composer 2 significantly improved its scores, handling tasks that were previously considered too complex for automation.
These gains were made possible by continued pretraining, an additional training stage on specialized datasets that gave the subsequent reinforcement learning a stronger foundation.
Benchmarking Details and the Harbor Framework
The high scores in Terminal-Bench 2.0 were no accident. The evaluation used the official Harbor framework, a specialized tool for testing agent capabilities in the terminal. Unlike simple text-generation tests, Harbor verifies how successfully an agent can:
- Diagnose environment errors.
- Fix dependencies.
- Execute complex shell scripts.
- React to error output in real time.
Comparison with other models showed that Composer 2 outperforms even powerful models like Claude 3.5 Sonnet (using the Claude Code harness) and GPT-4o in specific terminal management tasks.
SWE-bench Multilingual: Beyond Python
For a long time, SWE-bench focused primarily on Python, but real-world development is much broader. Composer 2 was trained on multilingual data, which allows it to score well on SWE-bench Multilingual. The benchmark covers tasks in Java, TypeScript, C++, and other popular languages, making Cursor a practical tool for polyglot programmers.
Training for Long Horizons (Reinforcement Learning)
The primary technological advantage of Composer 2 lies in its ability to work over "long horizons." While regular LLMs often lose context or make mistakes when executing long chains of actions, Composer 2 is specifically trained to solve tasks requiring hundreds of sequential steps.
How It Works:
- Reinforcement Learning (RL): The model was trained on a large number of programming scenarios, receiving feedback for each correct action.
- Autonomy: You can now assign an agent a task like "refactor this module and update all related tests," and it will perform it sequentially, verifying intermediate results in the terminal.
- Reliability: Thanks to RL, the risk of the model "hallucinating" code or getting stuck in a loop has been significantly reduced.
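The "execute, then verify in the terminal" pattern described in the list above can be illustrated with a minimal sketch. This is not Cursor's implementation: `run_steps`, `StepResult`, and the `max_steps` budget are hypothetical names chosen for illustration, and the sketch assumes that a non-zero exit code is the only failure signal.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    command: list[str]
    ok: bool
    output: str


def run_steps(steps: list[list[str]], max_steps: int = 100) -> list[StepResult]:
    """Run commands in order and stop at the first one that fails.

    Hypothetical helper: a real agent would feed the captured output
    back to the model instead of simply stopping.
    """
    results = []
    for command in steps[:max_steps]:
        proc = subprocess.run(command, capture_output=True, text=True)
        ok = proc.returncode == 0  # assumption: exit code 0 means success
        results.append(StepResult(command, ok, proc.stdout + proc.stderr))
        if not ok:
            break  # do not run later steps on top of a failed one
    return results


if __name__ == "__main__":
    # Stand-ins for build, test, and deploy commands.
    demo = [
        [sys.executable, "-c", "print('build ok')"],
        [sys.executable, "-c", "import sys; sys.exit('tests failed')"],
        [sys.executable, "-c", "print('deploy')"],
    ]
    for result in run_steps(demo):
        print(result.ok, result.output.strip())
```

Stopping at the first failure keeps later steps, such as a deployment, from running on top of a broken build. The captured output is what an agent would inspect before deciding how to proceed.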
Flexibility in Choice: Speed vs. Cost
Cursor offers users a choice between two model variants to find the right balance for their needs:
- Optimal Version (Base): Priced at $0.50 per 1M input and $2.50 per 1M output tokens. This is the ideal choice for most daily tasks, offering the best price-to-performance ratio.
- Fast Version: At $1.50 per 1M input and $7.50 per 1M output tokens, this model has the same intelligence as the base version but operates significantly faster. The Cursor team has made this variant the default to maximize the speed of the coding process.
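To make the trade-off concrete, the per-token prices listed above can be turned into a rough cost estimate. The sketch below uses only the prices quoted in this article. The function name `estimate_cost`, the variant keys, and the token counts in the example are illustrative assumptions, not part of any Cursor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices as quoted in this article; the keys are illustrative labels.
PRICING = {
    "base": Pricing(input_per_million=0.50, output_per_million=2.50),
    "fast": Pricing(input_per_million=1.50, output_per_million=7.50),
}


def estimate_cost(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a given token volume."""
    price = PRICING[variant]
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 200K output tokens.
    for variant in PRICING:
        print(variant, estimate_cost(variant, 2_000_000, 200_000))
```

For this hypothetical workload the base variant comes to $1.50 and the fast variant to $4.50. Because both the input and output prices are tripled, the fast variant costs three times as much for any mix of tokens.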
Notably, for individual plans, Composer usage is now allocated into a standalone Usage Pool, providing more transparent and convenient resource management.
Availability and the Glass Interface
Composer 2 is already available in the main version of the Cursor editor. For those who want a preview of where AI development is heading, the team has also released an early alpha version of a new interface called Glass.
Glass reimagines how we interact with an AI agent, making the co-coding process even more intuitive and visually clear. Composer 2 is the "heart" of this new interface, providing lightning-fast response times and a deep understanding of project context.
Conclusion
The release of Composer 2 reinforces Cursor's position as a leader in AI-powered development tools. The transition from simple code suggestions to fully autonomous agents capable of solving complex software engineering tasks is becoming a reality.
For developers, this means less routine work and more time for architectural design and creativity. We recommend trying Composer 2 on a complex bug or a major new feature to see how it handles a demanding task.
Sources
- Introducing Composer 2 — Official Blog
- Terminal Bench Leaderboard
- Harbor Evaluation Framework
- Cursor Documentation: Models & Pricing