Introducing Composer 2: The New Frontier of AI Coding in Cursor

Cursor launches Composer 2, a next-gen model with record-breaking results on SWE-bench and Terminal-Bench, trained for long-horizon autonomous tasks.

by HowAIWorks Team
Tags: Cursor, Composer 2, AI Coding, LLM, Software Engineering, Reinforcement Learning, Productivity, Programming Tools, Machine Learning, Developer Experience

Introduction

The Cursor team has announced Composer 2, a major update to its flagship coding model, now available to all users. Cursor positions the release as more than an incremental improvement over the previous version: a new milestone in AI-agent development for software engineering. Composer 2 combines frontier-level intelligence with high efficiency, setting new records in industry-standard evaluations.

Following the success of the first Composer version introduced in Cursor 2.0, the developers focused on creating a model capable of handling truly complex, multi-stage tasks. Composer 2 is the result of deep research into Reinforcement Learning (RL) and large-scale pretraining on specialized datasets.

Breakthrough in Coding Intelligence

Composer 2 demonstrates significant performance gains across all key benchmarks used by the Cursor team. The gains suggest that the model has become considerably better at understanding code structure and executing commands.

Two benchmarks deserve special attention:

  • Terminal-Bench 2.0: This is the modern standard for evaluating AI agents in terminal environments. Cursor with the Composer 2 model achieved the highest scores here, outperforming other specialized solutions. In practice, this suggests the model is more reliable when executing build, test, and deployment commands.
  • SWE-bench Multilingual: This test measures an AI's ability to resolve real GitHub issues across different programming languages. Composer 2 significantly improved its scores, handling tasks that were previously considered too complex for automation.

These results were made possible by continued pretraining, an additional training stage that gave the subsequent model tuning a stronger foundation to build on.

Benchmarking Details and the Harbor Framework

The high scores in Terminal-Bench 2.0 were no accident. The evaluation used the official Harbor framework, a specialized tool for testing agent capabilities in the terminal. Unlike simple text-generation tests, Harbor verifies how successfully an agent can:

  • Diagnose environment errors.
  • Fix dependencies.
  • Execute complex shell scripts.
  • React to error output in real time.
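
The kind of loop these checks exercise can be sketched in a few lines of Python. This is only an illustration, not Harbor's API or Cursor's agent: `run_with_repair`, `propose_fix`, and `toy_fix` are hypothetical names, and a real agent would put a model where the hard-coded fixer is.

```python
import subprocess
import sys
from typing import Callable, List, Optional

Command = List[str]


def run(cmd: Command, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a command and capture its exit code, stdout, and stderr as text."""
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)


def run_with_repair(
    cmd: Command,
    propose_fix: Callable[[Command, str], Optional[Command]],
    max_attempts: int = 3,
) -> bool:
    """Run cmd; on failure, hand stderr to propose_fix and retry its suggestion.

    propose_fix is a hypothetical stand-in for the agent's decision step.
    It returns a replacement command, or None to give up.
    """
    for _ in range(max_attempts):
        result = run(cmd)
        if result.returncode == 0:
            return True
        next_cmd = propose_fix(cmd, result.stderr)
        if next_cmd is None:
            return False
        cmd = next_cmd
    return False


def toy_fix(cmd: Command, stderr: str) -> Optional[Command]:
    """Toy fixer (assumption): swap a missing import for one that exists."""
    if "ModuleNotFoundError" in stderr:
        return [sys.executable, "-c", "import json"]
    return None
```

Calling `run_with_repair([sys.executable, "-c", "import not_a_real_module_xyz"], toy_fix)` fails on the first attempt, reads the error output, and succeeds on the second. A terminal benchmark scores whether the final state of the environment is correct, not whether the generated text looks plausible.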

Comparison with other models showed that Composer 2 outperforms even powerful models like Claude 3.5 Sonnet (using the Claude Code harness) and GPT-4o in specific terminal management tasks.

SWE-bench Multilingual: Beyond Python

For a long time, SWE-bench focused primarily on Python, but real-world development spans many more languages. Composer 2 was trained on multilingual data, which allows it to score well on SWE-bench Multilingual. The benchmark covers tasks in Java, TypeScript, C++, and other popular languages, making Cursor a more practical tool for polyglot programmers.

Training for Long Horizons (Reinforcement Learning)

The primary technological advantage of Composer 2 lies in its ability to work over "long horizons." While regular LLMs often lose context or make mistakes when executing long chains of actions, Composer 2 is specifically trained to solve tasks requiring hundreds of sequential steps.

How It Works:

  • Reinforcement Learning (RL): The model was trained on a large number of programming scenarios, receiving feedback for each correct action.
  • Autonomy: You can now assign an agent a task like "refactor this module and update all related tests," and it will perform it sequentially, verifying intermediate results in the terminal.
  • Reliability: Thanks to RL, the risk of the model "hallucinating" code or getting stuck in a loop has been significantly reduced.
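
The article does not describe Cursor's reward scheme, so the following is only the textbook discounted return from reinforcement learning, shown to illustrate why long tasks are hard: credit for a success at the final step has to flow back to every earlier action. The function name and the `gamma` value are assumptions chosen for illustration.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A three-step episode where only the last action is rewarded.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

With hundreds of steps and a reward that arrives only at the end, the signal reaching the earliest actions becomes very small, which is one reason long-horizon training is treated as a problem in its own right.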

Flexibility in Choice: Speed vs. Cost

Cursor offers users a choice between two model variants to find the right balance for their needs:

  • Optimal Version (Base): Priced at $0.50 per 1M input and $2.50 per 1M output tokens. This is the ideal choice for most daily tasks, offering the best price-to-performance ratio.
  • Fast Version: At $1.50 per 1M input and $7.50 per 1M output tokens, this model has the same intelligence as the base version but operates significantly faster. The Cursor team has made this variant the default to maximize the speed of the coding process.
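
To make the trade-off concrete, here is a small cost estimate using the prices quoted above. The token counts are a made-up session for illustration, and `Pricing` and `estimate_cost` are hypothetical helpers, not part of any Cursor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices as quoted in this article.
COMPOSER_2 = Pricing(input_per_million=0.50, output_per_million=2.50)
COMPOSER_2_FAST = Pricing(input_per_million=1.50, output_per_million=7.50)


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a session from its token counts."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: 2M input tokens, 200k output tokens.
    base = estimate_cost(COMPOSER_2, 2_000_000, 200_000)
    fast = estimate_cost(COMPOSER_2_FAST, 2_000_000, 200_000)
    print(f"base: ${base:.2f}, fast: ${fast:.2f}")  # base: $1.50, fast: $4.50
```

Because both the input and output prices of the Fast version are three times those of the base version, the Fast version costs three times as much for any mix of tokens.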

Notably, on individual plans, Composer usage now draws from its own standalone usage pool, which makes consumption easier to track and manage.

Availability and the Glass Interface

Composer 2 is already available in the main version of the Cursor editor. For those who want a preview of where AI development is heading, the team has also released an early alpha of a new interface called Glass.

Glass reimagines how we interact with an AI agent, making the co-coding process even more intuitive and visually clear. Composer 2 is the "heart" of this new interface, providing lightning-fast response times and a deep understanding of project context.

Conclusion

The release of Composer 2 reinforces Cursor's position as a leader in AI-powered development tools. The transition from simple code suggestions to fully autonomous agents capable of solving complex software engineering tasks is becoming a reality.

For developers, this means less routine work and more time for architectural design and creativity. We recommend trying Composer 2 on a complex bug or a major new feature to see how it handles a multi-step task.


Frequently Asked Questions

What is new in Composer 2?
Composer 2 is the result of Reinforcement Learning (RL), significantly improving scores on Terminal-Bench and SWE-bench. The model is now capable of solving tasks requiring hundreds of sequential steps.

How much does Composer 2 cost?
The Optimal version costs $0.50/M input and $2.50/M output tokens. The Fast version, which is the default, costs $1.50/M input and $7.50/M output tokens.

What is the Glass interface?
The Glass interface is in early alpha and available separately. It's designed for the most efficient interaction with the Composer 2 agent.

How does Reinforcement Learning help?
Reinforcement Learning has allowed the model to handle 'long-horizon' planning, providing high autonomy and reliability when performing large-scale code edits.
