OpenAI Releases GPT-5.5: The Agentic Coding Revolution

OpenAI has announced GPT-5.5 (Spud), a massive new base model optimized for agentic coding, handling complex 20-hour tasks with ease. Here are the details.

by HowAIWorks Team
OpenAI · GPT-5.5 · LLMs · Agentic Coding · AI Models · SWE Pro · Anthropic · Research

Introduction

OpenAI has officially unveiled GPT-5.5, internally codenamed "Spud," marking a significant leap forward in the capabilities of large language models. Rather than jumping straight to the highly anticipated GPT-6, OpenAI has introduced this massive new base model as an iterative yet powerful upgrade, bringing unparalleled advancements in agentic coding, scientific research, and long-horizon task execution.

The AI community has been closely watching the development of autonomous coding agents, and GPT-5.5 is designed specifically to dominate this space. With the ability to seamlessly use tools and tackle long, complex workflows, this release sets a new standard for AI developer assistants. Starting today, GPT-5.5 and GPT-5.5 Pro are rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with API access coming soon.

Agentic Coding and Tool Mastery

The standout feature of GPT-5.5 is its profound enhancement in agentic coding. It's not just about generating code snippets; it's about executing multi-step workflows autonomously across complex environments.

  • Advanced Tool Utilization: GPT-5.5 demonstrates a vastly improved ability to interact with external tools, terminal environments, and complex codebases without getting stuck in loops. On Terminal-Bench 2.0, which tests complex command-line workflows, the model achieves a state-of-the-art accuracy of 82.7%.
  • Handling Long-Horizon Tasks: According to the official release notes, the model excels at maintaining context and staying on track across extensive tasks. In OpenAI's internal Expert-SWE evaluation—which involves tasks with a median estimated human completion time of 20 hours—GPT-5.5 successfully tackles challenges from end to end, outperforming its predecessor GPT-5.4.
  • Token Efficiency: One of the most critical improvements highlighted in the model's performance graphs is token efficiency. GPT-5.5 requires significantly fewer tokens to achieve a high level of reasoning and response quality, delivering state-of-the-art intelligence at half the cost of competitive frontier coding models on the Artificial Analysis Coding Index.
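The agentic workflow described above follows a familiar loop: the model proposes a tool call, the harness executes it, and the result is fed back until the task is done. Here is a minimal sketch of that loop, with a scripted stub standing in for GPT-5.5 (the interface, tool names, and messages are illustrative assumptions, not OpenAI's actual API):

```python
import json

# Illustrative tool registry; a real harness would run shell commands,
# edit files, query a terminal, etc.
TOOLS = {
    "run_tests": lambda args: {"passed": False, "failing": ["test_parse"]},
    "apply_patch": lambda args: {"ok": True, "file": args["file"]},
}

def stub_model(history):
    """Stand-in for a tool-using model: returns the next action.

    A real agent would send `history` to the model and parse its
    tool-call response; here we script two steps and then finish.
    """
    step = sum(1 for m in history if m["role"] == "tool")
    if step == 0:
        return {"tool": "run_tests", "args": {}}
    if step == 1:
        return {"tool": "apply_patch", "args": {"file": "parser.py"}}
    return {"done": True, "summary": "Patched parser.py; tests addressed."}

def agent_loop(task, max_steps=10):
    """Propose a tool call, execute it, feed the result back, repeat."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = stub_model(history)
        if action.get("done"):
            return action["summary"], history
        result = TOOLS[action["tool"]](action["args"])
        history.append({"role": "tool", "name": action["tool"],
                        "content": json.dumps(result)})
    return "step budget exhausted", history

summary, history = agent_loop("Fix the failing parser test")
```

Benchmarks like Terminal-Bench 2.0 essentially measure how reliably a model can drive this kind of loop for many steps without stalling or looping.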

Performance Metrics: SWE Pro and Competitors

When it comes to industry benchmarks, GPT-5.5 delivers strong results, although the competitive landscape remains fierce.

On the SWE-Bench Pro benchmark, which measures an AI's ability to resolve real-world GitHub issues, GPT-5.5 scores 58.6%. That result underscores its coding prowess, but it still trails Anthropic's Claude 4.7 Opus, which currently holds a commanding 64.3% on the same benchmark. GPT-5.5 does, however, hold an edge in solving tasks in a single pass, without extensive retry loops.
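The single-pass advantage matters for cost: under a simple retry model where each attempt succeeds independently with probability p, the expected number of attempts before success is 1/p, so a model that tends to resolve issues on the first pass spends proportionally fewer tokens. A back-of-the-envelope illustration (the probabilities below are invented for the example, not benchmark figures):

```python
def expected_attempts(p_success: float) -> float:
    """Expected attempts until first success for independent retries (1/p)."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return 1.0 / p_success

# Hypothetical numbers: a first-pass solver vs. one that leans on retries.
single_pass = expected_attempts(0.9)    # ~1.11 attempts on average
retry_heavy = expected_attempts(0.45)   # ~2.22 attempts on average
cost_ratio = retry_heavy / single_pass  # retries roughly double token spend
```

In other words, two models with similar final resolve rates can differ substantially in cost if one needs multiple passes to get there.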

A "Co-Scientist" for Bioinformatics and Mathematics

Beyond software engineering, GPT-5.5 is making significant waves in the scientific community. OpenAI positions it not just as an assistant, but as a "bona fide co-scientist."

  • Biomedical Research: On GeneBench and BixBench, which focus on multi-stage scientific data analysis in genetics and bioinformatics, GPT-5.5 achieved leading performance. It can reason through ambiguous or error-ridden data, address quality-control failures, and correctly implement modern statistical methods, tasks that usually take experts days to complete.
  • Advanced Mathematics: In a remarkable demonstration of its capabilities, an internal version of GPT-5.5 helped discover a new mathematical proof about Ramsey numbers in combinatorics, a result later verified in the Lean theorem prover.

Infrastructure and Cybersecurity

Serving such a massive model required OpenAI to rethink its inference stack. GPT-5.5 was co-designed for, trained with, and served on NVIDIA GB200 and GB300 NVL72 systems. Interestingly, the model was used to improve its own infrastructure: Codex analyzed production traffic and wrote custom heuristic algorithms to optimally balance workloads, increasing token generation speeds by over 20%.
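OpenAI has not published the heuristics Codex wrote, but the general shape of such a workload balancer is well known: route each incoming request to the replica with the least outstanding work. A minimal least-loaded sketch of that baseline heuristic (the replica counts and per-request cost estimates are invented for illustration, and this is not OpenAI's actual algorithm):

```python
import heapq

def balance(requests, n_replicas):
    """Greedily assign each request (est. token cost) to the least-loaded replica.

    Returns the per-request replica assignment and the final per-replica loads.
    """
    # Min-heap of (current_load, replica_id): cheapest replica pops first.
    heap = [(0, r) for r in range(n_replicas)]
    heapq.heapify(heap)
    assignment = []
    for cost in requests:
        load, replica = heapq.heappop(heap)
        assignment.append(replica)
        heapq.heappush(heap, (load + cost, replica))
    loads = [0] * n_replicas
    for replica, cost in zip(assignment, requests):
        loads[replica] += cost
    return assignment, loads

# Hypothetical estimated token costs for five requests, spread over 2 replicas.
assignment, loads = balance([500, 300, 400, 200, 100], n_replicas=2)
```

Production systems layer much more on top (latency estimates, KV-cache affinity, preemption), but even this greedy baseline keeps replica loads close to even.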

With increased capabilities come advanced security requirements. GPT-5.5 represents an incremental step toward AI that can solve tough cybersecurity challenges. However, to prevent misuse, OpenAI is deploying stricter classifiers for potential cyber risks. While some users might find these safeguards initially restrictive, they are part of a broader effort to ensure the safe deployment of powerful autonomous tools.

Pricing and Naming Strategy

While the model requires fewer tokens to generate high-quality outputs, that efficiency does not translate into lower list prices: OpenAI has raised per-token pricing for GPT-5.5, reflecting the size and computational demands of this new, significantly larger base model.
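The net effect on a bill depends on two levers pulling in opposite directions: the per-token price and how many tokens a task actually consumes. A quick sketch of that trade-off (all prices and token counts below are hypothetical, not OpenAI's published rates):

```python
def task_cost(tokens_used: int, price_per_million: float) -> float:
    """Dollar cost of a task given token usage and a per-million-token price."""
    return tokens_used / 1_000_000 * price_per_million

# Hypothetical: the new model charges more per token but needs fewer tokens
# to finish the same task, so the task can still come out cheaper.
old_model = task_cost(tokens_used=800_000, price_per_million=10.0)  # $8.00
new_model = task_cost(tokens_used=400_000, price_per_million=14.0)  # $5.60
```

Whether a given workload gets cheaper or pricier therefore hinges on how much of the token-efficiency gain it actually realizes.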

The decision to name this iteration GPT-5.5 instead of GPT-6 has also sparked discussion. Industry analysts speculate that OpenAI may be aligning its versioning strategy to match Anthropic's recent releases, avoiding a direct major version mismatch and maintaining an ongoing rhythm of fractional updates.

Conclusion

GPT-5.5 "Spud" represents a formidable step in OpenAI's roadmap, solidifying its commitment to building AI that can act autonomously in complex software engineering and scientific environments. While the price increase and the slightly lower SWE-Bench Pro score compared to its main competitor are points of consideration, the model's exceptional tool use and capability to handle 20-hour tasks make it a massive upgrade for developers and researchers alike. As agentic AI continues to evolve, GPT-5.5 will undoubtedly serve as a critical foundation for the next generation of autonomous systems.
