GLM-4.7-Flash: The New King of MoE Models in the 30B Class

Z.ai releases GLM-4.7-Flash, a 30B MoE model with exceptional reasoning, coding, and agentic capabilities, rivaling much larger models.

by HowAIWorks Team
Tags: GLM-4.7-Flash · LLM · MoE Architecture · AI Agents · Large Language Models · Z.ai · Open Source AI

Introduction

The landscape of efficient, high-performance language models has a new contender: GLM-4.7-Flash. Developed by the Z.ai team, this model represents a significant leap forward in the 30B parameter class, utilizing a sophisticated Mixture of Experts (MoE) architecture to deliver reasoning and coding capabilities that were previously reserved for much larger models.

GLM-4.7-Flash is specifically designed for what Z.ai calls "ARC" tasks—Agentic, Reasoning, and Coding. By balancing efficiency with high-level cognitive performance, it aims to become the go-to choice for developers building complex AI agents and sophisticated software engineering tools.

Technical Architecture & Features

At its core, GLM-4.7-Flash is a 30B-A3B MoE model: roughly 30 billion total parameters, of which only a few billion are activated for any given token. This sparse architecture lets the model run at the speed and cost of a much smaller dense model while still drawing on the knowledge capacity of a larger one.
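
To make that concrete, below is a toy sketch of top-k expert routing, the general mechanism behind sparse MoE layers. It is written in PyTorch for illustration only and does not reflect Z.ai's actual implementation; the layer sizes and expert counts are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy sparse MoE layer: each token is processed by only its top-k experts."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for every token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)   # routing probabilities over experts
        weights, idx = gate.topk(self.k, dim=-1)   # keep only the top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e).any(dim=-1)          # tokens routed to expert e
            if mask.any():
                w = weights[mask][idx[mask] == e].unsqueeze(-1)
                out[mask] += w * expert(x[mask])   # only these tokens pay for expert e
        return out

tokens = torch.randn(4, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([4, 64]); only 2 of 8 experts ran per token
```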

Key Technical Specifications

  • 131,072 Token Context Window: Enables the model to process massive amounts of information in a single pass, which is critical for analyzing large codebases or long legal documents (a token-budget sketch follows this list).
  • Preserved Thinking Mode: A unique feature designed to enhance performance in multi-turn agentic tasks. It allows the model to maintain context and "train of thought" more effectively across complex interactions.
  • Multilingual Support: While highly optimized for English and Chinese, the model shows strong performance across multiple languages.
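
As a small illustration of how the long context window is used in practice, the sketch below budgets tokens before sending a large prompt. The Hugging Face repository name is an assumption made for illustration; check the official model card for the published identifier.

```python
from transformers import AutoTokenizer

MODEL_ID = "zai-org/GLM-4.7-Flash"  # hypothetical repo name; confirm on the model card
CONTEXT_WINDOW = 131_072            # maximum tokens per request, per the spec above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

def fits_in_context(prompt: str, reserve_for_output: int = 4_096) -> bool:
    """Check that a prompt leaves enough room for the model's reply."""
    return len(tokenizer.encode(prompt)) + reserve_for_output <= CONTEXT_WINDOW

# Example: pass an entire source file as part of a code-review prompt
with open("big_module.py") as f:
    print(fits_in_context(f.read()))
```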

Performance Benchmarks

GLM-4.7-Flash doesn't just promise performance; it delivers on some of the industry's most rigorous benchmarks. Most notably, it has set a new standard for models in the 30B class on developer-centric tasks.

| Benchmark | Score | Significance |
| --- | --- | --- |
| SWE-bench Verified | 59.2 | Outperforms Qwen3-30B and rivals larger models in software engineering tasks. |
| AIME 25 | 91.6 | Competitive with high-end open-source models like GPT-OSS-20B. |
| GPQA | 75.2 | Demonstrates strong expert-level reasoning capabilities. |
| τ²-Bench | 79.5 | High performance in multi-turn agentic tool-use scenarios. |
| LCB v6 | 64.0 | Strong results on LiveCodeBench v6, a competitive coding benchmark. |

Why It Matters for Developers

The release of GLM-4.7-Flash is particularly significant for the agentic AI ecosystem. Its combination of a large context window and the Preserved Thinking mode makes it an ideal candidate for:

  • AI Software Engineers: Its high SWE-bench score indicates a superior ability to understand and fix real-world software issues.
  • Autonomous Agents: The model's focus on agentic tasks makes it well suited to multi-step planning and tool usage (a minimal tool-calling loop is sketched after this list).
  • Efficient Deployment: Thanks to the MoE architecture, it provides a high-performance alternative that is cheaper and faster to run than monolithic 70B+ models.
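
To ground the agentic use case, here is a minimal sketch of a single tool-calling round trip. It assumes GLM-4.7-Flash is served behind an OpenAI-compatible endpoint; the base URL, model name, and tool definition are illustrative placeholders, so consult Z.ai's API documentation for the actual interface.

```python
import json
from openai import OpenAI

# The endpoint and model name below are assumptions for illustration only.
client = OpenAI(base_url="https://api.z.ai/v1", api_key="YOUR_API_KEY")
MODEL = "glm-4.7-flash"

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return any failures.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

messages = [{"role": "user", "content": "Fix the failing unit test in utils/date.py."}]
resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]  # assumes the model chose to call the tool

# Run the tool locally, then return its (stubbed) output so the model can plan the next step.
messages.append(resp.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps({"failures": ["test_parse_iso_date"]}),
})
followup = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
print(followup.choices[0].message.content)
```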

How to Get Started

GLM-4.7-Flash is accessible through several channels:

  1. Z.ai API Platform: Developers can integrate the model into their applications via the official API.
  2. Chat Demo: A web-based experience is available at chat.z.ai for testing its capabilities.
  3. Local Implementation: The model is supported by the transformers library, allowing local hosting and fine-tuning with the weights published on Hugging Face (a minimal loading sketch follows below).
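
For local experimentation, loading the weights with transformers might look like the sketch below. The repository name is assumed for illustration, and memory requirements will depend on your hardware and chosen precision.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "zai-org/GLM-4.7-Flash"  # hypothetical repo name; check Hugging Face for the real one

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # use the precision stored in the checkpoint
    device_map="auto",    # spread layers across available GPUs/CPU
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a function that merges two sorted lists."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```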

Conclusion

GLM-4.7-Flash marks a turning point for "medium-sized" models. By focusing on the essential pillars of modern AI—reasoning, coding, and agentic behavior—Z.ai has produced a model that punches far above its weight class. As the demand for efficient, capable agents grows, models like GLM-4.7-Flash will be at the forefront of the next generation of AI-powered applications.

Frequently Asked Questions

What is GLM-4.7-Flash?
GLM-4.7-Flash is a large language model based on the Mixture of Experts (MoE) architecture with a 30B-A3B parameter count, developed by Z.ai for reasoning, coding, and agentic tasks.

How does it compare to other models in its class?
Z.ai claims it is the strongest model in the 30B class, outperforming models like Qwen3-30B on benchmarks such as SWE-bench Verified and showing competitive performance with larger open-source models.

How large is its context window?
The model supports a context window of up to 131,072 tokens, making it suitable for long-document analysis and complex agentic workflows.
